[jira] [Commented] (HDFS-6974) MiniHDFScluster breaks if there is an out of date hadoop.lib on the lib path
[ https://issues.apache.org/jira/browse/HDFS-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564384#comment-14564384 ] Michael Schmeißer commented on HDFS-6974: - The existence of this ticket helped to track down the problem, but if it is feasible, a more meaningful error message would help here. MiniHDFScluster breaks if there is an out of date hadoop.lib on the lib path - Key: HDFS-6974 URL: https://issues.apache.org/jira/browse/HDFS-6974 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Environment: Windows with a version of Hadoop (HDP2.1) installed somewhere via an MSI Reporter: Steve Loughran Priority: Minor SLIDER-377 shows the trace of a MiniHDFSCluster test failing on native library calls ... the root cause appears to be that the 2.4.1 hadoop lib on the path doesn't have all the methods needed by branch-2. When this situation arises, MiniHDFSCluster fails to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
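A more meaningful error, as the commenter suggests, could come from failing fast when a library on the lib path lacks an expected method. A minimal, self-contained sketch of that idea (the class and method names here are illustrative, not the actual MiniHDFSCluster fix):

```java
import java.lang.reflect.Method;

/**
 * Sketch (not the actual Hadoop fix): fail fast with a meaningful message
 * when a class picked up from the lib path is missing an expected method,
 * instead of letting a later NoSuchMethodError/UnsatisfiedLinkError surface
 * deep inside a test run.
 */
public class LibVersionCheck {
  /** Returns true if clazz exposes a public method with the given name. */
  static boolean hasMethod(Class<?> clazz, String name) {
    for (Method m : clazz.getMethods()) {
      if (m.getName().equals(name)) {
        return true;
      }
    }
    return false;
  }

  /** Throws with an actionable message when the method is absent. */
  static void requireMethod(Class<?> clazz, String name) {
    if (!hasMethod(clazz, name)) {
      throw new IllegalStateException(
          "Class " + clazz.getName() + " is missing method '" + name
          + "'; an out-of-date library may be earlier on the lib path.");
    }
  }
}
```

A check like this, run once at cluster start-up, would turn the obscure native-call failure into a one-line diagnosis.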
[jira] [Commented] (HDFS-8328) Follow-on to update decode for DataNode striped blocks reconstruction
[ https://issues.apache.org/jira/browse/HDFS-8328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564392#comment-14564392 ] Yi Liu commented on HDFS-8328: -- Thanks Kai for the comments. {quote} {{minRequiredSources}} looks like a little confusing, because from coder's point of view {quote} This is not from the coder's point of view; it's unrelated to the coder. "sources" means the datanodes which contain a correct striped block. {quote} How about renaming nullInputBuffers to nullCellBuffers {quote} Currently in the DN, the decode buffer size is not the same as the cell size. I have the comment {{// striped block length is 0}}; maybe I can change it to {{// The buffers and indices for striped blocks whose length is 0}}, and change the names to {{ZeroStripeBuffers}} and {{ZeroStripeIndices}}. {quote} I guess the following utilities can be moved elsewhere and shared with client side. targetsStatus could have a better name. {quote} {{covertIndex4Decode}} can be shared; I will move it to {{StripedBlockUtil}}. {quote} I'm wondering if the following codes can be better organized, like all the codes can be split into two functions: newStrippedReader and newBlockReader. {quote} The {{newBlockReader}} is already a separate function. {quote} Is it easy to centralize all the input/output buffers allocation in a function, so in future it would be easier to enhance respecting the fact that Java coders like on-heap buffer, but native coders prefer direct buffer. {quote} Agreed, we can have a function for allocating buffers. Follow-on to update decode for DataNode striped blocks reconstruction - Key: HDFS-8328 URL: https://issues.apache.org/jira/browse/HDFS-8328 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8328-HDFS-7285.001.patch Currently the decode for DataNode striped blocks reconstruction is a workaround; we need to update it after the decode fix in HADOOP-11847. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
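The buffer-allocation helper agreed on above could look roughly like this. This is a hedged sketch with illustrative names (not the actual HDFS-7285 code), showing how a single allocation point makes it easy to switch between on-heap buffers (which Java coders prefer) and direct buffers (which native coders prefer):

```java
import java.nio.ByteBuffer;

/**
 * Sketch of a centralized buffer-allocation function for decode work.
 * Names are illustrative, not the actual HDFS API. Centralizing the
 * allocation means the on-heap vs. direct decision is made in one place.
 */
public class DecodeBuffers {
  /** Allocate {@code count} buffers of {@code bufSize} bytes each. */
  static ByteBuffer[] allocateBuffers(int count, int bufSize, boolean direct) {
    ByteBuffer[] buffers = new ByteBuffer[count];
    for (int i = 0; i < count; i++) {
      buffers[i] = direct
          ? ByteBuffer.allocateDirect(bufSize)  // native coders
          : ByteBuffer.allocate(bufSize);       // pure-Java coders
    }
    return buffers;
  }
}
```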
[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8496: --- Status: Patch Available (was: Open) Ran all hdfs unit tests without introducing new failures. Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found that calling stopWriter() with the FSDatasetImpl lock held blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by the FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock was just sleeping in stopWriter(), waiting for another thread to exit. 
The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we had deployed quite a lot of other workloads on the DN, so the local file system and disk were quite busy. We guess this is why stopWriter took quite a long time. Anyway, it is not reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call stopWriter without the FSDatasetImpl lock. We guess we should do the same in the other three methods: recoverClose()/recoverAppend()/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
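The locking problem described above reduces to a small, self-contained pattern (plain Java, not the HDFS code): a blocking Thread.join() inside a synchronized block stalls every thread contending for that monitor, while moving the join() outside, as HDFS-7999 did for createTemporary(), keeps the lock hold time short.

```java
/**
 * Sketch of the locking pattern at issue. The "writer" thread stands in
 * for a ReplicaInPipeline writer; datasetLock stands in for the
 * FSDatasetImpl monitor. Illustrative only, not HDFS code.
 */
public class StopWriterPattern {
  private final Object datasetLock = new Object();

  // Problematic shape: the blocking join() runs while datasetLock is held,
  // so heartbeat/DataXceiver-style threads needing the lock all stall.
  void recoverWhileHoldingLock(Thread writer) {
    synchronized (datasetLock) {
      writer.interrupt();
      joinQuietly(writer);   // other threads block on datasetLock meanwhile
      // ... mutate replica state under the lock ...
    }
  }

  // Fixed shape: stop the writer first, then take the lock only for the
  // short state mutation (the approach HDFS-7999 used for createTemporary()).
  void recoverWithoutHoldingLock(Thread writer) {
    writer.interrupt();
    joinQuietly(writer);     // no shared lock held while waiting
    synchronized (datasetLock) {
      // ... re-check and mutate replica state under the lock ...
    }
  }

  private static void joinQuietly(Thread t) {
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Note the fixed shape must re-check state after re-acquiring the lock, since another thread may have changed it while the writer was being stopped.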
[jira] [Commented] (HDFS-8490) Typo in trace enabled log in WebHDFS exception handler
[ https://issues.apache.org/jira/browse/HDFS-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564343#comment-14564343 ] Hadoop QA commented on HDFS-8490: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 37s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 10s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 18s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 9s | Tests passed in hadoop-hdfs. 
| | | | 208m 7s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736041/HDFS-8490.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11159/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11159/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11159/console | This message was automatically generated. Typo in trace enabled log in WebHDFS exception handler -- Key: HDFS-8490 URL: https://issues.apache.org/jira/browse/HDFS-8490 Project: Hadoop HDFS Issue Type: Improvement Components: webhdfs Reporter: Jakob Homan Assignee: Archana T Priority: Trivial Labels: newbie Attachments: HDFS-8490.patch /hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/web/webhdfs/ExceptionHandler.java: {code} static DefaultFullHttpResponse exceptionCaught(Throwable cause) { Exception e = cause instanceof Exception ? (Exception) cause : new Exception(cause); if (LOG.isTraceEnabled()) { LOG.trace("GOT EXCEPITION", e); } {code} EXCEPITION is a typo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
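For illustration, here is a self-contained mirror of that snippet with the typo corrected. The Logger interface below is a stand-in for the real logging API, and the corrected wording is an assumption, not the actual committed patch:

```java
/**
 * Stand-alone mirror of the ExceptionHandler snippet above. The Logger
 * interface is a stand-in (not Hadoop's or SLF4J's); the corrected
 * message text "GOT EXCEPTION" is assumed wording.
 */
public class ExceptionHandlerSketch {
  interface Logger {
    boolean isTraceEnabled();
    void trace(String msg, Throwable t);
  }

  /** Wraps any Throwable as an Exception, logging at trace if enabled. */
  static Exception wrap(Throwable cause, Logger log) {
    Exception e = cause instanceof Exception
        ? (Exception) cause
        : new Exception(cause);
    if (log.isTraceEnabled()) {
      log.trace("GOT EXCEPTION", e);  // was "GOT EXCEPITION"
    }
    return e;
  }
}
```

Guarding the trace call with isTraceEnabled() is the standard idiom to avoid building the log arguments when tracing is off.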
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564381#comment-14564381 ] Hadoop QA commented on HDFS-8489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 30s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 52s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 18s | The applied patch generated 2 new checkstyle issues (total was 687, now 685). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 21s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 22s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 163m 33s | Tests passed in hadoop-hdfs. 
| | | | 211m 38s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736043/HDFS-8489.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11160/console | This message was automatically generated. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8450: --- Attachment: HDFS-8450-HDFS-7285-03.patch Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It is good to consolidate all the erasure coding zone related implementations of {{FSNamesystem}}. Here, proposing {{FSDirErasureCodingZoneOp}} class to have functions to perform related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564425#comment-14564425 ] Rakesh R commented on HDFS-8450: Attached another patch addressing [~drankye] comments. Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It is good to consolidate all the erasure coding zone related implementations of {{FSNamesystem}}. Here, proposing {{FSDirErasureCodingZoneOp}} class to have functions to perform related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564334#comment-14564334 ] Yi Liu commented on HDFS-8497: -- Hi Bo, the decode workaround is removed in HDFS-8328. ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to the decode error in ErasureCodingWorker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564340#comment-14564340 ] Hadoop QA commented on HDFS-7609: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 51s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 2 new checkstyle issues (total was 321, now 321). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 52s | Tests passed in hadoop-hdfs. 
| | | | 209m 13s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736039/HDFS-7609-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11158/console | This message was automatically generated. startup used too much time to load edits Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
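As a sketch only, the reporter's workaround would be an hdfs-site.xml entry like the following. The property name is taken from the comment above; verify it against your Hadoop version before relying on it:

```xml
<!-- hdfs-site.xml: workaround from the report above; the property name is
     as given in the comment, not verified against every Hadoop release. -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
```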
[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
[ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564433#comment-14564433 ] Walter Su commented on HDFS-8254: - This case passed. {code} @Test(timeout=12) public void testDatanodeFailure3() { final int length = NUM_DATA_BLOCKS*BLOCK_SIZE - 1; ... {code} This case failed. {code} @Test(timeout=12) public void testDatanodeFailure3() { final int length = NUM_DATA_BLOCKS*BLOCK_SIZE; ... {code} Fix: {code} private long getCurrentSumBytes() { long sum = 0; for (int i = 0; i < numDataBlocks; i++) { + if (streamers.get(i).isFailed()) { + continue; + } System.out.println(streamers.get(i).getBytesCurBlock()); sum += streamers.get(i).getBytesCurBlock(); } return sum; } {code} because {{BytesCurBlock}} of the failed streamer isn't 0. When the last stripe is full, we call {{writeParityCells()}} twice. To [~zhz]: bq. It also looks like we could run into a race condition if 2 streamers enter locateFollowingBlock around the same time? I think it won't be an issue, because MultipleBlockingQueue.poll(..) has {{synchronized(queues)}}. In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer --- Key: HDFS-8254 URL: https://issues.apache.org/jira/browse/HDFS-8254 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8254_20150526.patch, h8254_20150526b.patch StripedDataStreamer javadoc is shown below. {code} * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}. * There are two kinds of StripedDataStreamer, leading streamer and ordinary * stream. Leading streamer requests a block group from NameNode, unwraps * it to located blocks and transfers each located block to its corresponding * ordinary streamer via a blocking queue. {code} Leading streamer is the streamer with index 0. 
When the datanode of the leading streamer fails, the other streamers cannot continue since no one will request a block group from NameNode anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
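The javadoc's handoff design can be modeled in a few lines of plain Java (illustrative names, not HDFS code): the leading streamer feeds one blocking queue per ordinary streamer, so if the leader dies before feeding them, every consumer blocks forever on take(), which is exactly the failure mode this issue describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Toy model of the leading-streamer handoff described in the javadoc
 * above. Names are illustrative; blocks are plain Strings here.
 */
public class StreamerHandoff {
  /**
   * Leading streamer's side of the handoff: push one "located block"
   * into each ordinary streamer's queue. If this never runs (the leader
   * died), consumers blocked in take() on these queues wait forever.
   */
  static List<BlockingQueue<String>> distribute(String[] blockGroup) {
    List<BlockingQueue<String>> queues = new ArrayList<>();
    for (String block : blockGroup) {
      BlockingQueue<String> q = new ArrayBlockingQueue<>(1);
      q.offer(block);               // leading streamer feeds each queue
      queues.add(q);
    }
    return queues;
  }
}
```

The single-producer structure is what makes the leading streamer a single point of failure for the whole block group.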
[jira] [Updated] (HDFS-8453) Erasure coding: properly handle start offset for internal blocks in a block group
[ https://issues.apache.org/jira/browse/HDFS-8453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8453: Summary: Erasure coding: properly handle start offset for internal blocks in a block group (was: Erasure coding: properly assign start offset for internal blocks in a block group) Erasure coding: properly handle start offset for internal blocks in a block group - Key: HDFS-8453 URL: https://issues.apache.org/jira/browse/HDFS-8453 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8453-HDFS-7285.00.patch {code} void actualGetFromOneDataNode(final DNAddrPair datanode, ... LocatedBlock block = getBlockAt(blockStartOffset); ... fetchBlockAt(block.getStartOffset()); {code} The {{blockStartOffset}} here is from an inner block. For parity blocks, the offset will overlap with the next block group, and we may end up fetching the wrong block. So we have to assign a meaningful start offset to internal blocks in a block group, especially parity blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
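The overlap can be seen with simple arithmetic. The layout below is an assumed naive scheme for illustration (not the exact HDFS offset formula): if internal block i of a group starting at groupStart were assigned offset groupStart + i * blockSize, any parity index i >= dataBlocks lands past the group's logical span of dataBlocks * blockSize, i.e. inside the next group's offset range.

```java
/**
 * Illustrative arithmetic for the parity-offset overlap (assumed naive
 * layout, not the actual HDFS formula). With 6 data + 3 parity blocks,
 * parity indices 6..8 would get offsets past the group's logical end.
 */
public class ParityOffset {
  /** Naive per-index offset: groupStart + i * blockSize. */
  static long naiveStartOffset(long groupStart, int indexInGroup,
      long blockSize) {
    return groupStart + (long) indexInGroup * blockSize;
  }

  /**
   * Indices >= dataBlocks are parity blocks; their naive offsets fall
   * beyond the group's dataBlocks * blockSize logical span, i.e. into
   * the next block group's range.
   */
  static boolean overlapsNextGroup(int indexInGroup, int dataBlocks) {
    return indexInGroup >= dataBlocks;
  }
}
```

Hence the issue's conclusion: parity blocks need an explicitly assigned, meaningful start offset rather than one derived from their index.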
[jira] [Updated] (HDFS-8425) [umbrella] Performance tuning and bug fixing for system tests for EC feature
[ https://issues.apache.org/jira/browse/HDFS-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8425: Summary: [umbrella] Performance tuning and bug fixing for system tests for EC feature (was: [umbrella] Bug fixing for System tests for EC feature) [umbrella] Performance tuning and bug fixing for system tests for EC feature Key: HDFS-8425 URL: https://issues.apache.org/jira/browse/HDFS-8425 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: GAO Rui This jira is the {{umbrella}} jira for bug fixes to system tests for the EC feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8497: Attachment: HDFS-8497-HDFS-7285-01.patch The unit test is another test for the recovery work of the datanode; I think we can add it to the branch before HDFS-8497 ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to the decode error in ErasureCodingWorker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhouyingchao updated HDFS-8496: --- Attachment: HDFS-8496-001.patch Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8496-001.patch On a DN of an HDFS 2.6 cluster, we noticed that some DataXceiver threads and heartbeat threads were blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found that calling stopWriter() with the FSDatasetImpl lock held blocked everything. Following is the heartbeat stack, as an example, to show how threads are blocked by the FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock was just sleeping in stopWriter(), waiting for another thread to exit. 
The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon) at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:183) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverCheck(FsDatasetImpl.java:982) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.recoverClose(FsDatasetImpl.java:1026) - locked 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:624) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} In this case, we had deployed quite a lot of other workloads on the DN, so the local file system and disk were quite busy. We guess this is why stopWriter took quite a long time. Anyway, it is not reasonable to call stopWriter with the FSDatasetImpl lock held. In HDFS-7999, createTemporary() was changed to call stopWriter without the FSDatasetImpl lock. We guess we should do the same in the other three methods: recoverClose()/recoverAppend()/recoverRbw(). I'll try to finish a patch for this today. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8497) ErasureCodingWorker fails to do decode work
[ https://issues.apache.org/jira/browse/HDFS-8497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564252#comment-14564252 ] Li Bo commented on HDFS-8497: - Correction: before HDFS-8449 ErasureCodingWorker fails to do decode work --- Key: HDFS-8497 URL: https://issues.apache.org/jira/browse/HDFS-8497 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8497-HDFS-7285-01.patch When I run the unit test in HDFS-8449, it fails due to the decode error in ErasureCodingWorker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8481: Attachment: HDFS-8481-HDFS-7285.03.patch Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564260#comment-14564260 ] Zhe Zhang commented on HDFS-8481: - Thanks Kai and Walter for the comments. The new patch moves the decoder to the {{DFSStripedInputStream}} level. bq. Assume we has a 768mb file (128mb * 6) which exactly contains 1 block group. We lost one block so we have to decode until 768mb data has been read. This is a good point. But to address this issue we need some nontrivial logic to call {{decode()}} multiple times. I suggest we do this optimization as a follow-on under HDFS-8031. Per Walter's suggestion above, we can also think of a better way to abstract {{decodeAndFillBuffer}} in that follow-on JIRA (it will be easier when both client and DN codes are stabilized). Let me know if the new patch looks good to you with respect to removing the decoding workaround. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564317#comment-14564317 ] Hadoop QA commented on HDFS-6440: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 20m 53s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 24 new or modified test files. | | {color:green}+1{color} | javac | 8m 8s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 53s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 3m 1s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 4m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 5m 59s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 23m 25s | Tests passed in hadoop-common. | | {color:red}-1{color} | hdfs tests | 168m 33s | Tests failed in hadoop-hdfs. | | {color:red}-1{color} | hdfs tests | 0m 18s | Tests failed in bkjournal. 
| | | | 247m 4s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestEncryptedTransfer | | Timed out tests | org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache | | Failed build | bkjournal | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736032/hdfs-6440-trunk-v7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/whitespace.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/testrun_hadoop-common.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/testrun_hadoop-hdfs.txt | | bkjournal test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/artifact/patchprocess/testrun_bkjournal.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11157/console | This message was automatically generated. Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). 
This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
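Since most of the change described above is in configuration parsing, the user-visible shape of the feature is easy to sketch. A hypothetical hdfs-site.xml fragment follows; the nameservice ID "mycluster", the NameNode IDs "nn1".."nn3", and the hostnames are invented, and it assumes the existing HA keys simply accept a longer NameNode list, as the proposal suggests:

```xml
<!-- Hypothetical example: "mycluster", "nn1".."nn3" and the hosts are
     invented; the keys are the existing HA configuration keys. -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>  <!-- three NameNode IDs instead of two -->
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>host2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>host3.example.com:8020</value>
</property>
```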
[jira] [Commented] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564597#comment-14564597 ] Hadoop QA commented on HDFS-8450: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 28s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 37s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 16s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 41s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 55s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 4m 3s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 57s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 105m 47s | Tests failed in hadoop-hdfs. 
| | | | 153m 6s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestDFSStripedInputStream | | Timed out tests | org.apache.hadoop.hdfs.TestInjectionForSimulatedStorage | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736087/HDFS-8450-HDFS-7285-03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1299357 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11163/console | This message was automatically generated. Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It is good to consolidate all the erasure coding zone related implementations of {{FSNamesystem}}. 
Here, proposing {{FSDirErasureCodingZoneOp}} class to have functions to perform related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564618#comment-14564618 ] kanaka kumar avvaru commented on HDFS-7240: --- Very interesting to follow, [~jnp]; we also have requirements to support trillion-scale small objects/files and would be interested in contributing to Ozone development. Can you please invite me to the WebEx meeting as well? For now I have a few comments on this project: Practically, partitioning may be difficult for the storage layer to control alone, as the distribution depends on how applications construct keys. So a bucket partitioner class could be an input when creating a bucket, so that applications can manage the partitions well. Object-level metadata such as tags/labels would be required, which computing jobs can use as additional info (similar to xattrs on a file). What is the plan for leveldbjni content persistence: is any concept like a WAL planned for reliability? And how will the leveldbjni content be replicated? As millions of buckets are expected, is partitioning of buckets also required, based on volume name? Swift and AWS S3 support object versions and replace; does Ozone also plan the same? Missing features like multipart upload and heavy object/storage space splits could also be pooled into the coming phases (maybe phase 2 or later). We could also add readable snapshots of a bucket to the feature queue (maybe at a later stage of the project). As part of transparent encryption, an encryption zone at the bucket level could be an expectation from applications. Object store in HDFS Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: Ozone-architecture-v1.pdf This jira proposes to add object store capabilities into HDFS.
As part of the federation work (HDFS-1052) we separated block storage as a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer i.e. datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564632#comment-14564632 ] Hudson commented on HDFS-8407: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)->GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ...
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
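The NULL-on-empty ambiguity described above is the same one POSIX readdir() has, and the fix is the same protocol the patch adopts for hdfsListDirectory: errno is zeroed on the success path, and the caller checks errno to tell "error" from "nothing there". A minimal, self-contained C sketch of that caller-side protocol (using readdir rather than libhdfs, so it runs without a Hadoop install; count_entries is a made-up helper, not libhdfs code):

```c
#include <stddef.h>
#include <dirent.h>
#include <errno.h>

/* Count directory entries, distinguishing "end of stream" from "error"
 * the way the patched hdfsListDirectory lets callers do: a NULL return
 * with errno == 0 is normal, NULL with nonzero errno is a real failure.
 * Returns the entry count, or -1 on error (with errno set). */
int count_entries(const char *path) {
    DIR *d = opendir(path);
    if (d == NULL) {
        return -1;               /* errno already set by opendir */
    }
    int n = 0;
    struct dirent *e;
    errno = 0;                   /* clear before calling, as the caller must */
    while ((e = readdir(d)) != NULL) {
        n++;
        errno = 0;               /* re-clear before each call */
    }
    if (errno != 0) {            /* NULL + nonzero errno: real error */
        int saved = errno;
        closedir(d);
        errno = saved;
        return -1;
    }
    closedir(d);
    return n;                    /* NULL + errno == 0: clean end of stream */
}
```

With hdfsListDirectory the shape is identical after this fix: a NULL return with errno == 0 means an empty directory, while NULL with nonzero errno means a genuine failure.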
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564633#comment-14564633 ] Hudson commented on HDFS-8429: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. 
Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then block in stacks like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users know that something went wrong and can fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
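The root cause in the report above is a native malloc that fails under memory pressure and surfaces only later as a Java-side NullPointerException. A hypothetical C sketch (get_readable_fds is invented for illustration; it is not Hadoop's actual JNI code) of the defensive pattern this implies: check the allocation at the point of failure and report it to the caller explicitly, instead of letting a NULL propagate:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Allocate and zero an array of `count` fds (count > 0 assumed).
 * On success, stores the array in *out and returns 0; on allocation
 * failure, returns ENOMEM immediately rather than handing the caller
 * a NULL that blows up somewhere far from the real cause. */
int get_readable_fds(int count, int **out) {
    int *fds = malloc((size_t)count * sizeof(int));
    if (fds == NULL) {
        return ENOMEM;           /* report the failure where it happened */
    }
    memset(fds, 0, (size_t)count * sizeof(int));
    *out = fds;
    return 0;
}
```

In the JNI setting, the equivalent of the ENOMEM return would be throwing OutOfMemoryError from the native method, so the watcher thread fails with a diagnosable error instead of an NPE.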
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564630#comment-14564630 ] Hudson commented on HDFS-8443: -- SUCCESS: Integrated in Hadoop-Yarn-trunk-Java8 #212 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/212/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8256) -storagepolicies , -blockId ,-replicaDetails options are missed out in usage and from documentation
[ https://issues.apache.org/jira/browse/HDFS-8256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564646#comment-14564646 ] Vinayakumar B commented on HDFS-8256: - Seems the patch is not applying on latest trunk. Needs a rebase. -storagepolicies , -blockId ,-replicaDetails options are missed out in usage and from documentation -- Key: HDFS-8256 URL: https://issues.apache.org/jira/browse/HDFS-8256 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: J.Andreina Assignee: J.Andreina Labels: BB2015-05-TBR Attachments: HDFS-8256.2.patch, HDFS-8256.3.patch, HDFS-8256_Trunk.1.patch -storagepolicies , -blockId ,-replicaDetails options are missed out in usage and from documentation. {noformat} Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots] [-showprogress] {noformat} Found as part of HDFS-8108. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564653#comment-14564653 ] Hudson commented on HDFS-8443: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/942/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564655#comment-14564655 ] Hudson commented on HDFS-8407: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/942/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)->GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ...
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564656#comment-14564656 ] Hudson commented on HDFS-8429: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #942 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/942/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. 
Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then block in stacks like this: {code} DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323) at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) at java.lang.Thread.run(Thread.java:662) {code} IMO, we should exit the DN so that users know that something went wrong and can fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564474#comment-14564474 ] Hadoop QA commented on HDFS-8481: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 9s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 22s | The patch appears to introduce 2 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 16s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 172m 4s | Tests failed in hadoop-hdfs. 
| | | | 214m 6s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.blockmanagement.TestBlockInfo | | | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736057/HDFS-8481-HDFS-7285.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1299357 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11161/console | This message was automatically generated. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8251) Move the synthetic load generator into its own package
[ https://issues.apache.org/jira/browse/HDFS-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564552#comment-14564552 ] Vinayakumar B commented on HDFS-8251: - bq. hadoop-test-tool seems very generic to me. It might make sense if there was more than the HDFS load generator in it. Leaving this in RFC for bug bash for a second opinion. IMO, it's okay to keep it generic, as the current tools depend on the entire Hadoop stack (HDFS and MR) for execution, rather than only HDFS. In fact, this is the reason these were kept in the mapreduce project: to resolve the dependencies. I agree they are not MR tools, but they use MR infrastructure. In future, any such tools intended for other components that use the entire Hadoop stack can also be put in this project. Thoughts? Move the synthetic load generator into its own package -- Key: HDFS-8251 URL: https://issues.apache.org/jira/browse/HDFS-8251 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: J.Andreina Labels: BB2015-05-RFC Attachments: HDFS-8251.1.patch It doesn't really make sense for the HDFS load generator to be a part of the (extremely large) mapreduce jobclient package. It should be pulled out and put in its own package, probably in hadoop-tools. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3716) Purger should remove stale fsimage ckpt files
[ https://issues.apache.org/jira/browse/HDFS-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564582#comment-14564582 ] Vinayakumar B commented on HDFS-3716: - Change looks fine to me. I think you can add one test case for this. Purger should remove stale fsimage ckpt files - Key: HDFS-3716 URL: https://issues.apache.org/jira/browse/HDFS-3716 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: suja s Assignee: J.Andreina Priority: Minor Attachments: HDFS-3716.1.patch NN got killed while checkpointing was in progress, before renaming the ckpt file to the actual file. Since the checkpointing process did not complete, on the next NN startup it will load the previous fsimage and apply the rest of the edits. Functionally there's no harm, but this ckpt file will be retained as is. The purger will not remove the ckpt file, though other old fsimage files will be taken care of. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8270) create() always retried with hardcoded timeout when file already exists
[ https://issues.apache.org/jira/browse/HDFS-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564534#comment-14564534 ] Vinayakumar B commented on HDFS-8270: - Seems like the default retries also got removed; the client is not retrying even for connect exceptions. Just the following changes will do, IMO: in NameNodeProxies#createNNProxyWithClientProtocol(..), inside the {{withRetries}} if block, make the changes below and leave everything else the same. {code} if (withRetries) { // create the proxy with retries - RetryPolicy createPolicy = RetryPolicies - .retryUpToMaximumCountWithFixedSleep(5, - HdfsServerConstants.LEASE_SOFTLIMIT_PERIOD, TimeUnit.MILLISECONDS); - - Map<Class<? extends Exception>, RetryPolicy> remoteExceptionToPolicyMap - = new HashMap<Class<? extends Exception>, RetryPolicy>(); - remoteExceptionToPolicyMap.put(AlreadyBeingCreatedException.class, - createPolicy); - - RetryPolicy methodPolicy = RetryPolicies.retryByRemoteException( - defaultPolicy, remoteExceptionToPolicyMap); Map<String, RetryPolicy> methodNameToPolicyMap = new HashMap<String, RetryPolicy>(); - - methodNameToPolicyMap.put("create", methodPolicy); ClientProtocol translatorProxy = new ClientNamenodeProtocolTranslatorPB(proxy); {code} create() always retried with hardcoded timeout when file already exists --- Key: HDFS-8270 URL: https://issues.apache.org/jira/browse/HDFS-8270 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Andrey Stepachev Assignee: J.Andreina Attachments: HDFS-8270.1.patch In HBase we stumbled on unexpected behaviour, which could break things. HDFS-6478 fixed a wrong exception translation, but that apparently led to unexpected behaviour: clients trying to create a file without overwrite=true will be forced to retry for a hardcoded amount of time (60 seconds). That could break or slow down systems that use the filesystem for locks (like HBase fsck did, and we got it broken: HBASE-13574).
We should make this behaviour configurable: does the client really need to wait for the lease timeout to be sure that the file doesn't exist, or should it be enough to fail fast? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
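The per-exception retry mapping discussed above can be sketched with a small self-contained model: look up a retry budget by exception class, so an already-exists error can fail fast while connect errors keep their retries. This is a simplified illustration, not Hadoop's actual RetryPolicies API; all class and method names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Simplified model of the retryByRemoteException idea from HDFS-8270:
// choose a retry budget per exception class. Names are illustrative.
public class RetrySketch {
    static class AlreadyExistsException extends RuntimeException {}
    static class ConnectException extends RuntimeException {}

    static final Map<Class<? extends Exception>, Integer> RETRIES = new HashMap<>();
    static {
        RETRIES.put(AlreadyExistsException.class, 0); // fail fast
        RETRIES.put(ConnectException.class, 5);       // keep default-style retries
    }

    static int retriesFor(Exception e) {
        return RETRIES.getOrDefault(e.getClass(), 1); // fallback policy
    }

    // Invoke the call, retrying according to the per-exception budget.
    static <T> T invoke(Callable<T> call) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                if (attempt++ >= retriesFor(e)) throw e; // budget exhausted
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        try {
            invoke(() -> { calls[0]++; throw new AlreadyExistsException(); });
        } catch (AlreadyExistsException expected) {
            // fail-fast policy: no retries for this exception class
        }
        System.out.println("calls with fail-fast policy: " + calls[0]); // 1
    }
}
```

The point of the mapping is that the fail-fast decision stays local to the exception type, so other operations keep the default retry behaviour.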
[jira] [Commented] (HDFS-8496) Calling stopWriter() with FSDatasetImpl lock held may block other threads
[ https://issues.apache.org/jira/browse/HDFS-8496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564543#comment-14564543 ] Hadoop QA commented on HDFS-8496: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 34s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 26s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 1 new checkstyle issues (total was 124, now 120). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 13s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 12s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 162m 57s | Tests passed in hadoop-hdfs. 
| | | | 208m 44s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736065/HDFS-8496-001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d725dd8 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf906.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11162/console | This message was automatically generated. Calling stopWriter() with FSDatasetImpl lock held may block other threads -- Key: HDFS-8496 URL: https://issues.apache.org/jira/browse/HDFS-8496 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Attachments: HDFS-8496-001.patch On a DN of a HDFS 2.6 cluster, we noticed some DataXceiver threads and heartbeat threads are blocked for quite a while on the FSDatasetImpl lock. By looking at the stack, we found the calling of stopWriter() with FSDatasetImpl lock blocked everything. 
Following is the heartbeat stack, as an example, to show how threads are blocked by FSDatasetImpl lock: {code} java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:152) - waiting to lock 0x0007701badc0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getAvailable(FsVolumeImpl.java:191) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getStorageReports(FsDatasetImpl.java:144) - locked 0x000770465dc0 (a java.lang.Object) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:575) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:680) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:850) at java.lang.Thread.run(Thread.java:662) {code} The thread which held the FSDatasetImpl lock is just sleeping to wait another thread to exit in stopWriter(). The stack is: {code} java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1194) - locked 0x0007636953b8 (a org.apache.hadoop.util.Daemon)
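The blocking described above comes from calling Thread.join() while holding the FSDatasetImpl monitor. One direction for a fix is to read the shared state under the lock but interrupt and join the writer thread outside it, so heartbeat and DataXceiver threads can still acquire the lock. A minimal sketch of that shape, with illustrative names (not the real FsDatasetImpl code):

```java
// Sketch for HDFS-8496: join the writer thread *outside* the dataset lock.
// Class and field names are hypothetical.
public class StopWriterSketch {
    private final Object datasetLock = new Object();
    private Thread writer;

    void startWriter() {
        writer = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* stop */ }
        });
        writer.start();
    }

    // Bad shape: synchronized (datasetLock) { writer.interrupt(); writer.join(); }
    // Better: snapshot the reference under the lock, then join lock-free.
    void stopWriter() throws InterruptedException {
        Thread t;
        synchronized (datasetLock) {
            t = writer;      // read shared state under the lock
            writer = null;
        }
        if (t != null) {
            t.interrupt();   // no lock held while we wait for the thread to exit
            t.join();
        }
    }

    boolean isStopped() {
        synchronized (datasetLock) { return writer == null; }
    }

    public static void main(String[] args) throws InterruptedException {
        StopWriterSketch s = new StopWriterSketch();
        s.startWriter();
        s.stopWriter();
        System.out.println("writer stopped without holding the dataset lock");
    }
}
```

With this shape, a heartbeat thread calling isStopped() (or any other method that needs the dataset lock) only waits for the brief snapshot section, not for the whole join.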
[jira] [Commented] (HDFS-6775) Users may see TrashPolicy if hdfs dfs -rm is run
[ https://issues.apache.org/jira/browse/HDFS-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564594#comment-14564594 ] Vinayakumar B commented on HDFS-6775: - +1, LGTM. Committing soon Users may see TrashPolicy if hdfs dfs -rm is run Key: HDFS-6775 URL: https://issues.apache.org/jira/browse/HDFS-6775 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: J.Andreina Attachments: HDFS-6775.1.patch, HDFS-6775.2.patch Doing 'hdfs dfs -rm file' generates an extra log message on the console: {code} 14/07/29 15:18:56 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes. {code} This shouldn't be seen by users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564571#comment-14564571 ] Vinayakumar B commented on HDFS-7401: - +1 for the patch. Will commit soon Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn("Failed to connect to " + targetAddr + " for block" + ", add to deadNodes and continue. " + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
[ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564572#comment-14564572 ] Walter Su commented on HDFS-8254: - This case failed.
{code}
@Test(timeout=12)
public void testDatanodeFailure3() {
  final int length = NUM_DATA_BLOCKS*BLOCK_SIZE * 2;
  ...
{code}
Cause: the thread of streamer #3 has been shut down by {{handleBadDatanode()}}. When the output stream moves forward to write the next block group, streamer #3 has an error and doesn't have an endBlock in the Coordinator. In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer --- Key: HDFS-8254 URL: https://issues.apache.org/jira/browse/HDFS-8254 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8254_20150526.patch, h8254_20150526b.patch StripedDataStreamer javadoc is shown below.
{code}
 * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}.
 * There are two kinds of StripedDataStreamer, leading streamer and ordinary
 * stream. Leading streamer requests a block group from NameNode, unwraps
 * it to located blocks and transfers each located block to its corresponding
 * ordinary streamer via a blocking queue.
{code}
The leading streamer is the streamer with index 0. When the datanode of the leading streamer fails, the other streamers cannot continue since no one will request a block group from NameNode anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
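The leading/ordinary streamer hand-off described in the javadoc can be modeled with per-streamer blocking queues, which also makes the failure mode visible: if the leading streamer dies, the ordinary streamers block on take() forever. This is a hedged, self-contained model with illustrative names, not the real StripedDataStreamer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal model of the hand-off in HDFS-8254: streamer 0 ("leading")
// unwraps a block group and pushes one located block per ordinary
// streamer through a blocking queue. Names are hypothetical.
public class StripedHandoffSketch {
    static final int NUM_STREAMERS = 3;

    public static List<String> distribute(String blockGroup) throws InterruptedException {
        List<BlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i < NUM_STREAMERS; i++) queues.add(new ArrayBlockingQueue<>(1));

        // Leading streamer: unwrap the group into per-streamer located blocks.
        for (int i = 0; i < NUM_STREAMERS; i++) queues.get(i).put(blockGroup + "#blk" + i);

        // Ordinary streamers: each takes its block. If the leader never
        // produced one (e.g. its datanode failed), this take() blocks
        // forever -- which is why a fixed leader is hard to make fault
        // tolerant; a timed poll() would at least fail fast.
        List<String> got = new ArrayList<>();
        for (int i = 0; i < NUM_STREAMERS; i++) got.add(queues.get(i).take());
        return got;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(distribute("bg_0"));
    }
}
```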
[jira] [Updated] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7401: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks all. Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn("Failed to connect to " + targetAddr + " for block" + ", add to deadNodes and continue. " + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564590#comment-14564590 ] Hudson commented on HDFS-7401: -- FAILURE: Integrated in Hadoop-trunk-Commit #7924 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7924/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564753#comment-14564753 ] Hudson commented on HDFS-8443: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564749#comment-14564749 ] Hudson commented on HDFS-7401: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564757#comment-14564757 ] Hudson commented on HDFS-8429: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application hung while doing a short-circuit read of a local HDFS block. By looking into the log, we found that the DataNode's DomainSocketWatcher.watcherThread had exited with the following log:
{code}
ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception
java.lang.NullPointerException
    at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463)
    at java.lang.Thread.run(Thread.java:662)
{code}
Line 463 is the following code snippet:
{code}
try {
  for (int fd : fdSet.getAndClearReadableFds()) {
    sendCallbackAndRemove("getAndClearReadableFds", entries, fdSet, fd);
  }
{code}
getAndClearReadableFds is a native method which mallocs an int array. Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads were then blocked in stacks like this:
{code}
DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000]
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
    at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
    at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
    at java.lang.Thread.run(Thread.java:662)
{code}
IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
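The stuck-thread pattern above, where callers park on a condition that only the watcher thread signals, can be sketched along with a fail-fast alternative: mark the watcher closed and wake every waiter in a finally block, so add() throws instead of waiting forever. This is a simplified illustration, not the real DomainSocketWatcher code; all names are hypothetical.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the HDFS-8429 failure mode and a fail-fast direction.
public class WatcherSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition processed = lock.newCondition();
    private boolean closed = false;
    private boolean handled = false; // would be set per-fd by the real watcher

    void watcherLoop(Runnable body) {
        try {
            body.run();          // may throw, e.g. an NPE after a failed malloc
        } catch (Throwable t) {
            // log the unexpected exception; do NOT just let the thread die
        } finally {
            lock.lock();
            try {
                closed = true;   // current and future waiters fail fast
                processed.signalAll();
            } finally { lock.unlock(); }
        }
    }

    void add() throws InterruptedException {
        lock.lock();
        try {
            while (!handled && !closed) processed.await();
            if (closed) throw new IllegalStateException("watcher thread terminated");
        } finally { lock.unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        WatcherSketch w = new WatcherSketch();
        Thread t = new Thread(() ->
            w.watcherLoop(() -> { throw new NullPointerException("simulated"); }));
        t.start();
        t.join();
        try {
            w.add();
        } catch (IllegalStateException e) {
            System.out.println("add() failed fast: " + e.getMessage());
        }
    }
}
```

Without the closed flag and signalAll() in the finally block, the await() above is exactly the parked DataXceiver stack shown in the report.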
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564756#comment-14564756 ] Hudson commented on HDFS-8407: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2140 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2140/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)->GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ... 
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564909#comment-14564909 ] Hudson commented on HDFS-8443: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/201/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8471) Implement read block over HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-8471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HDFS-8471: Attachment: HDFS-8471.1.patch Add checksum support. Introduce a ReadBlockHandler. Add a testcase to test block not exists error. Implement read block over HTTP/2 Key: HDFS-8471 URL: https://issues.apache.org/jira/browse/HDFS-8471 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Reporter: Duo Zhang Assignee: Duo Zhang Attachments: HDFS-8471.1.patch, HDFS-8471.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8407) hdfsListDirectory must set errno to 0 on success
[ https://issues.apache.org/jira/browse/HDFS-8407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564956#comment-14564956 ] Hudson commented on HDFS-8407: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-8407. libhdfs hdfsListDirectory must set errno to 0 on success (Masatake Iwasaki via Colin P. McCabe) (cmccabe: rev d2d95bfe886a7fdf9d58fd5c47ec7c0158393afb) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_libhdfs_ops.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test_libhdfs_threaded.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/expect.h * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/hdfs.h * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfsListDirectory must set errno to 0 on success Key: HDFS-8407 URL: https://issues.apache.org/jira/browse/HDFS-8407 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Reporter: Juan Yu Assignee: Masatake Iwasaki Fix For: 2.8.0 Attachments: HDFS-8407.001.patch, HDFS-8407.002.patch, HDFS-8407.003.patch The documentation says it returns NULL on error, but it could also return NULL when the directory is empty. /** * hdfsListDirectory - Get list of files/directories for a given * directory-path. hdfsFreeFileInfo should be called to deallocate memory. * @param fs The configured filesystem handle. * @param path The path of the directory. * @param numEntries Set to the number of files/directories in path. * @return Returns a dynamically-allocated array of hdfsFileInfo * objects; NULL on error. */ {code} hdfsFileInfo *pathList = NULL; ... //Figure out the number of entries in that directory jPathListSize = (*env)-GetArrayLength(env, jPathList); if (jPathListSize == 0) { ret = 0; goto done; } ... 
if (ret) { hdfsFreeFileInfo(pathList, jPathListSize); errno = ret; return NULL; } *numEntries = jPathListSize; return pathList; {code} Either change the implementation to match the doc, or fix the doc to match the implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8498) Blocks can be committed with wrong size
Daryn Sharp created HDFS-8498: - Summary: Blocks can be committed with wrong size Key: HDFS-8498 URL: https://issues.apache.org/jira/browse/HDFS-8498 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical When an IBR for a UC block arrives, the NN updates the expected location's block and replica state _only_ if it's on an unexpected storage for an expected DN. If it's for an expected storage, only the genstamp is updated. When the block is committed and the expected locations are verified, only the genstamp is checked. The size is not checked, but it wasn't updated in the expected locations anyway. A faulty client may misreport the size when committing the block. The block is then effectively corrupted. If the NN issues replications, the received IBR is considered corrupt; the NN invalidates the block and immediately issues another replication. The NN eventually realizes all the original replicas are corrupt after full BRs are received from the original DNs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
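The missing check described above can be sketched as: record the length each DN reported for the under-construction block via IBRs, then refuse to commit when the client's claimed size disagrees with every replica report. All names here are hypothetical; the real logic lives in the NameNode's block management code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the size check HDFS-8498 argues for. Illustrative names only.
public class CommitSizeSketch {
    private final Map<String, Long> reportedLen = new HashMap<>(); // DN -> IBR length

    void onIncrementalBlockReport(String datanode, long length) {
        // Update the expected location's length, not just the genstamp.
        reportedLen.put(datanode, length);
    }

    boolean commit(long clientReportedSize) {
        // Every replica report must agree with the size the client commits.
        return reportedLen.values().stream().allMatch(l -> l == clientReportedSize);
    }

    public static void main(String[] args) {
        CommitSizeSketch b = new CommitSizeSketch();
        b.onIncrementalBlockReport("dn1", 1024);
        b.onIncrementalBlockReport("dn2", 1024);
        System.out.println(b.commit(512));   // false: faulty client misreported
        System.out.println(b.commit(1024));  // true: size matches all replicas
    }
}
```

Rejecting the commit at this point avoids the invalidate-and-replicate churn described in the report, since the mismatch is caught before the block is finalized.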
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564908#comment-14564908 ] Hudson commented on HDFS-7401: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #201 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/201/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8443) Document dfs.namenode.service.handler.count in hdfs-site.xml
[ https://issues.apache.org/jira/browse/HDFS-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564954#comment-14564954 ] Hudson commented on HDFS-8443: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-8443. Document dfs.namenode.service.handler.count in hdfs-site.xml. Contributed by J.Andreina. (aajisaka: rev d725dd8af682f0877cf523744d9801174b727f4e) * hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Document dfs.namenode.service.handler.count in hdfs-site.xml Key: HDFS-8443 URL: https://issues.apache.org/jira/browse/HDFS-8443 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Akira AJISAKA Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8443.1.patch, HDFS-8443.2.patch, HDFS-8443.3.patch When dfs.namenode.servicerpc-address is configured, NameNode launches an extra RPC server to handle requests from non-client nodes. dfs.namenode.service.handler.count specifies the number of threads for the server but the parameter is not documented anywhere. I found a mail for asking about the parameter. http://mail-archives.apache.org/mod_mbox/hadoop-user/201505.mbox/%3CE0D5A619-BDEA-44D2-81EB-C32B8464133D%40gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7401) Add block info to DFSInputStream' WARN message when it adds node to deadNodes
[ https://issues.apache.org/jira/browse/HDFS-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564953#comment-14564953 ] Hudson commented on HDFS-7401: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-7401. Add block info to DFSInputStream' WARN message when it adds node to deadNodes (Contributed by Arshad Mohammad) (vinayakumarb: rev b75df697e0f101f86788ad23a338ab3545b8d702) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Add block info to DFSInputStream' WARN message when it adds node to deadNodes - Key: HDFS-7401 URL: https://issues.apache.org/jira/browse/HDFS-7401 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Arshad Mohammad Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7401-2.patch, HDFS-7401.patch Block info is missing in the below message {noformat} 2014-11-14 03:59:00,386 WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /xx.xx.xx.xxx:50010 for block, add to deadNodes and continue. java.io.IOException: Got error for OP_READ_BLOCK {noformat} The code {noformat} DFSInputStream.java DFSClient.LOG.warn(Failed to connect to + targetAddr + for block + , add to deadNodes and continue. + ex, ex); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8429) Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread
[ https://issues.apache.org/jira/browse/HDFS-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564957#comment-14564957 ] Hudson commented on HDFS-8429: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #210 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/210/]) HDFS-8429. Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread. (zhouyingchao via cmccabe) (cmccabe: rev 246cefa089156a50bf086b8b1e4d4324d66dc58c) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/net/unix/DomainSocketWatcher.c Avoid stuck threads if there is an error in DomainSocketWatcher that stops the thread - Key: HDFS-8429 URL: https://issues.apache.org/jira/browse/HDFS-8429 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: zhouyingchao Assignee: zhouyingchao Fix For: 2.8.0 Attachments: HDFS-8429-001.patch, HDFS-8429-002.patch, HDFS-8429-003.patch In our cluster, an application is hung when doing a short circuit read of local hdfs block. By looking into the log, we found the DataNode's DomainSocketWatcher.watcherThread has exited with following log: {code} ERROR org.apache.hadoop.net.unix.DomainSocketWatcher: Thread[Thread-25,5,main] terminating on unexpected exception java.lang.NullPointerException at org.apache.hadoop.net.unix.DomainSocketWatcher$2.run(DomainSocketWatcher.java:463) at java.lang.Thread.run(Thread.java:662) {code} The line 463 is following code snippet: {code} try { for (int fd : fdSet.getAndClearReadableFds()) { sendCallbackAndRemove(getAndClearReadableFds, entries, fdSet, fd); } {code} getAndClearReadableFds is a native method which will malloc an int array. 
Since our memory is very tight, it looks like the malloc failed and a NULL pointer was returned. The bad thing is that other threads then blocked with stacks like this: {code}
DataXceiver for client unix:/home/work/app/hdfs/c3prc-micloud/datanode/dn_socket [Waiting for operation #1] daemon prio=10 tid=0x7f0c9c086d90 nid=0x8fc3 waiting on condition [0x7f09b9856000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for 0x0007b0174808 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:323)
at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.createNewMemorySegment(ShortCircuitRegistry.java:322)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.requestShortCircuitShm(DataXceiver.java:403)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opRequestShortCircuitShm(Receiver.java:214)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:95)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
at java.lang.Thread.run(Thread.java:662)
{code} IMO, we should exit the DN so that users can know that something went wrong and fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
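The failure mode above is easy to reproduce in miniature: once the watcher loop dies from an unexpected exception, every caller that hands it work parks forever. Below is a minimal, self-contained sketch of the direction the patch argues for — surfacing an unexpected watcher death through a fatal-error path instead of swallowing it. All names here (the class, the queue, the flag) are invented for illustration; this is not the actual DomainSocketWatcher code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class WatcherCrashSketch {
    /**
     * Simulates a watcher-style event loop that dies on an unexpected
     * exception (like the NPE from a failed native allocation).
     * Returns true if the fatal-error path ran.
     */
    public static boolean runWatcherWithPoisonTask() {
        BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
        AtomicBoolean fatalHandlerRan = new AtomicBoolean(false);
        Thread watcher = new Thread(() -> {
            try {
                while (true) {
                    tasks.take().run(); // may throw, like the failing native call
                }
            } catch (Throwable t) {
                // Without this branch the thread would die silently and
                // callers queuing work for it would park forever.
                fatalHandlerRan.set(true); // real code might terminate the DN here
            }
        });
        watcher.start();
        tasks.add(() -> { throw new NullPointerException("simulated malloc failure"); });
        try {
            watcher.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return fatalHandlerRan.get();
    }

    public static void main(String[] args) {
        System.out.println("fatal handler ran: " + runWatcherWithPoisonTask());
    }
}
```

In the real fix the fatal path would abort the DataNode (or fail the pending waiters) so operators see a crash instead of a silent hang.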
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14564971#comment-14564971 ] Kai Zheng commented on HDFS-8481: - Thanks Walter for the good ideas and Zhe for the update! 1. How about having {{RawDecoder decoder}} instead, in all places? Later we could easily change to another decoder. {code} + private final RSRawDecoder rsRawDecoder; {code} 2. I guess it will only run into the decode path here when data blocks are erased. 1) Should we count the MISSING chunks and avoid a too-many-blocks-erased exception? 2) Do we need the {{else}} block? 3) Note the minor code formatting issue. {code}
+ } else if (chunk.state == StripingChunk.MISSING){
+decodeInputs[i] = null;
+ } else {
+decodeInputs[i] = null;
{code} 3. Around or in {{decodeAndFillBuffer}}, is it doable to use the source buffers as input buffers and the destination buffers as output buffers directly, to avoid a data copy? 4. I agree with Walter's concern; we should try to reuse the related buffers and structures around and across the decode calls. For that, we might need to move them to and prepare them in the main class ({{DFSStripedInputStream}}) along with the decoder, or have a higher-level construct like {{StrippedDecoder}}. As it's non-trivial, I agree we can do this separately, but maybe in HDFS-7285? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
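On point 4 above, one way to make later buffer reuse easier — and to answer the related review question about Java coders preferring on-heap buffers while native coders prefer direct buffers — is to funnel all decode buffer allocation through a single helper, so the on-heap vs. direct choice lives in one place. This is a hedged sketch, not code from the patch; the class and method names are invented:

```java
import java.nio.ByteBuffer;

public class DecodeBufferAlloc {
    /**
     * Illustrative helper: centralize decode input/output buffer allocation.
     * Pure-Java coders tend to want on-heap buffers (cheap array access),
     * while native (JNI) coders want direct buffers (no copy across JNI).
     */
    public static ByteBuffer[] allocateBuffers(int count, int bufSize, boolean direct) {
        ByteBuffer[] buffers = new ByteBuffer[count];
        for (int i = 0; i < count; i++) {
            buffers[i] = direct ? ByteBuffer.allocateDirect(bufSize)
                                : ByteBuffer.allocate(bufSize);
        }
        return buffers;
    }

    public static void main(String[] args) {
        ByteBuffer[] heap = allocateBuffers(9, 64 * 1024, false);
        System.out.println(heap.length + " buffers, direct=" + heap[0].isDirect());
    }
}
```

Callers then never hardcode the allocation style, so switching a coder from on-heap to direct buffers becomes a one-argument change.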
[jira] [Commented] (HDFS-7609) startup used too much time to load edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565161#comment-14565161 ] Jing Zhao commented on HDFS-7609: - The 03 patch looks good to me. +1. I will commit it shortly. startup used too much time to load edits Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode; the loading speed was no different. I looked into the stack trace and judged that the slowness was caused by the retry cache, so I set dfs.namenode.enable.retrycache to false and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
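The reporter's recovery-time workaround amounts to one property in hdfs-site.xml. The property name comes straight from the report; note that this disables the NameNode retry cache entirely, removing its protection for retried non-idempotent RPCs during normal operation, so it is only defensible as a temporary measure while replaying a huge edit log:

```xml
<!-- hdfs-site.xml: temporary recovery-time measure from the report,
     not a recommended default -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
```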
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Summary: Avoid retry cache collision when Standby NameNode loading edits (was: startup used too much time to load edits) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8322) Display warning if hadoop fs -ls is showing the local filesystem
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Attachment: HDFS-8322.004.patch Thanks a lot, [~andrew.wang]. It is a great suggestion. I have modified the patch to address your comments. Display warning if hadoop fs -ls is showing the local filesystem Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Issue Type: Bug (was: Improvement) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Priority: Critical (was: Major) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks [~mingma] for the fix and [~CarreyZhan] for the report! And thanks to all for the discussion! Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8460) Erasure Coding: stateful read result doesn't match data occasionally because of flawed test
[ https://issues.apache.org/jira/browse/HDFS-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565129#comment-14565129 ] Jing Zhao commented on HDFS-8460: - We can use {{DataNodeTestUtil#setHeartbeatsDisabledForTests}} to disable the heartbeat. Other than this looks good to me. Erasure Coding: stateful read result doesn't match data occasionally because of flawed test --- Key: HDFS-8460 URL: https://issues.apache.org/jira/browse/HDFS-8460 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Yi Liu Assignee: Walter Su Attachments: HDFS-8460-HDFS-7285.001.patch I found this issue in TestDFSStripedInputStream, {{testStatefulRead}} failed occasionally shows that read result doesn't match data written. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565137#comment-14565137 ] Jesse Yates commented on HDFS-6440: --- Failed tests pass locally. Missed a whitespace in TestPipelinesFailover :( Could fix on commit, unless there are other comments on the latest version, in which case I'll wrap that into a new revision. Otherwise, I'd say this is good to go, [~atm]? Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8499) Merge BlockInfoUnderConstruction into trunk
Zhe Zhang created HDFS-8499: --- Summary: Merge BlockInfoUnderConstruction into trunk Key: HDFS-8499 URL: https://issues.apache.org/jira/browse/HDFS-8499 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang In HDFS-7285 branch, the {{BlockInfoUnderConstruction}} interface provides a common abstraction for striped and contiguous UC blocks. This JIRA aims to merge it to trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565684#comment-14565684 ] Lei (Eddy) Xu commented on HDFS-8322: - [~andrew.wang] I think {{TestWebDelegationToken}} is not relevant; I ran this test locally and it succeeded. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch, HDFS-8322.006.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565716#comment-14565716 ] Zhe Zhang commented on HDFS-8420: - Seems the change is already included in HDFS-8408 Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir -- Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8320-HDFS-7285-00.patch, HDFS-8320-HDFS-7285-01.patch Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is snapshottable dir. It will return the path including the snapshot name like, {{/zone/.snapshot/snap1}}. Instead could improve this by returning only path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8489: Attachment: HDFS-8489.03.patch Thanks Jing for the comment! Yes the {{replaceBlock}} logic is also different with striping. Uploading new patch with the {{replaceBlock}} change. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565691#comment-14565691 ] Kai Zheng commented on HDFS-8481: - Good discussion here, thanks! bq. In that case we cannot reuse the source buffers I guess? Then do we need to expose this information in the decoder? Good catch Jing! Yes in this case we can't reuse the source buffers here as they need to be passed to caller/applications without being changed. I'm planning to re-implement the Java coders in HADOOP-12041 and related, when done it's possible to ensure the input buffers not to be affected. Benefits of doing this in coder layer: 1) a more clear contract between coder and caller in more general sense for the inputs; 2) concrete coder may have specific tweak to optimize in the aspect, ideally no input data copying at all, worst, make the copy, but all transparent to callers; 3) allow new coders (LRC, HH) to be layered on other primitive coders (RS, XOR) more easily. So for now let's forget the source buffers reusing here and we can do it in future, but do it for output buffers now if easy? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
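Until the re-implemented coders guarantee untouched inputs (the HADOOP-12041 plan above), the caller-visible source buffers can be protected with a defensive copy before decoding. An illustrative sketch — the class and method names are invented, and the real code works on striped-read buffers rather than plain byte arrays:

```java
public class DecodeInputCopy {
    /**
     * Copy the non-null source buffers so a decoder that scribbles on its
     * inputs cannot corrupt data still owed to the caller. Null slots
     * (erased/missing blocks) are preserved as null.
     */
    public static byte[][] copyForDecode(byte[][] sources) {
        byte[][] copy = new byte[sources.length][];
        for (int i = 0; i < sources.length; i++) {
            if (sources[i] != null) {
                copy[i] = sources[i].clone();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        byte[][] src = { {1, 2, 3}, null, {4, 5} };
        byte[][] in = copyForDecode(src);
        in[0][0] = 99;                 // decoder scribbles on its input copy
        System.out.println(src[0][0]); // caller's original buffer is unchanged
    }
}
```

The cost is one extra copy per decode call, which is exactly the trade-off the comment defers to the coder layer to eliminate later.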
[jira] [Updated] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8489: Attachment: HDFS-8489.02.patch Updating the patch to remove redundant logic between {{BlockInfo}} and {{BlockInfoContiguous}}. The main difference between {{BlockInfoStriped}} and {{BlockInfoContiguous}} is that in {{BIStriped#triplets}}, the first {{dataBlockNum}} slots are ordered based on internal block indices. Therefore the first {{dataBlockNum}} slots could have null, and we need an indices array to interpret the slots after {{dataBlockNum}}. So only {{addStorage}}, {{removeStorage}}, and {{numNodes}} should stay abstract in {{BlockInfo}} and be separately implemented. [~jingzhao] We discussed similar ideas under HDFS-7285 JIRAs. Let me know if the above makes sense to you. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
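The division of labor discussed above — shared logic in an abstract {{BlockInfo}}, with only the storage bookkeeping ({{addStorage}}, {{removeStorage}}, {{numNodes}}) left to subclasses — looks schematically like the following. This is a deliberately simplified sketch: the real classes track {{DatanodeStorageInfo}} objects in a triplets array, which is replaced by a plain list here.

```java
import java.util.ArrayList;
import java.util.List;

// Schematic only: real HDFS code uses a triplets array and DatanodeStorageInfo.
abstract class BlockInfo {
    // Shared state and logic (block id, BlockCollection backref, ...) live here.
    abstract boolean addStorage(String storage);
    abstract boolean removeStorage(String storage);
    abstract int numNodes();
}

class BlockInfoContiguous extends BlockInfo {
    private final List<String> storages = new ArrayList<>();
    @Override boolean addStorage(String s) { return storages.add(s); }
    @Override boolean removeStorage(String s) { return storages.remove(s); }
    @Override int numNodes() { return storages.size(); }
}

public class BlockInfoSketch {
    public static void main(String[] args) {
        BlockInfo b = new BlockInfoContiguous();
        b.addStorage("dn1-disk0");
        b.addStorage("dn2-disk1");
        System.out.println(b.numNodes());
    }
}
```

A striped subclass would implement the same three methods over an index-ordered layout (possibly with null data slots), which is exactly why they must stay abstract rather than shared.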
[jira] [Updated] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Attachment: HDFS-8322.006.patch Good findings, [~andrew.wang]. Updated accordingly. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch, HDFS-8322.006.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565561#comment-14565561 ] Colin Patrick McCabe commented on HDFS-7923: bq. Missing config key documentation in hdfs-defaults.xml added bq. requestBlockReportLeaseId: empty catch for unregistered node, we could add some more informative logging rather than relying on the warn below added bq. I discussed the NodeData structure with Colin offline, wondering why we didn't use a standard Collection. Colin brought up the reason of reducing garbage, which seems valid. I think we should consider implementing IntrusiveCollection though rather than writing another. yes, there will be quite a few of these requests coming in at any given point. IntrusiveCollection is an interface rather than an implementation, so I don't think it would help here (it's most useful when an element needs to be in multiple lists at once, and when you need fancy operations like finding the list from the element) bq. I also asked about putting NodeData into DatanodeDescriptor. Not sure what the conclusion was on this, it might reduce garbage since we don't need a separate NodeData object. The locking is easier to understand if all the lease data is inside {{BlockReportLeaseManager}}. bq. I prefer Precondition checks for invalid configuration values at startup, so there aren't any surprises for the user. Not everyone reads the messages on startup. ok bq. requestLease has a check for isTraceEnabled, then logs at debug level fixed bq. In offerService, we ignore the new leaseID if we already have one. On the NN though, a new request wipes out the old leaseID, and processReport checks based on leaseID rather than node. This kind of bug makes me wonder why we really need the leaseID at all, why not just attach a boolean to the node? Or if it's in the deferred vs. pending list? It's safer for the NameNode to wipe the old lease ID every time there is a new request. 
It avoids problems where the DN went down while holding a lease, and then came back up. We could potentially also avoid those problems by being very careful with node (un)registration, but why make things more complicated than they need to be? I do think that the DN should overwrite its old lease ID if the NN gives it a new one, for the same reason. Let me change it to do that... Of course this code path should never happen since the NN should never give a new lease ID when none was requested. So calling this a bug seems like a bit of a stretch. I prefer IDs to simply checking against the datanode UUID, because lease IDs allow us to match up the NN granting a lease with the DN accepting and using it, which is very useful for debugging or understanding what is happening in production. It also makes it very obvious whether a DN is cheating by sending a block report with leaseID = 0 to disable rate-limiting. This is a use-case we want to support but we also want to know when it is going on. bq. Can we fix the javadoc for scheduleBlockReport to mention randomness, and not send...at the next heartbeat? Incorrect right now. I looked pretty far back into the history of this code. It seems to go back to at least 2009. The underlying ideas seem to be: 1. the first full block report can have a configurable delay in seconds expressed by {{dfs.blockreport.initialDelay}} 2. the second full block report gets a random delay between 0 and {{dfs.blockreport.intervalMsec}} 3. all other block reports get an interval of {{dfs.blockreport.intervalMsec}} *unless* the previous block report had a longer interval than expected... if the previous one had a longer interval than expected, the next one gets a shorter interval. We can keep behavior #1... it's simple to implement and may be useful for testing (although I think this patch makes it no longer necessary). Behavior #2 seems like a workaround for the lack of congestion control in the past. 
In a world where the NN rate-limits full block reports, we don't need this behavior to prevent FBRs from clumping. They will just naturally not overly clump because we are rate-limiting them. Behavior #3 just seems incorrect, even without this patch. By definition, a full block report contains all the information the NN needs to understand the DN state. Just because block report interval N was longer than expected, seems no reason to shorten block report interval N+1. In fact, this behavior seems like it could lead to congestion collapse... if the NN gets overloaded and can't handle block reports for some time, a bunch of DNs will shorten the time in between the current block report and the next one, further increasing total NN load. Not good. Not good at all. I replaced this with a simple randomize first block report time within 0 and {{dfs.blockreport.initialDelay}}, then try to do all other
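The replacement schedule described above — one uniformly random offset within {{dfs.blockreport.initialDelay}} for the first full block report, then a plain fixed {{dfs.blockreport.intervalMsec}} with no catch-up shortening — can be sketched as follows. This is illustrative, not the actual patch; the method is a hypothetical pure function over the two configured values.

```java
import java.util.Random;

public class FbrScheduleSketch {
    /**
     * Next full-block-report time: randomized once at startup to spread DNs
     * out, then a plain fixed interval. Deliberately no interval-shortening
     * after a late report, since that positive feedback can push an already
     * overloaded NN toward congestion collapse.
     */
    public static long nextFbrTimeMs(long nowMs, boolean firstReport,
                                     long initialDelayMs, long intervalMs,
                                     Random rnd) {
        if (firstReport) {
            long jitter = initialDelayMs > 0
                ? (long) (rnd.nextDouble() * initialDelayMs) : 0;
            return nowMs + jitter;
        }
        return nowMs + intervalMs;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        long first = nextFbrTimeMs(0, true, 10_000, 21_600_000, rnd);
        long later = nextFbrTimeMs(first, false, 10_000, 21_600_000, rnd);
        System.out.println("first=" + first + " later=" + later);
    }
}
```

With the NN-side lease rate-limiting in place, this simple schedule is enough: clumped requests are smoothed by the lease manager rather than by per-DN interval tweaking.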
[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7923: --- Attachment: HDFS-7923.004.patch The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565556#comment-14565556 ] Jing Zhao commented on HDFS-8489: - Thanks for working on this, Zhe. Yes, addStorage, removeStorage, and numNodes should be abstract in BlockInfo. Besides, the block replacement logic can also be separated from {{BlocksMap#replaceBlock}} and becomes an abstract function in BlockInfo, as is done in the current EC feature branch. But this is optional. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8487) Merge BlockInfo-related code changes from HDFS-7285 into trunk
[ https://issues.apache.org/jira/browse/HDFS-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8487: Description: Per offline discussion with [~andrew.wang], for easier and cleaner reviewing, we should probably shrink the size of the consolidated HDFS-7285 patch by merging some mechanical changes that are unrelated to EC-specific logic to trunk first. Those include renaming, subclassing, interfaces, and so forth. This umbrella JIRA specifically aims to merge code changes around {{BlockInfo}} and {{BlockInfoContiguous}} back into trunk. The structure of the {{BlockInfo}}-related classes is shown below: {code}
             BlockInfo (abstract)
              /             \
   BlockInfoStriped     BlockInfoContiguous
      |       \            /       |
      |       BlockInfoUC          |
      |       (interface)          |
      |        /        \          |
   BlockInfoStripedUC    BlockInfoContiguousUC
{code} was:Per offline discussion with [~andrew.wang], for easier and cleaner reviewing, we should probably shrink the size of the consolidated HDFS-7285 patch by merging some mechanical changes that are unrelated to EC-specific logic to trunk first. Those include renaming, subclassing, interfaces, and so forth. This umbrella JIRA specifically aims to merge code changes around {{BlockInfo}} and {{BlockInfoContiguous}} back into trunk. Merge BlockInfo-related code changes from HDFS-7285 into trunk -- Key: HDFS-8487 URL: https://issues.apache.org/jira/browse/HDFS-8487 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Per offline discussion with [~andrew.wang], for easier and cleaner reviewing, we should probably shrink the size of the consolidated HDFS-7285 patch by merging some mechanical changes that are unrelated to EC-specific logic to trunk first. Those include renaming, subclassing, interfaces, and so forth. This umbrella JIRA specifically aims to merge code changes around {{BlockInfo}} and {{BlockInfoContiguous}} back into trunk. 
The structure of the {{BlockInfo}}-related classes is shown below:
{code}
           BlockInfo (abstract)
            /             \
   BlockInfoStriped    BlockInfoContiguous
        |    |              |
        |  BlockInfoUC      |
        |  (interface)      |
        |   /         \     |
BlockInfoStripedUC    BlockInfoContiguousUC
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
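The hierarchy in the diagram can be sketched as a Java skeleton (placeholder bodies only; the real NameNode classes carry block metadata and many methods that are not reproduced here, and {{isStriped()}} is used purely as an illustrative member):

```java
// Skeleton of the BlockInfo class hierarchy from the diagram above.
// Bodies are placeholders, not the actual HDFS fields/methods.
abstract class BlockInfo {
    // Illustrative member; the real class exposes block metadata instead.
    abstract boolean isStriped();
}

// Marker for blocks still under construction (UC).
interface BlockInfoUC {
}

class BlockInfoContiguous extends BlockInfo {
    @Override boolean isStriped() { return false; }
}

class BlockInfoStriped extends BlockInfo {
    @Override boolean isStriped() { return true; }
}

// The UC variants subclass their replication-scheme parent and
// additionally implement the shared UC interface.
class BlockInfoContiguousUC extends BlockInfoContiguous implements BlockInfoUC {
}

class BlockInfoStripedUC extends BlockInfoStriped implements BlockInfoUC {
}
```

The point of the shape is that code handling under-construction state can accept a {{BlockInfoUC}} regardless of whether the block is striped or contiguous, while code handling layout dispatches on the abstract {{BlockInfo}}.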
[jira] [Commented] (HDFS-8409) HDFS client RPC call throws java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/HDFS-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565649#comment-14565649 ] Juan Yu commented on HDFS-8409: --- It happens on retry, not on the initial call. For example, the initial call gets an exception after the Call object is created and sent, so it needs to retry; but during the retry it somehow gets an exception again, this time even before the Call object (which should have the same callId as the initial call) is created. In my patch, I added a test to simulate this. Does it make sense? HDFS client RPC call throws java.lang.IllegalStateException - Key: HDFS-8409 URL: https://issues.apache.org/jira/browse/HDFS-8409 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Juan Yu Assignee: Juan Yu Attachments: HDFS-8409.001.patch, HDFS-8409.002.patch, HDFS-8409.003.patch When HDFS client RPC calls need to retry, the client sometimes throws java.lang.IllegalStateException; the retry is aborted and the client call fails. {code} Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:116) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:99) at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1912) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) {code} Here is the check that throws the exception: {code} public static void setCallIdAndRetryCount(int cid, int rc) { ... 
Preconditions.checkState(callId.get() == null); } {code} The RetryInvocationHandler calls it with a non-null callId, which causes the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
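The failure mode Juan describes can be reproduced in miniature with a plain-Java stand-in for Guava's {{Preconditions.checkState}} (a hypothetical sketch, not the actual {{org.apache.hadoop.ipc.Client}} code): the thread-local callId is set before each attempt, and the Call constructor is what normally consumes it, so a retry that sets the ID again before any Call was created trips the precondition.

```java
// Minimal sketch (plain Java, no Guava) of the callId precondition.
class CallIdSketch {
    private static final ThreadLocal<Integer> callId = new ThreadLocal<>();

    // Stand-in for com.google.common.base.Preconditions.checkState.
    static void checkState(boolean expression) {
        if (!expression) throw new IllegalStateException();
    }

    // Stand-in for Client.setCallIdAndRetryCount: refuses to overwrite an
    // ID that was never consumed by a Call constructor.
    static void setCallIdAndRetryCount(int cid, int rc) {
        checkState(callId.get() == null); // throws if a stale ID remains
        callId.set(cid);
    }

    // Stand-in for the Call constructor consuming (and clearing) the ID.
    static void consumeCallId() {
        callId.remove();
    }

    public static void main(String[] args) {
        setCallIdAndRetryCount(1, 0); // initial attempt: ID slot is empty, OK
        consumeCallId();              // Call object created, ID consumed
        setCallIdAndRetryCount(1, 1); // retry after Call creation: OK
        // But if an attempt fails BEFORE a Call consumes the ID, the next
        // retry hits checkState with a non-null callId:
        boolean threw = false;
        try {
            setCallIdAndRetryCount(1, 2); // stale ID still set -> throws
        } catch (IllegalStateException e) {
            threw = true;
        }
        System.out.println(threw); // prints "true"
    }
}
```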
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565664#comment-14565664 ] Hadoop QA commented on HDFS-8322: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 9m 29s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 7s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 26s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 2m 7s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 0s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 24m 41s | Tests failed in hadoop-common. 
| | | | 70m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.security.token.delegation.web.TestWebDelegationToken | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736262/HDFS-8322.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ae2a62 | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11167/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11167/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11167/console | This message was automatically generated. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch, HDFS-8322.006.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565708#comment-14565708 ] Kai Zheng commented on HDFS-8481: - bq. it is beneficial to accumulate multiple of them before sending to decode. Kai Zheng Could probably suggest a threshold size. From a pure coder's point of view, yes, it's good to have a larger cell size. It's not clear yet in this case, because the bottleneck might not be in the computation but rather in network traffic and data copying. My suggestion would be: if the accumulation is already available, then we could have a default threshold value like 4MB while allowing it to be configurable in future; otherwise, leave the accumulation optimization for future consideration. I would prefer not to do the accumulation in the coder caller layer because it's hard. If it's good to have, then we may do it in the coder layer in one place, like having a {{BufferedRawErasureCoder}} layered on existing raw coders, transparent to callers. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565546#comment-14565546 ] Andrew Wang commented on HDFS-8322: --- I noticed you changed the name of the config parameter, and it's different in the code vs. core-default.xml. +1 pending fixing that though. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Attachment: HDFS-8322.005.patch Address checkstyle warning. The test failure is not relevant. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch, HDFS-8322.005.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8409) HDFS client RPC call throws java.lang.IllegalStateException
[ https://issues.apache.org/jira/browse/HDFS-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565605#comment-14565605 ] Andrew Wang commented on HDFS-8409: --- Hey Juan, when would an exception before creation of a Call object not be a fatal error? HDFS client RPC call throws java.lang.IllegalStateException - Key: HDFS-8409 URL: https://issues.apache.org/jira/browse/HDFS-8409 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Juan Yu Assignee: Juan Yu Attachments: HDFS-8409.001.patch, HDFS-8409.002.patch, HDFS-8409.003.patch When HDFS client RPC calls need to retry, the client sometimes throws java.lang.IllegalStateException; the retry is aborted and the client call fails. {code} Caused by: java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.ipc.Client.setCallIdAndRetryCount(Client.java:116) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:99) at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1912) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400) {code} Here is the check that throws the exception: {code} public static void setCallIdAndRetryCount(int cid, int rc) { ... Preconditions.checkState(callId.get() == null); } {code} The RetryInvocationHandler calls it with a non-null callId, which causes the exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7923: --- Target Version/s: 2.8.0 Affects Version/s: 2.8.0 Status: Patch Available (was: In Progress) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
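The NN-side grant/deny decision described above could be sketched as a simple lease table (hypothetical names such as {{FbrRateLimiter}} and {{requestLease}}; the real patch wires this through protobuf optional fields on the heartbeat and a BlockReportLeaseManager):

```java
import java.util.HashSet;
import java.util.Set;

// NN-side sketch: grant at most maxConcurrent full-block-report "leases"
// at a time; a DN whose heartbeat request is denied simply retries on a
// later heartbeat.
class FbrRateLimiter {
    private final int maxConcurrent;
    private final Set<String> leases = new HashSet<>();

    FbrRateLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Called when a heartbeat arrives with the request-FBR bit set.
    // Returns true if the DN may send its FBR now.
    synchronized boolean requestLease(String datanodeId) {
        if (leases.contains(datanodeId)) {
            return true; // already holds a lease; re-grant idempotently
        }
        if (leases.size() >= maxConcurrent) {
            return false; // too many FBRs in flight; ask again later
        }
        leases.add(datanodeId);
        return true;
    }

    // Called once the FBR from this DN has been fully processed.
    synchronized void releaseLease(String datanodeId) {
        leases.remove(datanodeId);
    }
}
```

Because both the request and the grant are optional fields, old DNs and NNs that never set them keep the previous unthrottled behavior, which is what makes the change wire-compatible.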
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565672#comment-14565672 ] Andrew Wang commented on HDFS-7923: --- Nits: * Should the checkLease logs be done to the blockLog? We log the startup error log there in processReport * Update javadoc in BlockReportContext with what leaseID is for. * Add something to the log message about overwriting the old leaseID in offerService. Agree that this shouldn't really trigger, but good defensive coding practice :) * DatanodeManager, there's still a register/unregister in registerDatanode I think we could skip. This is the node restart case where it's registered previously. * BRLManager requestLease, we auto-register the node on requestLease. This shouldn't happen since DNs need to register before doing anything else. We can keep this here * Still need documentation of new config keys in hdfs-default.xml Block report scheduling: * We removed TestBPSAScheduler#testScheduleBlockReportImmediate, should this swap over to testing forceFullBlockReport? * Extra import in TestBPSAScheduler and BPSA * I'm worried about convoy effects if we don't stick to the stride system of the old code. I think of the old code as follows: # Choose a random time within the initialDelay interval to jitter # Attempt to block report at that same time every hour. This keeps the BRs from all the DNs spread out, even if the NN gets temporarily backed up. Once the NN catches up and flushes its backlog of FBRs, future BRs will still be nicely spread out. My understanding of your new scheme is that after a DN successfully BRs, it'll BR again an hour afterwards. So, if all the BRs piled up and then are processed in quick succession, all the DNs will BR at about the same time next hour. Since we want to spread the BRs out across the hour, this is not good. Other ideas are to round up to the next stride. Or, wait an interval plus a random delay. 
We might consider some congestion control too, where the DNs back off linearly or exponentially. All these schemes delay the FBRs, but maybe we trust IBRs enough now. If you want to pursue this logic change more, let's split it out into a follow-on JIRA. The rest LGTM, +1 pending above comments. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
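The stride-based scheduling Andrew describes (pick a random offset once at startup, then always report at that offset plus a multiple of the interval, rounding up to the next stride boundary after a backlog) can be sketched as follows. The class and method names are illustrative, not the actual BPServiceActor code:

```java
// Sketch of stride-based block report scheduling: each DN picks a random
// offset once, then always reports at offset + k * interval. Even if the
// NN gets backed up and processes a pile of FBRs at once, each DN's next
// report snaps back to its own stride, so reports stay spread out.
class BlockReportScheduler {
    private final long intervalMs; // e.g. one hour
    private final long offsetMs;   // random jitter chosen once at startup

    BlockReportScheduler(long intervalMs, long offsetMs) {
        this.intervalMs = intervalMs;
        this.offsetMs = offsetMs;
    }

    // Round up to the next stride boundary after "now", rather than
    // scheduling at now + interval (which would let delayed DNs convoy).
    long nextReportTime(long nowMs) {
        long k = Math.floorDiv(nowMs - offsetMs, intervalMs) + 1;
        return offsetMs + k * intervalMs;
    }
}
```

With a one-hour interval and an offset of 1 second, a DN that finishes a delayed report mid-interval still schedules its next report at its own boundary instead of drifting toward every other DN's schedule.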
[jira] [Updated] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8481: Attachment: HDFS-8481-HDFS-7285.04.patch Thanks Kai for verifying this. I'm attaching the 04 patch to address the minor issues above. To address the GC issue we should also avoid filling 0 bytes. Maybe the codec can support a special flag to mark an input slot as all-zero? I'm currently working on reusing the input/output buffers. It turns out to be tricky because 1) we need to change all byte arrays to {{ByteBuffer}} and 2) we need a better abstraction to divide the rounds of {{decode()}} aligned at cell boundaries. Perhaps something like a {{StripedDecoder}}. If the 04 patch looks OK for removing the decoding workaround, how about we commit it first while we work on the various tasks discussed above to reuse all input and output buffers? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8254) In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer
[ https://issues.apache.org/jira/browse/HDFS-8254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565722#comment-14565722 ] Zhe Zhang commented on HDFS-8254: - bq. I think it won't be an issue. Cause MultipleBlockingQueue.poll(..) has synchronized(queues) Yes good point. I'm OK with leaving {{locateFollowingBlock}} as-is in this JIRA but we can think about moving it to the coordinator for cleaner flow. In StripedDataStreamer, it is hard to tolerate datanode failure in the leading streamer --- Key: HDFS-8254 URL: https://issues.apache.org/jira/browse/HDFS-8254 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h8254_20150526.patch, h8254_20150526b.patch StripedDataStreamer javadoc is shown below. {code} * The StripedDataStreamer class is used by {@link DFSStripedOutputStream}. * There are two kinds of StripedDataStreamer, leading streamer and ordinary * stream. Leading streamer requests a block group from NameNode, unwraps * it to located blocks and transfers each located block to its corresponding * ordinary streamer via a blocking queue. {code} Leading streamer is the streamer with index 0. When the datanode of the leading streamer fails, the other steamers cannot continue since no one will request a block group from NameNode anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565721#comment-14565721 ] Kai Zheng commented on HDFS-8481: - bq. To address the GC issue we should also avoid filling 0 bytes. Maybe the codec can support a special flag to mark an input slot as all-zero? Good idea! It's easy to add such flag in {{ECChunk}} and we can use the following version API: {code} public void decode(ECChunk[] inputs, int[] erasedIndexes, ECChunk[] outputs); {code} bq. how about we commit it first while we work on the various tasks discussed above to reuse all input and output buffers? I'm OK with this approach. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
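The all-zero-slot flag suggested above could look roughly like this (a hypothetical {{ZeroAwareChunk}}, not the actual {{ECChunk}} API): the chunk carries a marker instead of a materialized zero-filled buffer, so callers avoid allocating and filling bytes just to represent zero-length striped blocks.

```java
import java.nio.ByteBuffer;

// Hypothetical chunk wrapper with an all-zero marker, sketching Kai's
// suggestion. A decoder that sees isAllZero() can skip the slot's data
// entirely instead of reading a buffer that was filled with zeros.
class ZeroAwareChunk {
    private final ByteBuffer buffer; // null when allZero is set
    private final boolean allZero;

    static ZeroAwareChunk of(ByteBuffer buffer) {
        return new ZeroAwareChunk(buffer, false);
    }

    // Represents a cell whose content is all zeros (e.g. a striped block
    // of length 0) without materializing the bytes.
    static ZeroAwareChunk allZero() {
        return new ZeroAwareChunk(null, true);
    }

    private ZeroAwareChunk(ByteBuffer buffer, boolean allZero) {
        this.buffer = buffer;
        this.allZero = allZero;
    }

    boolean isAllZero() {
        return allZero;
    }

    // A consumer consults the flag before touching the buffer.
    byte byteAt(int i) {
        return allZero ? 0 : buffer.get(i);
    }
}
```

Beyond saving the allocation, the flag also opens the door to a coder-level fast path, since XOR/Reed-Solomon contributions from an all-zero input can be skipped outright.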
[jira] [Updated] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8489: Attachment: HDFS-8489.04.patch Both {{TestFileTruncate}} and {{TestAppendSnapshotTruncate}} pass locally. Uploading new patch with 2 changes to address check style issues: # Remove 2 unused imports from {{BlocksMap}} # Add a period to a Javadoc in {{BlockInfo}} Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch, HDFS-8489.04.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565794#comment-14565794 ] Hadoop QA commented on HDFS-8489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 37s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 12s | The applied patch generated 5 new checkstyle issues (total was 692, now 692). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 17s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 163m 2s | Tests failed in hadoop-hdfs. 
| | | | 209m 10s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736270/HDFS-8489.03.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6aec13c | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11168/console | This message was automatically generated. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8489) Subclass BlockInfo to represent contiguous blocks
[ https://issues.apache.org/jira/browse/HDFS-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565770#comment-14565770 ] Hadoop QA commented on HDFS-8489: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 43s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 10 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 2 new checkstyle issues (total was 687, now 684). | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 32s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 14s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 162m 39s | Tests failed in hadoop-hdfs. 
| | | | 208m 42s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736254/HDFS-8489.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7673d4f | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11166/console | This message was automatically generated. Subclass BlockInfo to represent contiguous blocks - Key: HDFS-8489 URL: https://issues.apache.org/jira/browse/HDFS-8489 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8489.00.patch, HDFS-8489.01.patch, HDFS-8489.02.patch, HDFS-8489.03.patch As second step of the cleanup, we should make {{BlockInfo}} an abstract class and merge the subclass {{BlockInfoContiguous}} from HDFS-7285 into trunk. The patch should clearly separate where to use the abstract class versus the subclass. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565806#comment-14565806 ] Rakesh R commented on HDFS-8420: bq. Seems the change is already included in HDFS-8408 Thanks [~zhz] for your time and taking a look at this issue. Yes, I agree with you. Also, thank you [~vinayrpet] for incorporating this case in HDFS-8408 Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir -- Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8320-HDFS-7285-00.patch, HDFS-8320-HDFS-7285-01.patch Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is snapshottable dir. It will return the path including the snapshot name like, {{/zone/.snapshot/snap1}}. Instead could improve this by returning only path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8420) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir
[ https://issues.apache.org/jira/browse/HDFS-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8420: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Erasure Coding: ECZoneManager#getECZoneInfo is not resolving the path properly if zone dir itself is the snapshottable dir -- Key: HDFS-8420 URL: https://issues.apache.org/jira/browse/HDFS-8420 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8320-HDFS-7285-00.patch, HDFS-8320-HDFS-7285-01.patch Presently the resultant zone dir will come with {{.snapshot}} only when the zone dir itself is snapshottable dir. It will return the path including the snapshot name like, {{/zone/.snapshot/snap1}}. Instead could improve this by returning only path {{/zone}}. Thanks [~vinayrpet] for the helpful [discussion|https://issues.apache.org/jira/browse/HDFS-8266?focusedCommentId=14543821page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14543821] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7923) The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages
[ https://issues.apache.org/jira/browse/HDFS-7923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565784#comment-14565784 ] Hadoop QA commented on HDFS-7923: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 18m 13s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 15 new or modified test files. | | {color:green}+1{color} | javac | 9m 5s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 11m 19s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 25s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 36s | The applied patch generated 25 new checkstyle issues (total was 1365, now 1380). | | {color:red}-1{color} | whitespace | 0m 9s | The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 53s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 40s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 43s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 35s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 100m 55s | Tests failed in hadoop-hdfs. 
| | | | 152m 39s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.TestNamenodeCapacityReport | | | hadoop.hdfs.TestSetrepDecreasing | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736276/HDFS-7923.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 6aec13c | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11169/console | This message was automatically generated. The DataNodes should rate-limit their full block reports by asking the NN on heartbeat messages --- Key: HDFS-7923 URL: https://issues.apache.org/jira/browse/HDFS-7923 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.8.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-7923.000.patch, HDFS-7923.001.patch, HDFS-7923.002.patch, HDFS-7923.003.patch, HDFS-7923.004.patch The DataNodes should rate-limit their full block reports. They can do this by first sending a heartbeat message to the NN with an optional boolean set which requests permission to send a full block report. If the NN responds with another optional boolean set, the DN will send an FBR... if not, it will wait until later. 
This can be done compatibly with optional fields. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
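The heartbeat-based rate limiting described in HDFS-7923 above can be sketched as a small admission counter on the NN side. This is an illustrative model only, not the actual patch: the class name {{FbrRateLimiter}}, the method names, and the idea of a fixed concurrent-report limit are all hypothetical stand-ins for whatever the real implementation does.

```java
// Hedged sketch of heartbeat-driven full-block-report (FBR) admission on the
// NameNode side. A DataNode sets an optional flag in its heartbeat to ask for
// permission; the NN grants it only while few FBRs are outstanding.
// All names here are hypothetical, not the HDFS-7923 patch itself.
import java.util.concurrent.atomic.AtomicInteger;

public class FbrRateLimiter {
    private final int maxConcurrentReports;
    private final AtomicInteger outstanding = new AtomicInteger();

    public FbrRateLimiter(int maxConcurrentReports) {
        this.maxConcurrentReports = maxConcurrentReports;
    }

    /** NN side: called when a DN heartbeat sets the optional "request FBR" flag. */
    public boolean requestFullBlockReport() {
        while (true) {
            int cur = outstanding.get();
            if (cur >= maxConcurrentReports) {
                return false; // DN waits and asks again on a later heartbeat
            }
            if (outstanding.compareAndSet(cur, cur + 1)) {
                return true;  // DN may send its full block report now
            }
        }
    }

    /** NN side: called after the DN's full block report has been processed. */
    public void completeFullBlockReport() {
        outstanding.decrementAndGet();
    }
}
```

Because both the request flag and the grant flag are optional protobuf fields, old DNs that never set the flag and old NNs that never answer it keep working unchanged, which is what "done compatibly with optional fields" refers to.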
[jira] [Commented] (HDFS-8450) Erasure Coding: Consolidate erasure coding zone related implementation into a single class
[ https://issues.apache.org/jira/browse/HDFS-8450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565813#comment-14565813 ] Rakesh R commented on HDFS-8450: [~drankye] I hope I've addressed your comments. Could you please review the patch again when you get a chance? Thanks! Erasure Coding: Consolidate erasure coding zone related implementation into a single class -- Key: HDFS-8450 URL: https://issues.apache.org/jira/browse/HDFS-8450 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8450-HDFS-7285-00.patch, HDFS-8450-HDFS-7285-01.patch, HDFS-8450-HDFS-7285-02.patch, HDFS-8450-HDFS-7285-03.patch The idea is to follow the same pattern suggested by HDFS-7416. It would be good to consolidate all the erasure coding zone related implementation of {{FSNamesystem}}. Here, we propose an {{FSDirErasureCodingZoneOp}} class with functions to perform the related erasure coding zone operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565822#comment-14565822 ] Hadoop QA commented on HDFS-8481: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 44s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 55s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 38s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 38s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 28s | The patch appears to introduce 1 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 20s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 172m 36s | Tests failed in hadoop-hdfs. 
| | | | 215m 59s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.blockmanagement.TestBlockInfo | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736292/HDFS-8481-HDFS-7285.04.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 1299357 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11170/console | This message was automatically generated. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch, HDFS-8481-HDFS-7285.04.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565224#comment-14565224 ] Jing Zhao commented on HDFS-8481: - Thanks for working on this, Zhe! I agree that we should reuse the source buffers if possible. One question for [~drankye] is, in the javadoc of decoder, it is mentioned that some decoder may change the content of the input. In that case we cannot reuse the source buffers I guess? Then do we need to expose this information in the decoder? Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
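The concern raised above — a decoder that may modify its input buffers cannot safely be handed buffers the caller wants to reuse — has a straightforward defensive answer: copy the inputs first. The sketch below assumes a hypothetical boolean ({{decoderMutatesInput}}) standing in for whatever mechanism the decoder would use to expose that information; {{DecodeInputs}} and {{prepare}} are made-up names, not Hadoop API.

```java
// Hedged sketch: hand the decoder copies when it may mutate its inputs,
// otherwise reuse the source buffers directly. The boolean flag is a
// hypothetical stand-in for decoder metadata that HDFS-8481 discusses exposing.
import java.nio.ByteBuffer;

public class DecodeInputs {
    /** Returns buffers that are safe to pass to the decoder. */
    public static ByteBuffer[] prepare(ByteBuffer[] sources, boolean decoderMutatesInput) {
        if (!decoderMutatesInput) {
            return sources; // safe to reuse the source buffers as-is
        }
        ByteBuffer[] copies = new ByteBuffer[sources.length];
        for (int i = 0; i < sources.length; i++) {
            ByteBuffer src = sources[i].duplicate(); // shared content, private position
            ByteBuffer dst = ByteBuffer.allocate(src.remaining());
            dst.put(src);
            dst.flip(); // make the copied bytes readable from position 0
            copies[i] = dst;
        }
        return copies;
    }
}
```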
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565259#comment-14565259 ] Zhe Zhang commented on HDFS-8481: - Thanks for the comment, Jing. I guess we need some smart policy here because we don't want to feed the decoder with very small buffers either. For example, if the cell size is small, like 16KB, it is beneficial to accumulate multiple cells before sending them to the decoder. [~drankye] could probably suggest a threshold size. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
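The accumulate-before-decode policy discussed above can be sketched as a tiny batching helper. The class name, method names, and the idea of a single byte threshold are all hypothetical — the thread above explicitly leaves the actual threshold to [~drankye] — so treat this as a shape, not a proposal:

```java
// Hedged sketch: batch small cells until a byte threshold is reached, then
// release them as one decode batch. Names and the threshold policy are
// made up for illustration; not the HDFS-8481 patch.
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class CellAccumulator {
    private final int flushThresholdBytes;
    private final List<ByteBuffer> pending = new ArrayList<>();
    private int pendingBytes = 0;

    public CellAccumulator(int flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    /** Returns a batch ready for the decoder once enough bytes accumulate, else null. */
    public List<ByteBuffer> add(ByteBuffer cell) {
        pending.add(cell);
        pendingBytes += cell.remaining();
        if (pendingBytes < flushThresholdBytes) {
            return null; // keep accumulating small cells
        }
        List<ByteBuffer> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }
}
```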
[jira] [Commented] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565408#comment-14565408 ] Hadoop QA commented on HDFS-8322: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 16m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 46s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 13s | The applied patch generated 1 new checkstyle issues (total was 190, now 191). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 54s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:red}-1{color} | common tests | 23m 11s | Tests failed in hadoop-common. 
| | | | 62m 48s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.security.token.delegation.web.TestWebDelegationToken | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12736192/HDFS-8322.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 7817674 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11165/console | This message was automatically generated. Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
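The check HDFS-8322 proposes — warn when dfs commands silently operate on the local filesystem because {{fs.defaultFS}} was never set — amounts to inspecting the resolved default URI. The helper below is a simplified, self-contained stand-in, not the actual FsShell change; only the {{file:///}} default value mirrors Hadoop's shipped {{fs.defaultFS}} default.

```java
// Hedged sketch: emit a warning when the configured default filesystem is the
// local one (Hadoop's out-of-the-box default is "file:///"). This toy helper
// is not the HDFS-8322 patch; it only illustrates the decision.
public class DefaultFsCheck {
    /** Returns a warning line, or null when defaultFs points at a real cluster. */
    public static String warnIfLocal(String defaultFs) {
        if (defaultFs == null || defaultFs.startsWith("file:")) {
            return "WARN: fs.defaultFS is not set; operating on the local filesystem";
        }
        return null;
    }
}
```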
[jira] [Commented] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565335#comment-14565335 ] Hudson commented on HDFS-7609: -- FAILURE: Integrated in Hadoop-trunk-Commit #7926 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7926/]) HDFS-7609. Avoid retry cache collision when Standby NameNode loading edits. Contributed by Ming Ma. (jing9: rev 7817674a3a4d097b647dd77f1345787dd376d5ea) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRetryCacheWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
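The workaround in the report above corresponds to a configuration override in {{hdfs-site.xml}}. The property name {{dfs.namenode.enable.retrycache}} is taken from the report itself; note the trade-off it implies — the retry cache is what protects non-idempotent RPCs from being replayed, so disabling it is only a recovery-time expedient, not a recommended steady-state setting.

```xml
<!-- hdfs-site.xml: the reporter's workaround. Disabling the retry cache
     trades retried-RPC protection for much faster edit-log replay. -->
<property>
  <name>dfs.namenode.enable.retrycache</name>
  <value>false</value>
</property>
```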
[jira] [Commented] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565353#comment-14565353 ] Ming Ma commented on HDFS-7609: --- Thanks Jing and all other folks. Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8322) Display warning if defaultFs is not set when running dfs commands.
[ https://issues.apache.org/jira/browse/HDFS-8322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-8322: Summary: Display warning if defaultFs is not set when running dfs commands. (was: Display warning if hadoop fs -ls is showing the local filesystem) Display warning if defaultFs is not set when running dfs commands. -- Key: HDFS-8322 URL: https://issues.apache.org/jira/browse/HDFS-8322 Project: Hadoop HDFS Issue Type: Improvement Components: HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Minor Attachments: HDFS-8322.000.patch, HDFS-8322.001.patch, HDFS-8322.002.patch, HDFS-8322.003.patch, HDFS-8322.003.patch, HDFS-8322.004.patch Using {{LocalFileSystem}} is rarely the intention of running {{hadoop fs -ls}}. This JIRA proposes displaying a warning message if hadoop fs -ls is showing the local filesystem or using default fs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7609) Avoid retry cache collision when Standby NameNode loading edits
[ https://issues.apache.org/jira/browse/HDFS-7609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7609: Labels: (was: BB2015-05-RFC) Avoid retry cache collision when Standby NameNode loading edits --- Key: HDFS-7609 URL: https://issues.apache.org/jira/browse/HDFS-7609 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.2.0 Reporter: Carrey Zhan Assignee: Ming Ma Priority: Critical Fix For: 2.8.0 Attachments: HDFS-7609-2.patch, HDFS-7609-3.patch, HDFS-7609-CreateEditsLogWithRPCIDs.patch, HDFS-7609.patch, recovery_do_not_use_retrycache.patch One day my namenode crashed because two journal nodes timed out at the same time under very high load, leaving behind about 100 million transactions in the edits log. (I still have no idea why they were not rolled into the fsimage.) I tried to restart the namenode, but it showed that almost 20 hours would be needed to finish, and it was loading fsedits most of the time. I also tried to restart the namenode in recovery mode, but the loading speed was no different. I looked into the stack trace and judged that it was caused by the retry cache. So I set dfs.namenode.enable.retrycache to false, and the restart process finished in half an hour. I think the retry cache is useless during startup, at least during the recovery process. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565447#comment-14565447 ] Jing Zhao commented on HDFS-8481: - Yes, having this kind of accumulation would be great. But it looks to me like this will mainly be a performance optimization. Not reusing the user buffer may cause a more serious issue, as Walter described, and should be more critical to fix. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8463) Calling DFSInputStream.seekToNewSource just after stream creation causes NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-8463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565454#comment-14565454 ] Kihwal Lee commented on HDFS-8463: -- It might be better to simply call {{blockSeekTo(targetPos)}} and return true, if {{currentNode}} is null. Calling DFSInputStream.seekToNewSource just after stream creation causes NullPointerException -- Key: HDFS-8463 URL: https://issues.apache.org/jira/browse/HDFS-8463 Project: Hadoop HDFS Issue Type: Bug Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Priority: Minor Attachments: HDFS-8463.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
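Kihwal's suggestion above — when {{currentNode}} is still null (the stream was just created and no DataNode has been chosen yet), {{seekToNewSource}} should go through {{blockSeekTo(targetPos)}} instead of dereferencing {{currentNode}} — can be illustrated with a toy model. The class below only mirrors the field and method names from {{DFSInputStream}}; it is not the real client code.

```java
// Hedged, simplified model of the NPE scenario in HDFS-8463. In the real
// DFSInputStream, currentNode stays null until the first read chooses a
// DataNode; calling seekToNewSource() before that dereferenced it.
public class SeekModel {
    Object currentNode; // null until a DataNode has been chosen

    boolean blockSeekTo(long targetPos) {
        currentNode = new Object(); // stand-in for choosing a DataNode to read from
        return true;
    }

    boolean seekToNewSource(long targetPos) {
        if (currentNode == null) {
            // Kihwal's suggested fix: pick a node first instead of NPE'ing below.
            return blockSeekTo(targetPos);
        }
        // ... the real code would exclude currentNode and try a different replica ...
        return false;
    }
}
```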
[jira] [Commented] (HDFS-8481) Erasure coding: remove workarounds in client side stripped blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565456#comment-14565456 ] Zhe Zhang commented on HDFS-8481: - I agree; the GC issue for wide stripes is more serious. I will revise the patch to directly use {{buf}} and we can do the accumulation optimization under HDFS-8031. Erasure coding: remove workarounds in client side stripped blocks recovering Key: HDFS-8481 URL: https://issues.apache.org/jira/browse/HDFS-8481 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8481-HDFS-7285.00.patch, HDFS-8481-HDFS-7285.01.patch, HDFS-8481-HDFS-7285.02.patch, HDFS-8481-HDFS-7285.03.patch After HADOOP-11847 and related fixes, we should be able to properly calculate decoded contents. -- This message was sent by Atlassian JIRA (v6.3.4#6332)