[jira] [Commented] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629092#comment-14629092 ] Hadoop QA commented on HDFS-8483: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745565/HDFS-8483.0.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11724/console | This message was automatically generated. Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8202) Improve end to end striping file test to add erasure recovery test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinwei Qin updated HDFS-8202: -- Attachment: HDFS-8202-HDFS-7285.003.patch [~zhz], I have updated the patch to include tests that read and write an EC file with failures; please help to review. Improve end to end striping file test to add erasure recovery test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202-HDFS-7285.003.patch, HDFS-8202.001.patch, HDFS-8202.002.patch This follows on HDFS-8201 to add an erasure recovery test to the end-to-end striping file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare, to check for any recovery issues and verify that erasure recovery works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
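The write/delete/read-back workflow described in the two bullets above can be illustrated with a tiny, self-contained XOR-parity simulation in plain Java (the class and method names are illustrative only, not the HDFS-7285 codec API): write data units, compute a parity unit, "delete" one unit, rebuild it from the survivors, and compare.

```java
import java.util.Arrays;

public class XorRecoveryDemo {
    // Compute a single XOR parity unit over all data units.
    static byte[] parity(byte[][] data) {
        byte[] p = new byte[data[0].length];
        for (byte[] unit : data)
            for (int i = 0; i < p.length; i++) p[i] ^= unit[i];
        return p;
    }

    // Reconstruct the unit at index `lost` from the surviving units plus parity.
    static byte[] recover(byte[][] data, byte[] parity, int lost) {
        byte[] r = parity.clone();
        for (int u = 0; u < data.length; u++) {
            if (u == lost) continue;
            for (int i = 0; i < r.length; i++) r[i] ^= data[u][i];
        }
        return r;
    }

    public static void main(String[] args) {
        byte[][] data = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };
        byte[] p = parity(data);
        byte[] rebuilt = recover(data, p, 1);            // pretend unit 1 was deleted
        System.out.println(Arrays.equals(rebuilt, data[1]));  // true
    }
}
```

A real end-to-end test does the same comparison against the file content read through the client, with the actual RS codec in place of XOR.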
[jira] [Updated] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-8483: --- Status: Patch Available (was: Open) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-8483: --- Attachment: HDFS-8483.0.patch I uploaded an initial patch. Please review it. Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
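As a rough model of the decision the NameNode has to make when bad internal blocks are reported for a striped block group (names here are hypothetical; the real logic lives in the branch's BlockManager): with an RS(6,3) schema, recovery work can still be scheduled as long as at least 6 of the 9 internal blocks remain healthy.

```java
public class StripedRecoveryCheck {
    static final int DATA_UNITS = 6, PARITY_UNITS = 3;   // RS(6,3) schema

    // After `reportedBad` distinct internal blocks of one group are reported
    // corrupt, recovery is schedulable iff enough healthy blocks remain to
    // decode the group, i.e. at least DATA_UNITS of them.
    static boolean recoverable(int reportedBad) {
        return DATA_UNITS + PARITY_UNITS - reportedBad >= DATA_UNITS;
    }

    public static void main(String[] args) {
        System.out.println(recoverable(3));  // true: 6 healthy internal blocks left
        System.out.println(recoverable(4));  // false: only 5 remain, group is lost
    }
}
```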
[jira] [Updated] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
[ https://issues.apache.org/jira/browse/HDFS-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8787: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Status: Resolved (was: Patch Available) Thanks Jing for reviewing! {{TestEditLog}} passes fine locally. I just committed the patch to EC branch. Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk --- Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Fix For: HDFS-7285 Attachments: HDFS-8787-HDFS-7285.00.patch As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629184#comment-14629184 ] kanaka kumar avvaru commented on HDFS-8767: --- Looks fine to me. Thanks for updating the test [~wheat9]. I will post a patch for the pending checkstyle issues. RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch, HDFS-8767.003.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from a UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8688) replace shouldCheckForEnoughRacks with hasClusterEverBeenMultiRack
[ https://issues.apache.org/jira/browse/HDFS-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629210#comment-14629210 ] Ming Ma commented on HDFS-8688: --- Thanks [~walter.k.su]! Overall it looks good. {{ScriptBasedMapping#isSingleSwitch}} still checks if the script name is null. But it appears that method is only used by test code. Does that mean {{AbstractDNSToSwitchMapping#isSingleSwitch}} isn't necessary anymore? Trying to understand whether all of the "script name == null implies single rack" assumptions in the code can be removed. replace shouldCheckForEnoughRacks with hasClusterEverBeenMultiRack -- Key: HDFS-8688 URL: https://issues.apache.org/jira/browse/HDFS-8688 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su Attachments: HDFS-8688.01.patch, HDFS-8688.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7613) Block placement policy for erasure coding groups
[ https://issues.apache.org/jira/browse/HDFS-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629070#comment-14629070 ] Ming Ma commented on HDFS-7613: --- Interesting work. There are a couple of issues around the extensibility of block placement policies. They aren't EC specific, so we don't have to tackle them here, but we would like to raise them if it helps make future refactoring easier. * Balancer and Mover have a built-in assumption that the default block placement policy is in use. So every time we have a new block placement policy, we need to modify those tools. One suggestion mentioned in HDFS-1431 is to run Balancer and Mover inside NN. * BlockManager has built-in assumptions about the rack policy in functions such as useDelHint and blockHasEnoughRacks. That means when we have a new block placement policy, we need to modify BlockManager to account for it. HDFS-8647 should improve that. * Ability to reuse or compose new policies based on existing block placement policies. This is different from HDFS-4894, which is about supporting different policies for different files. For example, HDFS-7541 adds an upgrade domain policy. It would be nice if we could support both the upgrade domain policy and the EC policy for a given file without any code change at run time. https://issues.apache.org/jira/secure/attachment/12687808/SupportforfastHDFSdatanoderollingupgrade.pdf's "Support for non-topology based policy" section suggested a more flexible API. * As we add new policies, migration from an old policy to a new policy on production clusters becomes necessary. [~ctrezzo] has worked in this area and plans to open a new jira for that. 
Block placement policy for erasure coding groups Key: HDFS-7613 URL: https://issues.apache.org/jira/browse/HDFS-7613 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Walter Su Attachments: HDFS-7613.001.patch Blocks in an erasure coding group should be placed in different failure domains -- different DataNodes at the minimum, and different racks ideally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
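The "reuse or compose new policies" point in the comment above can be sketched as a composite constraint, where each placement policy is a predicate over a candidate node and the nodes already chosen. These are purely illustrative interfaces, not the actual BlockPlacementPolicy API:

```java
import java.util.List;
import java.util.function.BiPredicate;

public class CompositePlacementDemo {
    // A placement constraint: may `candidate` be added given the nodes already
    // chosen for the block group? (Stand-in for the real pluggable policy API.)
    interface Constraint extends BiPredicate<String, List<String>> {}

    // Compose policies: a candidate is acceptable only if every policy agrees.
    static Constraint allOf(Constraint... cs) {
        return (candidate, chosen) -> {
            for (Constraint c : cs)
                if (!c.test(candidate, chosen)) return false;
            return true;
        };
    }

    public static void main(String[] args) {
        // Toy node id format: "rack/upgradeDomain/host".
        Constraint distinctRacks = (cand, chosen) ->
            chosen.stream().noneMatch(n -> n.split("/")[0].equals(cand.split("/")[0]));
        Constraint distinctDomains = (cand, chosen) ->
            chosen.stream().noneMatch(n -> n.split("/")[1].equals(cand.split("/")[1]));

        Constraint upgradeDomainPlusEc = allOf(distinctRacks, distinctDomains);
        System.out.println(upgradeDomainPlusEc.test("r2/d1/h3", List.of("r1/d0/h1")));  // true
        System.out.println(upgradeDomainPlusEc.test("r1/d2/h4", List.of("r1/d0/h1")));  // false: same rack
    }
}
```

The composition means a file could opt into "upgrade domain + EC" placement without either policy knowing about the other, which is the run-time flexibility the comment asks for.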
[jira] [Commented] (HDFS-8202) Improve end to end striping file test to add erasure recovery test
[ https://issues.apache.org/jira/browse/HDFS-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629094#comment-14629094 ] Xinwei Qin commented on HDFS-8202: --- Hi, [~zhz], thanks for your clarification. I will move the HDFS-8259 and HDFS-8260 patches here. Improve end to end striping file test to add erasure recovery test - Key: HDFS-8202 URL: https://issues.apache.org/jira/browse/HDFS-8202 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Xinwei Qin Attachments: HDFS-8202.001.patch, HDFS-8202.002.patch This follows on HDFS-8201 to add an erasure recovery test to the end-to-end striping file test: * After writing certain blocks to the test file, delete some block files; * Read the file content back and compare, to check for any recovery issues and verify that erasure recovery works. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8776) Decom manager should not be active on standby
[ https://issues.apache.org/jira/browse/HDFS-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629236#comment-14629236 ] Ming Ma commented on HDFS-8776: --- Makes sense. There might be some operational impact with disabling DecommissionManager on standby. Admins usually update dfs.namenode.hosts.exclude and then call dfsadmin -refreshNodes on both active and standby around the same time; that way, if the NN fails over, decommissioning can continue. If DecommissionManager isn't running on standby, nodes will stay in decommission_inprogress state without any progress on the standby. As long as admins know to ignore decommission state on standby, that should be ok (even if we keep DecommissionManager running, decommission states between active and standby could be different at any given time). Decom manager should not be active on standby - Key: HDFS-8776 URL: https://issues.apache.org/jira/browse/HDFS-8776 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Daryn Sharp Assignee: Daryn Sharp The decommission manager should not be actively processing on the standby. The decomm manager goes through the costly computation of determining that every block on the node requires replication, yet doesn't queue them for replication because it's in standby. The decomm manager is holding the namesystem write lock, causing DNs to time out on heartbeats or IBRs; the NN purges the call queue of timed-out clients, and the NN processes some heartbeats/IBRs before the decomm manager locks up the namesystem again. Nodes attempting to register will be sending full BRs, which are more costly to send and discard than a heartbeat. If a failover is required, the standby will likely have to struggle very hard to not GC while catching up on its queued IBRs while DNs continue to fill the call queue and time out. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8313) Erasure Coding: DFSStripedOutputStream#close throws a NullPointerException in some cases
[ https://issues.apache.org/jira/browse/HDFS-8313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo resolved HDFS-8313. - Resolution: Cannot Reproduce Erasure Coding: DFSStripedOutputStream#close throws a NullPointerException in some cases Key: HDFS-8313 URL: https://issues.apache.org/jira/browse/HDFS-8313 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Li Bo {code} java.io.IOException: java.lang.NullPointerException at org.apache.hadoop.hdfs.DataStreamer$LastException.check(DataStreamer.java:193) at org.apache.hadoop.hdfs.DFSStripedOutputStream.closeImpl(DFSStripedOutputStream.java:422) {code} DFSStripedOutputStream#close throws a NullPointerException in some cases -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629123#comment-14629123 ] J.Andreina commented on HDFS-8670: -- Testcase failures are not related to this patch. Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has Max, Median, Min and Standard Deviation of DataNodes usage, it currently includes decommissioned nodes for the calculation. However, given balancer doesn't work on decommissioned nodes and sometimes we could have nodes stay in decommissioned states for a long time; it might be better to exclude decommissioned nodes for the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
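The effect on the NodeUsage metrics described above is easy to see with a small, made-up calculation (plain Java; the usage percentages and node counts are hypothetical):

```java
import java.util.Arrays;

public class NodeUsageStats {
    // Median of the given usage percentages.
    static double median(double[] usages) {
        double[] s = usages.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2;
    }

    static double max(double[] usages) {
        return Arrays.stream(usages).max().orElse(0);
    }

    public static void main(String[] args) {
        // Live nodes hover around 60%; two long-decommissioned nodes sit at 95%
        // because the balancer never touches them.
        double[] all  = { 58, 60, 62, 95, 95 };
        double[] live = { 58, 60, 62 };
        System.out.println(max(all)  + " / " + median(all));   // 95.0 / 62.0
        System.out.println(max(live) + " / " + median(live));  // 62.0 / 60.0
    }
}
```

Including the stuck decommissioned nodes makes the cluster look far more imbalanced (Max 95%) than it actually is for the nodes the balancer can act on.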
[jira] [Assigned] (HDFS-8784) BlockInfo#numNodes should be numStorages
[ https://issues.apache.org/jira/browse/HDFS-8784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru reassigned HDFS-8784: - Assignee: kanaka kumar avvaru BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: kanaka kumar avvaru The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629217#comment-14629217 ] Ming Ma commented on HDFS-8670: --- Thanks [~andreina]. Overall it looks good. The two test cases have quite an amount of overlap. It seems the only differences between the decommissioned test case and the decommission_inprogress test case are the number of datanodes and the expected decommission state the test cases wait for. Maybe these two test cases could call a common function that takes the number of datanodes and the expected decommission state as parameters? Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has Max, Median, Min and Standard Deviation of DataNodes usage, it currently includes decommissioned nodes for the calculation. However, given balancer doesn't work on decommissioned nodes and sometimes we could have nodes stay in decommissioned states for a long time; it might be better to exclude decommissioned nodes for the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
[ https://issues.apache.org/jira/browse/HDFS-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629075#comment-14629075 ] Hadoop QA commented on HDFS-8787: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 28s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 12 new or modified test files. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 30s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 23s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 16s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 169m 53s | Tests failed in hadoop-hdfs. 
| | | | 212m 45s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.namenode.TestFileTruncate | | Timed out tests | org.apache.hadoop.hdfs.server.namenode.TestEditLog | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745540/HDFS-8787-HDFS-7285.00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 7e091de | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11723/console | This message was automatically generated. Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk --- Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8787-HDFS-7285.00.patch As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HDFS-8483: --- Status: Open (was: Patch Available) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 Attachments: HDFS-8483.0.patch We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6697) Make NN lease soft and hard limits configurable
[ https://issues.apache.org/jira/browse/HDFS-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-6697: - Attachment: HDFS-6697.2.patch Updated the patch. Please review. Make NN lease soft and hard limits configurable --- Key: HDFS-6697 URL: https://issues.apache.org/jira/browse/HDFS-6697 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-6697.1.patch, HDFS-6697.2.patch For testing, NameNodeAdapter allows test code to specify the lease soft and hard limits via setLeasePeriod directly on LeaseManager. But NamenodeProxies.java still uses the default values. It would be useful if we could make the NN lease soft and hard limits configurable via Configuration. That would allow NamenodeProxies.java to use the configured values. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
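The pattern being requested above is the usual Hadoop one: a configuration key backed by a compiled-in default. A minimal stand-in using java.util.Properties, with hypothetical key names (the actual keys would be whatever the patch defines); the defaults mirror the well-known 1-minute soft / 1-hour hard lease periods:

```java
import java.util.Properties;

public class LeaseLimitsConfig {
    // Compiled-in defaults matching the standard soft/hard lease periods.
    static final long SOFT_DEFAULT_MS = 60 * 1000L;        // 1 minute
    static final long HARD_DEFAULT_MS = 60 * 60 * 1000L;   // 1 hour

    // Return the configured value for `key`, or the default if unset.
    static long getMs(Properties conf, String key, long dflt) {
        String v = conf.getProperty(key);
        return v == null ? dflt : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.namenode.lease.soft-limit-ms", "30000");  // hypothetical key
        System.out.println(getMs(conf, "dfs.namenode.lease.soft-limit-ms", SOFT_DEFAULT_MS)); // 30000
        System.out.println(getMs(conf, "dfs.namenode.lease.hard-limit-ms", HARD_DEFAULT_MS)); // 3600000
    }
}
```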
[jira] [Commented] (HDFS-8697) Refactor DecommissionManager: more generic method names and misc cleanup
[ https://issues.apache.org/jira/browse/HDFS-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629011#comment-14629011 ] Hadoop QA commented on HDFS-8697: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 8m 2s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 21s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 27s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 18s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 37s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 10s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 58s | Tests failed in hadoop-hdfs. 
| | | | 205m 29s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestInitializeSharedEdits | | | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745524/HDFS-8697.01.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/artifact/patchprocess/whitespace.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11719/console | This message was automatically generated. Refactor DecommissionManager: more generic method names and misc cleanup Key: HDFS-8697 URL: https://issues.apache.org/jira/browse/HDFS-8697 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8697.00.patch, HDFS-8697.01.patch This JIRA merges the changes in {{DecommissionManager}} from the HDFS-7285 branch, including changing a few method names to be more generic ({{replicated}} - {{stored}}), and some cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8483) Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block.
[ https://issues.apache.org/jira/browse/HDFS-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629017#comment-14629017 ] Takanobu Asanuma commented on HDFS-8483: I'm going to submit a patch today or tomorrow. Thanks for working on HDFS-8619. Erasure coding: test DataNode reporting bad/corrupted blocks which belong to a striped block. -- Key: HDFS-8483 URL: https://issues.apache.org/jira/browse/HDFS-8483 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Takanobu Asanuma Assignee: Takanobu Asanuma Fix For: HDFS-7285 We can mimic one/several DataNode(s) reporting bad block(s) (which belong to a striped block) to the NameNode (through the DatanodeProtocol#reportBadBlocks call), and check if the recovery/invalidation work can be correctly scheduled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7443) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume
[ https://issues.apache.org/jira/browse/HDFS-7443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7443: -- Labels: 2.6.1-candidate (was: ) Datanode upgrade to BLOCKID_BASED_LAYOUT fails if duplicate block files are present in the same volume -- Key: HDFS-7443 URL: https://issues.apache.org/jira/browse/HDFS-7443 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Kihwal Lee Assignee: Colin Patrick McCabe Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7443.001.patch, HDFS-7443.002.patch When we did an upgrade from 2.5 to 2.6 in a medium-size cluster, about 4% of datanodes were not coming up. They tried the data file layout upgrade for BLOCKID_BASED_LAYOUT introduced in HDFS-6482, but failed. All failures were caused by {{NativeIO.link()}} throwing IOException saying {{EEXIST}}. The data nodes didn't die right away, but the upgrade was soon retried when the block pool initialization was retried whenever {{BPServiceActor}} was registering with the namenode. After many retries, datanodes terminated. This would leave {{previous.tmp}} and {{current}} with no {{VERSION}} file in the block pool slice storage directory. Although {{previous.tmp}} contained the old {{VERSION}} file, the content was in the new layout and the subdirs were all newly created ones. This shouldn't have happened because the upgrade-recovery logic in {{Storage}} removes {{current}} and renames {{previous.tmp}} to {{current}} before retrying. All successfully upgraded volumes had their old state preserved in their {{previous}} directory. In summary there were two observed issues. - Upgrade failure with {{link()}} failing with {{EEXIST}} - {{previous.tmp}} contained not the content of the original {{current}}, but a half-upgraded one. We did not see this in smaller-scale test clusters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
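The reported failure mode, {{link()}} failing with {{EEXIST}} when the block-id-based target name already exists, can be reproduced with plain NIO hard links. A sketch under the assumption that two duplicate copies of the same block file map to one destination name in the new layout (requires a filesystem that supports hard links):

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkEexistDemo {
    // Try to hard-link `src` into the new layout; returns false on EEXIST,
    // which is how a duplicate block file in the same volume shows up.
    static boolean linkIntoNewLayout(Path src, Path dest) throws IOException {
        try {
            Files.createLink(dest, src);   // link(2): dest is the new name
            return true;
        } catch (FileAlreadyExistsException e) {  // link(2) failing with EEXIST
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("upgrade-demo");
        Path copy1 = Files.writeString(dir.resolve("blk_42.copy1"), "data");
        Path copy2 = Files.writeString(dir.resolve("blk_42.copy2"), "data");
        Path dest  = dir.resolve("blk_42");        // block-id-based target name

        System.out.println(linkIntoNewLayout(copy1, dest));  // true
        System.out.println(linkIntoNewLayout(copy2, dest));  // false: EEXIST
    }
}
```

The fix space is then about what the upgrade should do on that second call: fail the whole volume (the observed behavior) or detect and resolve the duplicate.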
[jira] [Updated] (HDFS-7489) Incorrect locking in FsVolumeList#checkDirs can hang datanodes
[ https://issues.apache.org/jira/browse/HDFS-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7489: -- Labels: 2.6.1-candidate (was: ) Incorrect locking in FsVolumeList#checkDirs can hang datanodes -- Key: HDFS-7489 URL: https://issues.apache.org/jira/browse/HDFS-7489 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.5.0, 2.6.0 Reporter: Noah Lorang Assignee: Noah Lorang Priority: Critical Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7489-v1.patch, HDFS-7489-v2.patch, HDFS-7489-v2.patch.1 Starting after upgrading to 2.5.0 (CDH 5.2.1), we started to see datanodes hanging their heartbeat and requests from clients. After some digging, I identified the culprit as being the checkDiskError() triggered by catching IOExceptions (in our case, SocketExceptions being triggered on one datanode by ReplicaAlreadyExistsExceptions on another datanode). Thread dumps reveal that the checkDiskErrors() thread is holding a lock on the FsVolumeList: {code} Thread-409 daemon prio=10 tid=0x7f4e50200800 nid=0x5b8e runnable [0x7f4e2f855000] java.lang.Thread.State: RUNNABLE at java.io.UnixFileSystem.list(Native Method) at java.io.File.list(File.java:973) at java.io.File.listFiles(File.java:1051) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:89) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91) at org.apache.hadoop.util.DiskChecker.checkDirs(DiskChecker.java:91) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.checkDirs(BlockPoolSlice.java:257) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.checkDirs(FsVolumeImpl.java:210) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.checkDirs(FsVolumeList.java:180) - locked 0x00063b182ea0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList) at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkDataDir(FsDatasetImpl.java:1396) at org.apache.hadoop.hdfs.server.datanode.DataNode$5.run(DataNode.java:2832) at java.lang.Thread.run(Thread.java:662) {code} Other things would then lock the FsDatasetImpl while waiting for the FsVolumeList, e.g.: {code} DataXceiver for client at /10.10.0.52:46643 [Receiving block BP-1573746465-127.0.1.1-1352244533715:blk_1073770670_106962574] daemon prio=10 tid=0x7f4e55561000 nid=0x406d waiting for monitor entry [0x7f4e3106d000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.getNextVolume(FsVolumeList.java:64) - waiting to lock 0x00063b182ea0 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:927) - locked 0x00063b1f9a48 (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:101) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.init(BlockReceiver.java:167) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:604) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:126) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:72) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225) at java.lang.Thread.run(Thread.java:662) {code} That lock on the FsDatasetImpl then causes other threads to block: {code} Thread-127 daemon prio=10 tid=0x7f4e4c67d800 nid=0x2e02 waiting for monitor entry [0x7f4e3339] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.hdfs.server.datanode.BlockSender.init(BlockSender.java:228) - waiting to lock 0x00063b1f9a48 (a 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyBlock(BlockPoolSliceScanner.java:436) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.verifyFirstBlock(BlockPoolSliceScanner.java:523) at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner.scan(BlockPoolSliceScanner.java:684) at
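The lock chain in the dumps above is the classic "slow I/O under a shared monitor" problem: the disk check holds the FsVolumeList lock for the entire scan, so heartbeats and DataXceivers queue up behind it. A minimal sketch of the lock-narrowing pattern such a fix pursues (names invented here, not the actual HDFS-7489 patch):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: copy the volume list while holding the monitor,
// then run the slow directory scans outside it, so other threads that
// need the FsVolumeList are not blocked for the duration of the check.
class VolumeListSketch {
    private final List<String> volumes = new ArrayList<>();

    synchronized void addVolume(String v) {
        volumes.add(v);
    }

    List<String> checkDirs() {
        final List<String> snapshot;
        synchronized (this) {          // lock held only long enough to copy
            snapshot = new ArrayList<>(volumes);
        }
        List<String> failed = new ArrayList<>();
        for (String v : snapshot) {    // slow I/O runs without the lock
            if (!isHealthy(v)) {
                failed.add(v);
            }
        }
        return failed;
    }

    // Stand-in for DiskChecker.checkDirs() walking a real directory tree.
    private boolean isHealthy(String v) {
        return !v.contains("bad");
    }
}
```

The trade-off is that a volume added or removed mid-check is not seen by that check, which is acceptable because the next scheduled check will see it.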
[jira] [Updated] (HDFS-7425) NameNode block deletion logging uses incorrect appender.
[ https://issues.apache.org/jira/browse/HDFS-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7425: -- Labels: 2.6.1-candidate (was: ) NameNode block deletion logging uses incorrect appender. Key: HDFS-7425 URL: https://issues.apache.org/jira/browse/HDFS-7425 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7425-branch-2.1.patch The NameNode uses 2 separate Log4J appenders for tracking state changes. The appenders are named org.apache.hadoop.hdfs.StateChange and BlockStateChange. The intention of BlockStateChange is to separate more verbose block state change logging and allow it to be configured separately. In branch-2, there is some block state change logging that incorrectly goes to the org.apache.hadoop.hdfs.StateChange appender though. The bug is not present in trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7503) Namenode restart after large deletions can cause slow processReport (due to logging)
[ https://issues.apache.org/jira/browse/HDFS-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7503: -- Labels: 2.6.1-candidate (was: ) Namenode restart after large deletions can cause slow processReport (due to logging) Key: HDFS-7503 URL: https://issues.apache.org/jira/browse/HDFS-7503 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.1, 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 1.3.0, 2.6.1 Attachments: HDFS-7503.branch-1.02.patch, HDFS-7503.branch-1.patch, HDFS-7503.trunk.01.patch, HDFS-7503.trunk.02.patch If a large directory is deleted and the namenode is immediately restarted, there are a lot of blocks that do not belong to any file. This results in a log line per block: {code} 2014-11-08 03:11:45,584 INFO BlockStateChange (BlockManager.java:processReport(1901)) - BLOCK* processReport: blk_1074250282_509532 on 172.31.44.17:1019 size 6 does not belong to any file. {code} This log is printed within the FSNamesystem lock. This can cause the namenode to take a long time to come out of safe mode. One solution is to downgrade the logging level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
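The proposed mitigation — downgrade the per-block message — can be sketched as follows (java.util.logging stands in for Hadoop's Log4J appenders; names are illustrative, not the actual patch):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: guard the per-block message behind a cheap level check and emit
// a single summary line instead, so millions of orphan blocks no longer
// cause millions of INFO lines to be formatted while the lock is held.
class BlockReportLogging {
    private static final Logger LOG = Logger.getLogger("BlockStateChange");

    static int processOrphans(long[] orphanBlockIds) {
        int queued = 0;
        for (long id : orphanBlockIds) {
            if (LOG.isLoggable(Level.FINE)) {  // was an unconditional INFO
                LOG.fine("BLOCK* processReport: blk_" + id
                    + " does not belong to any file.");
            }
            queued++;
        }
        // One summary line keeps operators informed at negligible cost.
        LOG.info("processReport: queued " + queued
            + " blocks not belonging to any file for invalidation");
        return queued;
    }
}
```

The level check matters as much as the downgrade: even a suppressed log call pays for string concatenation unless it is guarded.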
[jira] [Updated] (HDFS-7575) Upgrade should generate a unique storage ID for each volume
[ https://issues.apache.org/jira/browse/HDFS-7575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7575: -- Labels: 2.6.1-candidate (was: ) Upgrade should generate a unique storage ID for each volume --- Key: HDFS-7575 URL: https://issues.apache.org/jira/browse/HDFS-7575 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Lars Francke Assignee: Arpit Agarwal Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7575.01.patch, HDFS-7575.02.patch, HDFS-7575.03.binary.patch, HDFS-7575.03.patch, HDFS-7575.04.binary.patch, HDFS-7575.04.patch, HDFS-7575.05.binary.patch, HDFS-7575.05.patch, testUpgrade22via24GeneratesStorageIDs.tgz, testUpgradeFrom22GeneratesStorageIDs.tgz, testUpgradeFrom24PreservesStorageId.tgz Before HDFS-2832 each DataNode would have a unique storageId which included its IP address. Since HDFS-2832 the DataNodes have a unique storageId per storage directory which is just a random UUID. They send reports per storage directory in their heartbeats. This heartbeat is processed on the NameNode in the {{DatanodeDescriptor#updateHeartbeatState}} method. Pre HDFS-2832 this would just store the information per Datanode. After the patch though each DataNode can have multiple different storages so it's stored in a map keyed by the storage Id. This works fine for all clusters that have been installed post HDFS-2832 as they get a UUID for their storage Id. So a DN with 8 drives has a map with 8 different keys. 
On each Heartbeat the Map is searched and updated ({{DatanodeStorageInfo storage = storageMap.get(s.getStorageID());}}): {code:title=DatanodeStorageInfo} void updateState(StorageReport r) { capacity = r.getCapacity(); dfsUsed = r.getDfsUsed(); remaining = r.getRemaining(); blockPoolUsed = r.getBlockPoolUsed(); } {code} On clusters that were upgraded from a pre-HDFS-2832 version, though, the storage Id has not been rewritten (at least not on the four clusters I checked), so each directory will have the exact same storageId. That means there'll be only a single entry in the {{storageMap}} and it'll be overwritten by a random {{StorageReport}} from the DataNode. This can be seen in the {{updateState}} method above. This just assigns the capacity from the received report; instead it should probably sum it up per received heartbeat. The Balancer seems to be one of the only things that actually uses this information, so it now considers the utilization of a random drive per DataNode for balancing purposes. Things get even worse when a drive has been added or replaced, as this will now get a new storage Id, so there'll be two entries in the storageMap. As new drives are usually empty, it skews the balancer's decision in a way that this node will never be considered over-utilized. Another problem is that old StorageReports are never removed from the storageMap. So if I replace a drive and it gets a new storage Id, the old one will still be in place and used for all calculations by the Balancer until a restart of the NameNode. I can try providing a patch that does the following: * Instead of using a Map I could just store the array we receive or instead of storing an array sum up the values for reports with the same Id * On each heartbeat clear the map (so we know we have up to date information) Does that sound sensible? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
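The collapse described above is easy to reproduce in miniature (simplified types, not the actual DatanodeDescriptor code): when every volume reports the same legacy storage ID, the map keeps one entry and the last report wins.

```java
import java.util.HashMap;
import java.util.Map;

// Toy reproduction: capacity visible to the NameNode when storage reports
// are keyed by storage ID. Duplicate IDs overwrite each other, so the
// NameNode ends up seeing the capacity of one random drive only.
class StorageMapDemo {
    static long visibleCapacity(String[] storageIds, long[] capacities) {
        Map<String, Long> storageMap = new HashMap<>();
        for (int i = 0; i < storageIds.length; i++) {
            // Mirrors updateState(): assign from the report, never sum.
            storageMap.put(storageIds[i], capacities[i]);
        }
        long total = 0;
        for (long c : storageMap.values()) {
            total += c;
        }
        return total;
    }
}
```

With unique post-HDFS-2832 UUIDs the capacities add up; with a shared pre-upgrade ID only the last report survives, which is exactly the skew the Balancer then acts on.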
[jira] [Updated] (HDFS-7579) Improve log reporting during block report rpc failure
[ https://issues.apache.org/jira/browse/HDFS-7579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7579: -- Labels: 2.6.1-candidate supportability (was: supportability) Improve log reporting during block report rpc failure - Key: HDFS-7579 URL: https://issues.apache.org/jira/browse/HDFS-7579 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.7.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Labels: 2.6.1-candidate, supportability Fix For: 2.7.0 Attachments: HDFS-7579.000.patch, HDFS-7579.001.patch During block reporting, if the block report RPC fails, for example because it exceeded the max rpc len, we should still produce some sort of LOG.info output to help with debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7733) NFS: readdir/readdirplus return null directory attribute on failure
[ https://issues.apache.org/jira/browse/HDFS-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7733: -- Labels: 2.6.1-candidate (was: ) NFS: readdir/readdirplus return null directory attribute on failure --- Key: HDFS-7733 URL: https://issues.apache.org/jira/browse/HDFS-7733 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 2.6.1 Attachments: HDFS-7733.01.patch NFS readdir and readdirplus operations return a null directory attribute on some failure paths. This causes clients to get a 'Stale file handle' error which can only be fixed by unmounting and remounting the share. The issue can be reproduced by running 'ls' against a large directory which is being actively modified, triggering the 'cookie mismatch' failure path. {code} } else { LOG.error("cookieverf mismatch. request cookieverf: " + cookieVerf + " dir cookieverf: " + dirStatus.getModificationTime()); return new READDIRPLUS3Response(Nfs3Status.NFS3ERR_BAD_COOKIE); } {code} Thanks to [~brandonli] for catching the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7596) NameNode should prune dead storages from storageMap
[ https://issues.apache.org/jira/browse/HDFS-7596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7596: -- Labels: 2.6.1-candidate (was: ) NameNode should prune dead storages from storageMap --- Key: HDFS-7596 URL: https://issues.apache.org/jira/browse/HDFS-7596 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7596.01.patch, HDFS-7596.02.patch The NameNode must be able to prune storages that are no longer reported by the DataNode and that have no blocks associated. These stale storages can skew the balancer behavior. Detailed discussion on HDFS-7575. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7885) Datanode should not trust the generation stamp provided by client
[ https://issues.apache.org/jira/browse/HDFS-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7885: -- Labels: 2.6.1-candidate (was: ) Datanode should not trust the generation stamp provided by client - Key: HDFS-7885 URL: https://issues.apache.org/jira/browse/HDFS-7885 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: vitthal (Suhas) Gogate Assignee: Tsz Wo Nicholas Sze Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: h7885_20150305.patch, h7885_20150306.patch The datanode should not trust the generation stamp provided by the client, since it is prefetched and buffered in the client, and a concurrent append may have increased it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7831: -- Labels: 2.6.1-candidate (was: ) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks() Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8072) Reserved RBW space is not released if client terminates while writing block
[ https://issues.apache.org/jira/browse/HDFS-8072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8072: -- Labels: 2.6.1-candidate (was: ) Reserved RBW space is not released if client terminates while writing block --- Key: HDFS-8072 URL: https://issues.apache.org/jira/browse/HDFS-8072 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-8072.01.patch, HDFS-8072.02.patch The DataNode reserves space for a full block when creating an RBW block (introduced in HDFS-6898). The reserved space is released incrementally as data is written to disk, and fully when the block is finalized. However, if the client process terminates unexpectedly mid-write, then the reserved space is not released until the DN is restarted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
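The invariant such a fix must restore can be sketched as follows (hypothetical class, not the actual FsVolumeImpl code): whatever balance of the up-front reservation is still outstanding when the writer stops — normally or abruptly — must be released on that exit path.

```java
// Sketch: a try/finally owns the outstanding reservation, so an aborted
// client write releases the remainder just like a finalized block does.
class RbwReservation {
    private long reservedBytes;

    long reservedBytes() {
        return reservedBytes;
    }

    // bytesReceived models how far the client got before finishing or dying.
    void receiveBlock(long blockSize, long bytesReceived) {
        reservedBytes += blockSize;        // reserve a full block at creation
        long written = 0;
        try {
            while (written < bytesReceived) {
                written++;                 // stand-in for persisting one byte
                reservedBytes--;           // release incrementally as data lands
            }
        } finally {
            // Release the remainder even if the write never completed.
            reservedBytes -= (blockSize - written);
        }
    }
}
```

After a full write the incremental releases cover the whole reservation; after a mid-write abort the finally block returns the rest, leaving nothing leaked until a DN restart.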
[jira] [Updated] (HDFS-8127) NameNode Failover during HA upgrade can cause DataNode to finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-8127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-8127: -- Labels: 2.6.1-candidate (was: ) NameNode Failover during HA upgrade can cause DataNode to finalize upgrade -- Key: HDFS-8127 URL: https://issues.apache.org/jira/browse/HDFS-8127 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Blocker Labels: 2.6.1-candidate Fix For: 2.7.1 Attachments: HDFS-8127.000.patch, HDFS-8127.001.patch Currently for HA upgrade (enabled by HDFS-5138), we use {{-bootstrapStandby}} to initialize the standby NameNode. The standby NameNode does not have the {{previous}} directory and thus does not know that the cluster is in the upgrade state. If NN failover happens, in response to block reports the new ANN will tell DNs to finalize the upgrade, thus making it impossible to roll back again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7960) The full block report should prune zombie storages even if they're not empty
[ https://issues.apache.org/jira/browse/HDFS-7960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated HDFS-7960: -- Labels: 2.6.1-candidate (was: ) The full block report should prune zombie storages even if they're not empty Key: HDFS-7960 URL: https://issues.apache.org/jira/browse/HDFS-7960 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Colin Patrick McCabe Priority: Critical Labels: 2.6.1-candidate Fix For: 2.7.0 Attachments: HDFS-7960.002.patch, HDFS-7960.003.patch, HDFS-7960.004.patch, HDFS-7960.005.patch, HDFS-7960.006.patch, HDFS-7960.007.patch, HDFS-7960.008.patch The full block report should prune zombie storages even if they're not empty. We have seen cases in production where zombie storages have not been pruned subsequent to HDFS-7575. This could arise any time the NameNode thinks there is a block in some old storage which is actually not there. In this case, the block will not show up in the new storage (once old is renamed to new) and the old storage will linger forever as a zombie, even with the HDFS-7596 fix applied. This also happens with datanode hotplug, when a drive is removed. In this case, an entire storage (volume) goes away but the blocks do not show up in another storage on the same datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8760: Attachment: HDFS-8760.000.patch Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8760) Erasure Coding: reuse BlockReader when reading the same block in pread
[ https://issues.apache.org/jira/browse/HDFS-8760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8760: Attachment: (was: HDFS-8760.000.patch) Erasure Coding: reuse BlockReader when reading the same block in pread -- Key: HDFS-8760 URL: https://issues.apache.org/jira/browse/HDFS-8760 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8760.000.patch Currently in pread, we create a new block reader for each aligned stripe even though these stripes belong to the same block. It's better to reuse them to avoid unnecessary block reader creation overhead. This can also avoid reading from the same bad DataNode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
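The reuse idea above can be sketched in a few lines (invented names, not the actual patch): keep the reader opened for a block and hand the same instance back for later aligned stripes of that block, so pread creates one reader per block rather than one per stripe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-block reader cache. Reusing the reader also
// keeps the stripe on the DataNode it already chose, instead of re-picking
// (and possibly hitting the same bad node again through a fresh reader).
class BlockReaderCache {
    static final class Reader {
        final long blockId;
        Reader(long blockId) { this.blockId = blockId; }
    }

    private final Map<Long, Reader> readers = new HashMap<>();
    private int created = 0;

    Reader readerFor(long blockId) {
        return readers.computeIfAbsent(blockId, id -> {
            created++;                 // only pay creation cost once per block
            return new Reader(id);
        });
    }

    int readersCreated() {
        return created;
    }
}
```

A real implementation would also have to close cached readers when the pread finishes and evict a reader whose DataNode turns out to be bad, neither of which this sketch models.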
[jira] [Commented] (HDFS-8783) enable socket timeout for balancer's target connection
[ https://issues.apache.org/jira/browse/HDFS-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629036#comment-14629036 ] Hadoop QA commented on HDFS-8783: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 7s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 34s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 40s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 34s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 1s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 160m 57s | Tests failed in hadoop-hdfs. 
| | | | 201m 50s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745489/HDFS-8783.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11722/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11722/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11722/console | This message was automatically generated. enable socket timeout for balancer's target connection -- Key: HDFS-8783 URL: https://issues.apache.org/jira/browse/HDFS-8783 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8783.patch We have met a real case where the balancer connected to a black-hole target datanode which accepted the connection but never sent any response back; the balancer then hung. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
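The fix direction named in the title can be sketched as follows (hypothetical helper and timeout value, not the actual patch): give the balancer's connection to the target a read timeout, so a black-hole peer that accepts the connection but never responds raises SocketTimeoutException instead of blocking the thread forever.

```java
import java.net.Socket;
import java.net.SocketException;

// Sketch: a socket with SO_TIMEOUT 0 (the default) blocks indefinitely on
// read; setting a finite timeout turns a silent peer into an exception the
// balancer can handle and retry elsewhere.
class BalancerSocketConfig {
    static final int IO_TIMEOUT_MS = 60_000;  // assumed value, for illustration

    static Socket withReadTimeout(Socket sock) throws SocketException {
        sock.setSoTimeout(IO_TIMEOUT_MS);
        return sock;
    }
}
```

Connect timeouts are a separate knob (`Socket.connect(addr, timeout)`); the hang reported here is specifically a read that never returns after the connection is established.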
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629045#comment-14629045 ] Benoy Antony commented on HDFS-7483: [~wheat9], As I mentioned, there is no good way of displaying percentage using math helper and fmt_percentage filter. If you have no further comments, I'll commit this patch by end of day tomorrow. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629048#comment-14629048 ] Zhe Zhang commented on HDFS-8728: - Good point Nicholas. I should perhaps change the title of the JIRA. The latest patch is pretty much for the purpose of merging HDFS-8499 to the branch. But as part of the merging we need to change the {{BIStriped}} logic, which needs additional review in the context of the branch. Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728-HDFS-7285.03.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627671#comment-14627671 ] Walter Su commented on HDFS-8704: - Hi [~libo-intel]! The test failed. Seems like the issue still exists. Could you update the patch? This jira has higher priority. Thanks. Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a large one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this jira will add more test situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8779: Status: Patch Available (was: Open) WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of Long in Java is 2^63-1; the max value of Number in Javascript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented in Javascript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8344) NameNode doesn't recover lease for files with missing blocks
[ https://issues.apache.org/jira/browse/HDFS-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627801#comment-14627801 ] Masatake Iwasaki commented on HDFS-8344: Hi, [~raviprak]. {code} private int recoveryAttemptsBeforeMarkingBlockMissing = 5; {code} Should this be configurable? I think infinite is a conservative and preferable default value, in order to avoid data loss and keep the current behavior. 5 could be used as a threshold to show a warning message, as [~kihwal] suggested. NameNode doesn't recover lease for files with missing blocks Key: HDFS-8344 URL: https://issues.apache.org/jira/browse/HDFS-8344 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.7.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: HDFS-8344.01.patch, HDFS-8344.02.patch, HDFS-8344.03.patch, HDFS-8344.04.patch, HDFS-8344.05.patch I found another(?) instance in which the lease is not recovered. This is reproducible easily on a pseudo-distributed single node cluster # Before you start, it helps if you set the following. This is not necessary, but simply reduces how long you have to wait {code} public static final long LEASE_SOFTLIMIT_PERIOD = 30 * 1000; public static final long LEASE_HARDLIMIT_PERIOD = 2 * LEASE_SOFTLIMIT_PERIOD; {code} # Client starts to write a file. (could be less than 1 block, but it hflushed, so some of the data has landed on the datanodes) (I'm copying the client code I am using. I generate a jar and run it using $ hadoop jar TestHadoop.jar) # Client crashes. (I simulate this by kill -9 the $(hadoop jar TestHadoop.jar) process after it has printed Wrote to the bufferedWriter # Shoot the datanode. (Since I ran on a pseudo-distributed cluster, there was only 1) I believe the lease should be recovered and the block should be marked missing. However this is not happening. The lease is never recovered. The effect of this bug for us was that nodes could not be decommissioned cleanly.
Although we knew that the client had crashed, the Namenode never released the leases (even after restarting the Namenode, even months afterwards). There are actually several other cases too where we don't consider what happens if ALL the datanodes die while the file is being written, but I am going to punt on that for another time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
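The configurability suggestion in the comment above could look something like this (the key name and helper are invented for illustration; the real change would go through Hadoop's Configuration class): read the retry budget from configuration and treat a non-positive value as "retry forever", which preserves the current conservative behavior.

```java
import java.util.Properties;

// Sketch: a finite limit is opt-in; the default of -1 means the NameNode
// keeps attempting lease recovery indefinitely rather than marking the
// block missing, avoiding data loss by default.
class LeaseRecoveryLimit {
    static final String KEY = "dfs.namenode.lease.recovery.max-attempts"; // hypothetical key
    static final int INFINITE = -1;

    static boolean shouldMarkBlockMissing(Properties conf, int attemptsSoFar) {
        int max = Integer.parseInt(
            conf.getProperty(KEY, String.valueOf(INFINITE)));
        if (max <= 0) {
            return false;  // infinite retries: never give up on recovery
        }
        return attemptsSoFar >= max;
    }
}
```

Per the comment, a fixed count like 5 would then serve only as a warning threshold, not as the point where the block is declared missing.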
[jira] [Created] (HDFS-8779) WebUI can't display randomly generated block ID
Walter Su created HDFS-8779: --- Summary: WebUI can't display randomly generated block ID Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Old releases used randomly generated block IDs (HDFS-4645). The max value of Long in Java is 2^63-1; the max value of Number in Javascript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER. An integer which exceeds MAX_SAFE_INTEGER cannot be represented in Javascript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
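The overflow described above is easy to demonstrate on the JVM: JavaScript numbers are IEEE-754 doubles, so an integer survives display only if it fits under 2^53-1 (and therefore round-trips through a double). This sketch (illustrative, not part of the patch) shows the check:

```java
// A block ID is displayable as a JavaScript Number only if it is at most
// MAX_SAFE_INTEGER; larger values lose precision when coerced to a double,
// which is what the WebUI's JSON-to-Number path effectively does.
class BlockIdPrecision {
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;  // 2^53 - 1

    static boolean displayableAsJsNumber(long blockId) {
        return blockId <= MAX_SAFE_INTEGER
            && (long) (double) blockId == blockId;  // round-trip check
    }
}
```

Since randomly generated block IDs are drawn from the full 63-bit range, almost all of them fail this check, which is why the fix has to render the ID as a string rather than a number.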
[jira] [Commented] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627809#comment-14627809 ] Hadoop QA commented on HDFS-8762: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 15m 11s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 33s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 47s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 15s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 36s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 1s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 27s | The patch appears to introduce 6 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 174m 24s | Tests failed in hadoop-hdfs. 
| | | | 216m 44s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.datanode.TestTransferRbw | | | hadoop.hdfs.server.namenode.TestFileTruncate | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745387/HDFS-8762-HDFS-7285-001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 0a93712 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/artifact/patchprocess/patchReleaseAuditProblems.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11709/console | This message was automatically generated. Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2
[ https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627827#comment-14627827 ] Duo Zhang commented on HDFS-7966: - Small read using {{PerformanceTest}}. Unit is millisecond. {noformat} ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1(thread number) 100(read count per thread) 1024(bytes per read) pread(use pread) {noformat} {noformat} ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 100 1024 pread *** time based on tcp 242730 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 100 1024 pread *** time based on http2 324491 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 10 10 1024 pread *** time based on tcp 40688 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 10 10 1024 pread *** time based on http2 82819 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 1 1024 pread *** time based on tcp 21612 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 1 1024 pread *** time based on http2 69658 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 500 2000 1024 pread *** time based on tcp 19931 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 500 2000 1024 pread *** time based on http2 151727 ./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1000 1000 1024 pread *** time based on http2 251735 {noformat} For the single-threaded test, 324491/242730=1.34, so http2 is about 34% slower than tcp. Will try to find the overhead later. And for the multi-threaded tests, http2 is much slower than tcp. And tcp failed the 1000 threads test. I think the problem is that I only use one connection in http2, so there is only one EventLoop (which means only one thread) which sends or receives data. And for tcp, the thread number is the same as the connection number.
The {{%CPU}} of the datanode when using http2 is always around 100%, no matter whether the thread count is 10, 100 or 1000. But when using tcp, the {{%CPU}} can be higher than 1500% as the number of threads increases. Next I will write a new test which can use multiple http2 connections. Thanks. New Data Transfer Protocol via HTTP/2 - Key: HDFS-7966 URL: https://issues.apache.org/jira/browse/HDFS-7966 Project: Hadoop HDFS Issue Type: New Feature Reporter: Haohui Mai Assignee: Qianqian Shi Labels: gsoc, gsoc2015, mentor Attachments: GSoC2015_Proposal.pdf, TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg The current Data Transfer Protocol (DTP) implements a rich set of features that span across multiple layers, including: * Connection pooling and authentication (session layer) * Encryption (presentation layer) * Data writing pipeline (application layer) All these features are HDFS-specific and defined by implementation. As a result it requires a non-trivial amount of work to implement HDFS clients and servers. This jira explores delegating the responsibilities of the session and presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles connection multiplexing, QoS, authentication and encryption, reducing the scope of DTP to the application layer only. By leveraging existing HTTP/2 libraries, it should simplify the implementation of both HDFS clients and servers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
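A quick sanity check on the slowdown figures quoted in the comment above (plain Java; the millisecond timings are copied from the {{PerformanceTest}} runs, the class name is mine):

```java
public class Http2Slowdown {
    // How many times slower the http2 run was than the matching tcp run.
    static double ratio(long http2Millis, long tcpMillis) {
        return (double) http2Millis / tcpMillis;
    }

    public static void main(String[] args) {
        // Timings (ms) from the runs above: http2 vs tcp at the same thread count.
        System.out.printf("1 thread:    %.2fx%n", ratio(324491, 242730)); // ~1.34x, i.e. ~34% slower
        System.out.printf("10 threads:  %.2fx%n", ratio(82819, 40688));   // ~2.04x
        System.out.printf("100 threads: %.2fx%n", ratio(69658, 21612));   // ~3.22x
    }
}
```

The widening gap as the thread count grows is consistent with the comment's diagnosis: all http2 traffic shares a single EventLoop thread, while tcp gets one thread per connection.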
[jira] [Updated] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8058: Attachment: HDFS-8058-HDFS-7285.010.patch Thanks Jing for noticing the {{TestDFSStripedOutputStreamWithFailure}} timeout. It turns out to be a tricky bug that crept in between the 06 and 07 patches: basically I forgot to carry over the {{setFileReplication((short) 0)}} logic in the new {{INodeFile}} constructor. The new patch addresses this issue: {code} // Replication factor for striped files is zero if (isStriped) { h = REPLICATION.BITS.combine(0L, h); h = IS_STRIPED.BITS.combine(1L, h); } else { h = REPLICATION.BITS.combine(replication, h); h = IS_STRIPED.BITS.combine(0L, h); } {code} Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Zhe Zhang Attachments: HDFS-8058-HDFS-7285.003.patch, HDFS-8058-HDFS-7285.004.patch, HDFS-8058-HDFS-7285.005.patch, HDFS-8058-HDFS-7285.006.patch, HDFS-8058-HDFS-7285.007.patch, HDFS-8058-HDFS-7285.008.patch, HDFS-8058-HDFS-7285.009.patch, HDFS-8058-HDFS-7285.010.patch, HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, its methods duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if a file is striped, the {{blocks}} field in INodeFile still occupies the memory of a reference. None of this is necessary, and we can use the same {{blocks}} to make the code clearer. 
I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8768: Attachment: screen-shot-with-HDFS-8779-patch.PNG Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG For example, in the WebUI (usually namenode port 50070), one Erasure Coding file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names on the datanodes, we believe the WebUI may have a problem with Erasure Coding block group display. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627666#comment-14627666 ] Li Bo commented on HDFS-8762: - Adding {{this}} to the log string is also a solution, but it adds too much to the log string; I think the index alone is enough. Some log strings are generated in static functions, so I would have to change them to non-static. Is there a better way to handle this? Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
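One way to avoid converting the static methods to non-static: a static helper that takes the streamer index explicitly, so both static and instance contexts can tag their log strings. A minimal sketch; the class name and message format are illustrative, not part of the patch:

```java
/** Illustrative helper: prefix DataStreamer log messages with the streamer's index. */
public class StreamerLog {
    // Static on purpose: log strings built inside static functions can be tagged
    // too, as long as the caller threads the index through as a parameter.
    static String tag(int streamerIndex, String message) {
        return "[streamer #" + streamerIndex + "] " + message;
    }
}
```

For example, {{StreamerLog.tag(3, "waiting for ack")}} yields {{[streamer #3] waiting for ack}}, so grep-by-streamer works even for messages emitted from static code.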
[jira] [Commented] (HDFS-8704) Erasure Coding: client fails to write large file when one datanode fails
[ https://issues.apache.org/jira/browse/HDFS-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627677#comment-14627677 ] Li Bo commented on HDFS-8704: - I am still working on this jira. The error is random, and it now succeeds most of the time. I still need several days to get it working completely. Erasure Coding: client fails to write large file when one datanode fails Key: HDFS-8704 URL: https://issues.apache.org/jira/browse/HDFS-8704 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8704-000.patch I tested the current code on a 5-node cluster using RS(3,2). When a datanode is corrupt, the client succeeds in writing a file smaller than a block group but fails to write a larger one. {{TestDFSStripeOutputStreamWithFailure}} only tests files smaller than a block group; this jira will add more test situations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8779: Attachment: HDFS-8779.01.patch WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1, while the max safe integer in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER, and an integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
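The display problem can be reproduced without a browser: a JavaScript number is an IEEE-754 double, so any Java {{long}} that does not survive a round-trip through {{double}} cannot be shown exactly by the WebUI. A small sketch (the helper name is mine; the sample block ID is the one from the fsck output in HDFS-8768):

```java
public class SafeIntegerCheck {
    // Same bound as JavaScript's Number.MAX_SAFE_INTEGER.
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1;

    /** True if the value survives a round-trip through an IEEE-754 double. */
    static boolean fitsInDouble(long value) {
        return (long) (double) value == value;
    }

    public static void main(String[] args) {
        System.out.println(fitsInDouble(MAX_SAFE_INTEGER));      // true
        System.out.println(fitsInDouble(MAX_SAFE_INTEGER + 2));  // false: rounds to a neighbouring double
        // A randomly generated block ID has magnitude close to 2^63:
        System.out.println(fitsInDouble(-9223372036854740160L)); // false
    }
}
```

Doubles above 2^53 are spaced more than 1 apart, so nearly every randomly generated block ID lands between two representable values and gets silently rounded by the JavaScript UI.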
[jira] [Commented] (HDFS-8716) introduce a new config specifically for safe mode block count
[ https://issues.apache.org/jira/browse/HDFS-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627589#comment-14627589 ] Hadoop QA commented on HDFS-8716: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 17m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 37s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 44s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 20s | The applied patch generated 1 new checkstyle issues (total was 676, now 676). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 20s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 4s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 160m 56s | Tests failed in hadoop-hdfs. 
| | | | 204m 32s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestAppendSnapshotTruncate | | | hadoop.hdfs.server.namenode.ha.TestDNFencing | | | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745376/HDFS-8716.7.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0a16ee6 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11708/console | This message was automatically generated. introduce a new config specifically for safe mode block count - Key: HDFS-8716 URL: https://issues.apache.org/jira/browse/HDFS-8716 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8716.1.patch, HDFS-8716.2.patch, HDFS-8716.3.patch, HDFS-8716.4.patch, HDFS-8716.5.patch, HDFS-8716.6.patch, HDFS-8716.7.patch, HDFS-8716.7.patch During the start up, namenode waits for n replicas of each block to be reported by datanodes before exiting the safe mode. Currently n is tied to the min replicas config. We could set min replicas to more than one but we might want to exit safe mode as soon as each block has one replica reported. This can be worked out by introducing a new config variable for safe mode block count -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8762: Status: Patch Available (was: Open) Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8762) Erasure Coding: the log of each streamer should show its index
[ https://issues.apache.org/jira/browse/HDFS-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-8762: Attachment: HDFS-8762-HDFS-7285-001.patch Erasure Coding: the log of each streamer should show its index -- Key: HDFS-8762 URL: https://issues.apache.org/jira/browse/HDFS-8762 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Li Bo Assignee: Li Bo Attachments: HDFS-8762-HDFS-7285-001.patch The log in {{DataStreamer}} doesn't show which streamer it's generated from. In order to make log information more convenient for debugging, each log should include the index of the streamer it's generated from. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8619) Erasure Coding: revisit replica counting for striped blocks
[ https://issues.apache.org/jira/browse/HDFS-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627656#comment-14627656 ] Walter Su commented on HDFS-8619: - LGTM. +1 Erasure Coding: revisit replica counting for striped blocks --- Key: HDFS-8619 URL: https://issues.apache.org/jira/browse/HDFS-8619 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-8619-HDFS-7285.001.patch, HDFS-8619.000.patch Currently we use the same {{BlockManager#countNodes}} method for striped blocks, which simply treat each internal block as a replica. However, for a striped block, we may have more complicated scenario, e.g., we have multiple replicas of the first internal block while we miss some other internal blocks. Using the current {{countNodes}} methods can lead to wrong decision in these scenarios. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8784) BlockInfo#numNodes should be numStorages
Zhe Zhang created HDFS-8784: --- Summary: BlockInfo#numNodes should be numStorages Key: HDFS-8784 URL: https://issues.apache.org/jira/browse/HDFS-8784 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang The method actually returns the number of storages holding a block. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8694) Expose the stats of IOErrors on each FsVolume through JMX
[ https://issues.apache.org/jira/browse/HDFS-8694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628371#comment-14628371 ] Lei (Eddy) Xu commented on HDFS-8694: - Thanks for the reviews, [~andrew.wang] bq. I have a hard time understanding when we should call handle the disk error vs. just bubbling up, since it bubbles there seems like a danger of handling the same root IOE more than once. What's the methodology here? Is it possible to move handling to the top-level somewhere? I can manually examine all the current callsites and callers, but that's not very future-proof. The reason to call {{volume#handleIOErrors()}} is that when the {{IOE}} pops up to the place where we used to call {{DataNode#checkDiskErrorAsync()}}, the context (which volume the IOs were on) is usually missing. My intention was to call {{volume#handleIOErrors()}} at the highest level that manages the {{volume}} object's lifetime. I will try to get rid of the {{DataNode#checkDiskErrorAsync()}} call in a follow-up JIRA. bq. Since we now have the volume as context, we should really move the disk checker to be per-volume rather than DN wide. One volume throwing an error is no reason to check all of them. This can be deferred to a follow-up; I think it's a slam dunk. Yes. That is the reason to put {{handleIOErrors()}} into {{FsVolumeSpi}}. I was thinking of using a per-volume thread to do {{checkDirs()}} and also using {{numOfErrors()}} as a trigger. I will do that in a follow-up JIRA as well. Working on the rest of the comments. Thanks a lot for these great comments. 
Expose the stats of IOErrors on each FsVolume through JMX - Key: HDFS-8694 URL: https://issues.apache.org/jira/browse/HDFS-8694 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, HDFS Affects Versions: 2.7.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-8694.000.patch, HDFS-8694.001.patch Currently, once the DataNode hits an {{IOError}} while writing / reading block files, it starts a background {{DiskChecker.checkDirs()}} thread. But if this thread finishes successfully, the DN does not record the {{IOError}}. We need a measurement that counts all {{IOErrors}} for each volume. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
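The per-volume counter discussed above — {{numOfErrors()}} serving both as a JMX stat and as a trigger for a per-volume {{checkDirs()}} — could be sketched as a thread-safe counter owned by each volume. Names below are illustrative, not the patch's actual API:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch of a per-volume IOError counter exposed through JMX. */
public class VolumeErrorStats {
    private final AtomicLong ioErrors = new AtomicLong();

    /** Called wherever an IOException surfaces while the volume is still in context. */
    public void handleIOError() {
        ioErrors.incrementAndGet();
    }

    /** Read by the JMX bean; could also trigger a per-volume checkDirs() past a threshold. */
    public long numOfErrors() {
        return ioErrors.get();
    }
}
```

Keeping the counter on the volume (rather than DN-wide) means one bad disk cannot inflate the stats of its healthy siblings, and a per-volume checker thread has an obvious signal to watch.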
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628439#comment-14628439 ] Hadoop QA commented on HDFS-8779: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 55s | Findbugs (version 3.0.0) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 39s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 2m 1s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 32s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 4m 28s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 1s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 160m 48s | Tests failed in hadoop-hdfs. | | {color:green}+1{color} | hdfs tests | 0m 28s | Tests passed in hadoop-hdfs-client. 
| | | | 207m 37s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745448/HDFS-8779.02.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/artifact/patchprocess/testrun_hadoop-hdfs.txt | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11714/console | This message was automatically generated. WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1, while the max safe integer in JavaScript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER, and an integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in JavaScript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8433) blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil
[ https://issues.apache.org/jira/browse/HDFS-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628459#comment-14628459 ] Jing Zhao commented on HDFS-8433: - In the 03 patch, when checking the block token, the {{BlockTokenSecretManager}} still uses a {{BlockTokenIdentifier}} to parse the ID of the token, thus if both sides are new DataNodes, the ID range information cannot be retrieved. blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil -- Key: HDFS-8433 URL: https://issues.apache.org/jira/browse/HDFS-8433 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Walter Su Attachments: HDFS-8433-HDFS-7285.02.patch, HDFS-8433.00.patch, HDFS-8433.01.patch, HDFS-8433.03.PoC.patch The blockToken provided in LocatedStripedBlock is not used to create LocatedBlock in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil. We should also add ec tests with security on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8783) enable socket timeout for balancer's target connection
[ https://issues.apache.org/jira/browse/HDFS-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chang Li updated HDFS-8783: --- Attachment: HDFS-8783.patch enable socket timeout for balancer's target connection -- Key: HDFS-8783 URL: https://issues.apache.org/jira/browse/HDFS-8783 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8783.patch We hit a real case where the balancer connected to a black-hole target datanode which accepted the connection but never sent a response back; the balancer then hung. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8728: Attachment: HDFS-8728-HDFS-7285.02.patch Uploading branch-based {{HDFS-8728-HDFS-7285.02.patch}} to address some of Andrew's comments: # Since the patch is already large, will leave the ideas of {{getOp}} and saving reference in {{StripedBlockStorageOp}} as follow-ons. # I filed HDFS-8784 to rename {{numNodes}}. This patch basically makes necessary changes to merge trunk's {{BlockInfo}} hierarchy back to HDFS-7285 branch (as well as adding the striped counterparts). If we agree upon this direction I will create another patch to replace all unnecessary usages of {{BIC}}, {{BIS}}, {{BIUCC}}, {{BIUCS}} with {{BlockInfo}} and {{BlockInfoUC}}. After reaching a conclusion here, I plan to update {{Merge-1}} to {{Merge-14}} patches accordingly, and then rebase the HDFS-7285 branch to catch up with trunk. [~jingzhao] Could you share some advice here? Thanks! Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8058: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7285 Status: Resolved (was: Patch Available) Since the 10 patch only makes a minor change from 09, I am committing it based on Jing's review. I tested {{TestFileLengthOnClusterRestart}} and {{TestFileAppend3}} locally and they passed. Thanks Yi for the initial work, and Jing / Walter for the helpful reviews! Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Zhe Zhang Fix For: HDFS-7285 Attachments: HDFS-8058-HDFS-7285.003.patch, HDFS-8058-HDFS-7285.004.patch, HDFS-8058-HDFS-7285.005.patch, HDFS-8058-HDFS-7285.006.patch, HDFS-8058-HDFS-7285.007.patch, HDFS-8058-HDFS-7285.008.patch, HDFS-8058-HDFS-7285.009.patch, HDFS-8058-HDFS-7285.010.patch, HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, its methods duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if a file is striped, the {{blocks}} field in INodeFile still occupies the memory of a reference. None of this is necessary, and we can use the same {{blocks}} to make the code clearer. I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8747) Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones
[ https://issues.apache.org/jira/browse/HDFS-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-8747: - Attachment: HDFS-8747-07152015.pdf Provide Better Scratch Space and Soft Delete Support for HDFS Encryption Zones -- Key: HDFS-8747 URL: https://issues.apache.org/jira/browse/HDFS-8747 Project: Hadoop HDFS Issue Type: Bug Components: encryption Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-8747-07092015.pdf, HDFS-8747-07152015.pdf HDFS Transparent Data Encryption At-Rest was introduced in Hadoop 2.6 to allow creating an encryption zone on top of a single HDFS directory. Files under the root directory of the encryption zone are encrypted/decrypted transparently upon HDFS client write or read operations. Generally, it does not support rename (without data copying) across encryption zones or between an encryption zone and a non-encryption zone because of the different security settings of encryption zones. However, there are certain use cases where efficient rename support is desired. This JIRA proposes better support for two such use cases, “Scratch Space” (a.k.a. staging area) and “Soft Delete” (a.k.a. trash), with HDFS encryption zones. “Scratch Space” is widely used in Hadoop jobs, which requires efficient rename support. Temporary files from MR jobs are usually stored in a staging area outside the encryption zone, such as the “/tmp” directory, and then renamed to the target directories once the data is ready to be further processed. Below is a summary of supported/unsupported cases in the latest Hadoop: * Rename within the encryption zone is supported. * Renaming the entire encryption zone by moving the root directory of the zone is allowed. * Renaming a sub-directory/file from an encryption zone to a non-encryption zone is not allowed. * Renaming a sub-directory/file from encryption zone A to encryption zone B is not allowed. * Renaming from a non-encryption zone to an encryption zone is not allowed. 
“Soft delete” (a.k.a. trash) is a client-side feature that helps prevent accidental deletion of files and directories. If trash is enabled and a file or directory is deleted using the Hadoop shell, the file is moved to the .Trash directory of the user's home directory instead of being deleted. Deleted files are initially moved (renamed) to the Current sub-directory of the .Trash directory with the original path preserved. Files and directories in the trash can be restored simply by moving them to a location outside the .Trash directory. Due to the limited rename support, deleting a sub-directory/file within an encryption zone with the trash feature enabled is not allowed; clients have to use the -skipTrash option to work around this. HADOOP-10902 and HDFS-6767 improved the error message but did not completely solve the problem. We propose to solve the problem by generalizing the mapping between an encryption zone and its underlying HDFS directories from 1:1 today to 1:N. The encryption zone should allow non-overlapping directories such as scratch space or soft-delete trash locations to be added/removed dynamically after creation. This way, rename for scratch space and soft delete can be better supported without breaking the assumption that rename is only supported within the zone. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627934#comment-14627934 ] J.Andreina commented on HDFS-8670: -- Thanks, [~mingma]. I have updated the patch as per your review comments. bq.Any reason it changes to call fetchDatanodes with parameter removeDecommissionNode set to false? It seems there is an issue in the logic for removing decommissioned nodes from the live/dead node lists; I have raised a separate jira for that (HDFS-8780). Please review the patch. Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has the Max, Median, Min and Standard Deviation of DataNode usage; it currently includes decommissioned nodes in the calculation. However, given that the balancer doesn't work on decommissioned nodes and nodes can sometimes stay in the decommissioned state for a long time, it might be better to exclude decommissioned nodes from the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8670) Better to exclude decommissioned nodes for namenode NodeUsage JMX
[ https://issues.apache.org/jira/browse/HDFS-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.Andreina updated HDFS-8670: - Attachment: HDFS-8670.3.patch Better to exclude decommissioned nodes for namenode NodeUsage JMX - Key: HDFS-8670 URL: https://issues.apache.org/jira/browse/HDFS-8670 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: J.Andreina Attachments: HDFS-8670.1.patch, HDFS-8670.2.patch, HDFS-8670.3.patch The namenode NodeUsage JMX has the Max, Median, Min and Standard Deviation of DataNode usage; it currently includes decommissioned nodes in the calculation. However, given that the balancer doesn't work on decommissioned nodes and nodes can sometimes stay in the decommissioned state for a long time, it might be better to exclude decommissioned nodes from the metrics calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8058) Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627865#comment-14627865 ] Hadoop QA commented on HDFS-8058: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 8s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 8 new or modified test files. | | {color:green}+1{color} | javac | 7m 59s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 8s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 0m 14s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 40s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 8s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 36s | The patch appears to introduce 5 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 107m 19s | Tests failed in hadoop-hdfs. 
| | | | 151m 45s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Timed out tests | org.apache.hadoop.hdfs.TestFileLengthOnClusterRestart | | | org.apache.hadoop.hdfs.TestFileAppend3 | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745406/HDFS-8058-HDFS-7285.010.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 0a93712 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/patchReleaseAuditProblems.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11710/console | This message was automatically generated. 
Erasure coding: use BlockInfo[] for both striped and contiguous blocks in INodeFile --- Key: HDFS-8058 URL: https://issues.apache.org/jira/browse/HDFS-8058 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Yi Liu Assignee: Zhe Zhang Attachments: HDFS-8058-HDFS-7285.003.patch, HDFS-8058-HDFS-7285.004.patch, HDFS-8058-HDFS-7285.005.patch, HDFS-8058-HDFS-7285.006.patch, HDFS-8058-HDFS-7285.007.patch, HDFS-8058-HDFS-7285.008.patch, HDFS-8058-HDFS-7285.009.patch, HDFS-8058-HDFS-7285.010.patch, HDFS-8058.001.patch, HDFS-8058.002.patch This JIRA is to use {{BlockInfo[] blocks}} for both striped and contiguous blocks in INodeFile. Currently {{FileWithStripedBlocksFeature}} keeps a separate list for striped blocks, the methods there duplicate those in INodeFile, and the current code needs to check {{isStriped}} and then do different things. Also, if the file is striped, the {{blocks}} field in INodeFile still occupies the memory of a reference. These are unnecessary, and we can use the same {{blocks}} to make the code clearer. I keep {{FileWithStripedBlocksFeature}} empty for future use: I will file a new JIRA to move {{dataBlockNum}} and {{parityBlockNum}} from *BlockInfoStriped* to INodeFile, since ideally they are the same for all striped blocks in a file, and storing them in each block would waste NN memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
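The unified-array idea can be pictured with a minimal, self-contained sketch (hypothetical classes standing in for the real HDFS types, which carry far more state): the file keeps one {{BlockInfo[]}} and callers query the block subtype instead of consulting a separate striped-blocks feature.

```java
public class UnifiedBlocksSketch {
    abstract static class BlockInfo {
        abstract boolean isStriped();
    }

    static class BlockInfoContiguous extends BlockInfo {
        boolean isStriped() { return false; }
    }

    static class BlockInfoStriped extends BlockInfo {
        boolean isStriped() { return true; }
    }

    static class INodeFile {
        final BlockInfo[] blocks;   // one array serves striped and contiguous files

        INodeFile(BlockInfo... blocks) { this.blocks = blocks; }

        // A file is striped iff its blocks are striped; no duplicate list needed.
        boolean isStriped() {
            return blocks.length > 0 && blocks[0].isStriped();
        }
    }
}
```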
[jira] [Updated] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-8768: -- Description: This is duplicated by [HDFS-8779]. For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. was: For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG This is duplicated by [HDFS-8779]. For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8768) Erasure Coding: block group ID displayed in WebUI is not consistent with fsck
[ https://issues.apache.org/jira/browse/HDFS-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627889#comment-14627889 ] GAO Rui commented on HDFS-8768: --- Thank you [~walter.k.su] very much! Erasure Coding: block group ID displayed in WebUI is not consistent with fsck - Key: HDFS-8768 URL: https://issues.apache.org/jira/browse/HDFS-8768 Project: Hadoop HDFS Issue Type: Sub-task Reporter: GAO Rui Attachments: Screen Shot 2015-07-14 at 15.33.08.png, screen-shot-with-HDFS-8779-patch.PNG For example, in the WebUI (usually namenode port 50070), one Erasure Code file with one block group was displayed as in the attached screenshot [^Screen Shot 2015-07-14 at 15.33.08.png]. But with the fsck command, the block group of the same file was displayed as: {{0. BP-1130999596-172.23.38.10-1433791629728:blk_-9223372036854740160_3384 len=6438256640}} After checking block file names in datanodes, we believe the WebUI may have a problem with Erasure Code block group display. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8780) Fetching live/dead datanode list with arg true for removeDecommissionNode, returns list with decom node.
J.Andreina created HDFS-8780: Summary: Fetching live/dead datanode list with arg true for removeDecommissionNode, returns list with decom node. Key: HDFS-8780 URL: https://issues.apache.org/jira/browse/HDFS-8780 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Critical Current implementation: == In DatanodeManager#removeDecomNodeFromList(), a decommissioned node will be removed from the dead/live node list only if the conditions below are met: I. The include list is not empty. II. Neither the include nor the exclude list contains the node, and the node state is decommissioned. {code} if (!hostFileManager.hasIncludes()) { return; } if ((!hostFileManager.isIncluded(node)) && (!hostFileManager.isExcluded(node)) && node.isDecommissioned()) { // Include list is not empty, an existing datanode does not appear // in both include or exclude lists and it has been decommissioned. // Remove it from the node list. it.remove(); } {code} As mentioned in the javadoc, a datanode cannot already be in the decommissioned state. Following the steps mentioned in the javadoc, the datanode state is dead, not decommissioned. *Can we avoid the unnecessary checks and simply remove the node from the list when it is in the decommissioned state?* Please provide your feedback. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
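The condition quoted in the {code} block above can be distilled into a small truth-table helper (a hypothetical method; the booleans stand in for the HostFileManager and node-state calls):

```java
public class DecomNodeFilter {
    // Mirrors the quoted condition: the node is dropped from the live/dead
    // list only when the include list is non-empty, the node appears in
    // neither the include nor the exclude list, and it is decommissioned.
    public static boolean shouldRemove(boolean hasIncludes, boolean included,
                                       boolean excluded, boolean decommissioned) {
        if (!hasIncludes) {
            return false;   // include list empty: never remove
        }
        return !included && !excluded && decommissioned;
    }
}
```

The JIRA's question, then, is whether the first three inputs are ever relevant, or whether `decommissioned` alone should decide removal.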
[jira] [Updated] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru updated HDFS-8767: -- Attachment: HDFS-8767-02.patch RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
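The root cause can be reproduced without Hadoop at all: java.io.File#list() returns null for any path that is not a directory, a FIFO included, so a listStatus built on it must special-case non-directories. A minimal sketch of that shape of fix (not the actual RawLocalFileSystem code):

```java
import java.io.File;

public class ListStatusSketch {
    // For a non-directory that exists (regular file, FIFO, socket, ...),
    // return the entry itself instead of the null that File#list() yields;
    // for a directory, delegate to File#list() as before.
    public static String[] listStatus(File f) {
        if (!f.exists()) {
            return null;                          // nothing to report
        }
        if (!f.isDirectory()) {
            return new String[] { f.getName() };  // the path is its own listing
        }
        return f.list();
    }
}
```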
[jira] [Assigned] (HDFS-8771) If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes
[ https://issues.apache.org/jira/browse/HDFS-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kanaka kumar avvaru reassigned HDFS-8771: - Assignee: kanaka kumar avvaru If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes Key: HDFS-8771 URL: https://issues.apache.org/jira/browse/HDFS-8771 Project: Hadoop HDFS Issue Type: Bug Reporter: Takuya Fukudome Assignee: kanaka kumar avvaru In our cluster, the edit logs had accidentally become huge (about 50GB) and our Journalnodes' disks were busy; therefore {{purgeLogsOlderThan}} took more than 30 seconds. If {{IPCLoggerChannel#purgeLogsOlderThan}} takes too much time, the Namenode cannot send other RPC calls to the Journalnodes because {{o.a.h.hdfs.qjournal.client.IPCLoggerChannel}}'s executor is single-threaded. This can cause the namenode to shut down. I think IPCLoggerChannel#purgeLogsOlderThan should not block other RPC calls like sendEdits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627887#comment-14627887 ] Hudson commented on HDFS-7608: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfs dfsclient newConnectedPeer has no write timeout - Key: HDFS-7608 URL: https://issues.apache.org/jira/browse/HDFS-7608 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs, hdfs-client Affects Versions: 2.3.0, 2.6.0 Environment: hdfs 2.3.0 hbase 0.98.6 Reporter: zhangshilong Assignee: Xiaoyu Yao Fix For: 2.8.0 Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch, HDFS-7608.2.patch Original Estimate: 24h Remaining Estimate: 24h problem: hbase compactSplitThread may lock forever on reading datanode blocks. debug found: the epollwait timeout was set to 0, so epollwait can never time out. cause: in hdfs 2.3.0, hbase uses DFSClient to read and write blocks. DFSClient creates one socket using newConnectedPeer(addr), but with no read or write timeout. In v2.6.0, newConnectedPeer added a readTimeout to deal with the problem, but did not add a writeTimeout. Why was a write timeout not added? I think NioInetPeer needs a default socket timeout, so applications will not need to force a timeout themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8742) Inotify: Support event for OP_TRUNCATE
[ https://issues.apache.org/jira/browse/HDFS-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627884#comment-14627884 ] Hudson commented on HDFS-8742: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) HDFS-8742. Inotify: Support event for OP_TRUNCATE. Contributed by Surendra Singh Lilhore. (aajisaka: rev 979c9ca2ca89e99dc7165abfa29c78d66de43d9a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/inotify/Event.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/InotifyFSEditLogOpTranslator.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/inotify.proto * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java Inotify: Support event for OP_TRUNCATE -- Key: HDFS-8742 URL: https://issues.apache.org/jira/browse/HDFS-8742 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Fix For: 2.8.0 Attachments: HDFS-8742-001.patch, HDFS-8742.patch Currently inotify does not emit any event for the truncate operation. The NN should send an event for truncate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8722) Optimize datanode writes for small writes and flushes
[ https://issues.apache.org/jira/browse/HDFS-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627885#comment-14627885 ] Hudson commented on HDFS-8722: -- FAILURE: Integrated in Hadoop-Yarn-trunk #987 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/987/]) HDFS-8722. Optimize datanode writes for small writes and flushes. Contributed by Kihwal Lee (kihwal: rev 59388a801514d6af64ef27fbf246d8054f1dcc74) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java Optimize datanode writes for small writes and flushes - Key: HDFS-8722 URL: https://issues.apache.org/jira/browse/HDFS-8722 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 2.7.2 Attachments: HDFS-8722.patch, HDFS-8722.v1.patch After the data corruption fix by HDFS-4660, the CRC recalculation for a partial chunk is executed more frequently if the client repeatedly writes a few bytes and calls hflush/hsync. This is because the generic logic forces CRC recalculation if the on-disk data is not CRC chunk aligned. Prior to HDFS-4660, the datanode blindly accepted whatever CRC the client provided if the incoming data was chunk-aligned. This was the source of the corruption. We can still optimize for the most common case, where a client repeatedly writes a small number of bytes followed by hflush/hsync with no pipeline recovery or append, by allowing the previous behavior for this specific case. If the incoming data has a duplicate portion that ends at the last chunk boundary before the partial chunk on disk, the datanode can use the checksum supplied by the client without redoing the checksum on its own. This reduces disk reads as well as CPU load for the checksum calculation. 
If the incoming packet data goes back further than the last on-disk chunk boundary, the datanode will still do a recalculation, but this occurs rarely, during pipeline recoveries. Thus the optimization for this specific case should be sufficient to speed up the vast majority of cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
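The boundary condition described above can be distilled into a small check (a hypothetical helper; the real BlockReceiver logic involves packet buffers and checksum streams):

```java
public class CrcReuseSketch {
    // The client's checksum for the partial chunk can be trusted when the
    // incoming packet's data does not reach back past the last chunk
    // boundary at or below the on-disk length; otherwise the datanode must
    // recompute the CRC itself (e.g. after a pipeline recovery).
    public static boolean canReuseClientChecksum(long onDiskLen,
                                                 long packetStartOffset,
                                                 int bytesPerChecksum) {
        long lastChunkBoundary = (onDiskLen / bytesPerChecksum) * bytesPerChecksum;
        return packetStartOffset >= lastChunkBoundary;
    }
}
```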
[jira] [Commented] (HDFS-8771) If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes
[ https://issues.apache.org/jira/browse/HDFS-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627895#comment-14627895 ] kanaka kumar avvaru commented on HDFS-8771: --- In my view, all the write-related calls are handled in a single thread to ensure the ordering of requests from the NN. So, the journal node could perform the purge operation in a separate thread instead of blocking the caller. Please correct me if another approach would be better. If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode cannot send other RPC calls to Journalnodes Key: HDFS-8771 URL: https://issues.apache.org/jira/browse/HDFS-8771 Project: Hadoop HDFS Issue Type: Bug Reporter: Takuya Fukudome Assignee: kanaka kumar avvaru In our cluster, the edit logs had accidentally become huge (about 50GB) and our Journalnodes' disks were busy; therefore {{purgeLogsOlderThan}} took more than 30 seconds. If {{IPCLoggerChannel#purgeLogsOlderThan}} takes too much time, the Namenode cannot send other RPC calls to the Journalnodes because {{o.a.h.hdfs.qjournal.client.IPCLoggerChannel}}'s executor is single-threaded. This can cause the namenode to shut down. I think IPCLoggerChannel#purgeLogsOlderThan should not block other RPC calls like sendEdits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
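The suggestion above — running the purge on its own executor so sendEdits is never queued behind it — can be sketched as follows (a hypothetical class; the real IPCLoggerChannel wraps asynchronous RPC proxies):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class JournalChannelSketch {
    // Keep the single-thread executor for ordered write calls such as
    // sendEdits, but route purgeLogsOlderThan to a separate executor so a
    // slow disk purge cannot stall edit shipping.
    private final ExecutorService writeExecutor = Executors.newSingleThreadExecutor();
    private final ExecutorService purgeExecutor = Executors.newSingleThreadExecutor();

    public Future<String> sendEdits() {
        return writeExecutor.submit(() -> "edits-sent");
    }

    public Future<String> purgeLogsOlderThan(long simulatedDiskMillis) {
        return purgeExecutor.submit(() -> {
            Thread.sleep(simulatedDiskMillis);   // stands in for a busy-disk purge
            return "purged";
        });
    }

    public void shutdown() {
        writeExecutor.shutdownNow();
        purgeExecutor.shutdownNow();
    }
}
```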
[jira] [Commented] (HDFS-8722) Optimize datanode writes for small writes and flushes
[ https://issues.apache.org/jira/browse/HDFS-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627877#comment-14627877 ] Hudson commented on HDFS-8722: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) HDFS-8722. Optimize datanode writes for small writes and flushes. Contributed by Kihwal Lee (kihwal: rev 59388a801514d6af64ef27fbf246d8054f1dcc74) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Optimize datanode writes for small writes and flushes - Key: HDFS-8722 URL: https://issues.apache.org/jira/browse/HDFS-8722 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Kihwal Lee Assignee: Kihwal Lee Priority: Critical Fix For: 2.7.2 Attachments: HDFS-8722.patch, HDFS-8722.v1.patch After the data corruption fix by HDFS-4660, the CRC recalculation for a partial chunk is executed more frequently if the client repeatedly writes a few bytes and calls hflush/hsync. This is because the generic logic forces CRC recalculation if the on-disk data is not CRC chunk aligned. Prior to HDFS-4660, the datanode blindly accepted whatever CRC the client provided if the incoming data was chunk-aligned. This was the source of the corruption. We can still optimize for the most common case, where a client repeatedly writes a small number of bytes followed by hflush/hsync with no pipeline recovery or append, by allowing the previous behavior for this specific case. If the incoming data has a duplicate portion that ends at the last chunk boundary before the partial chunk on disk, the datanode can use the checksum supplied by the client without redoing the checksum on its own. This reduces disk reads as well as CPU load for the checksum calculation. 
If the incoming packet data goes back further than the last on-disk chunk boundary, the datanode will still do a recalculation, but this occurs rarely, during pipeline recoveries. Thus the optimization for this specific case should be sufficient to speed up the vast majority of cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627879#comment-14627879 ] Hudson commented on HDFS-7608: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) HDFS-7608: hdfs dfsclient newConnectedPeer has no write timeout (Xiaoyu Yao via Colin P. McCabe) (cmccabe: rev 1d74ccececaefffaa90c0c18b40a3645dbc819d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java HDFS-7608: add CHANGES.txt (cmccabe: rev b7fb6ec4513de7d342c541eb3d9e14642286e2cf) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt hdfs dfsclient newConnectedPeer has no write timeout - Key: HDFS-7608 URL: https://issues.apache.org/jira/browse/HDFS-7608 Project: Hadoop HDFS Issue Type: Bug Components: fuse-dfs, hdfs-client Affects Versions: 2.3.0, 2.6.0 Environment: hdfs 2.3.0 hbase 0.98.6 Reporter: zhangshilong Assignee: Xiaoyu Yao Fix For: 2.8.0 Attachments: HDFS-7608.0.patch, HDFS-7608.1.patch, HDFS-7608.2.patch Original Estimate: 24h Remaining Estimate: 24h problem: hbase compactSplitThread may lock forever on reading datanode blocks. debug found: the epollwait timeout was set to 0, so epollwait can never time out. cause: in hdfs 2.3.0, hbase uses DFSClient to read and write blocks. DFSClient creates one socket using newConnectedPeer(addr), but with no read or write timeout. In v2.6.0, newConnectedPeer added a readTimeout to deal with the problem, but did not add a writeTimeout. Why was a write timeout not added? I think NioInetPeer needs a default socket timeout, so applications will not need to force a timeout themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8742) Inotify: Support event for OP_TRUNCATE
[ https://issues.apache.org/jira/browse/HDFS-8742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627876#comment-14627876 ] Hudson commented on HDFS-8742: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #257 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/257/]) HDFS-8742. Inotify: Support event for OP_TRUNCATE. Contributed by Surendra Singh Lilhore. (aajisaka: rev 979c9ca2ca89e99dc7165abfa29c78d66de43d9a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/inotify.proto * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/inotify/Event.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/InotifyFSEditLogOpTranslator.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSInotifyEventInputStream.java Inotify: Support event for OP_TRUNCATE -- Key: HDFS-8742 URL: https://issues.apache.org/jira/browse/HDFS-8742 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.7.0 Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Fix For: 2.8.0 Attachments: HDFS-8742-001.patch, HDFS-8742.patch Currently inotify does not emit any event for the truncate operation. The NN should send an event for truncate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6290) File is not closed in OfflineImageViewerPB#run()
[ https://issues.apache.org/jira/browse/HDFS-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628554#comment-14628554 ] Akira AJISAKA commented on HDFS-6290: - Hi [~hapandya], what's going on with this issue? I'd like to take it over. File is not closed in OfflineImageViewerPB#run() Key: HDFS-6290 URL: https://issues.apache.org/jira/browse/HDFS-6290 Project: Hadoop HDFS Issue Type: Bug Components: tools Reporter: Ted Yu Priority: Minor {code} } else if (processor.equals(XML)) { new PBImageXmlWriter(conf, out).visit(new RandomAccessFile(inputFile, "r")); {code} The RandomAccessFile instance should be closed before the method returns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
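One way to guarantee the file is closed before the method returns — shown here as a sketch rather than the committed fix — is try-with-resources; file.length() stands in for the PBImageXmlWriter#visit call:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

public class CloseOnReturnSketch {
    // try-with-resources closes the RandomAccessFile even when the visitor
    // throws; the real code would pass 'file' to
    // new PBImageXmlWriter(conf, out).visit(file) inside the try block.
    static long visitImage(String inputFile) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(inputFile, "r")) {
            return file.length();   // stand-in for the visit() call
        }                           // file.close() runs here automatically
    }
}
```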
[jira] [Commented] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628590#comment-14628590 ] Hadoop QA commented on HDFS-8778: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 8m 11s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 43s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 21s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 17s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 5s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 159m 25s | Tests failed in hadoop-hdfs. 
| | | | 182m 22s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDistributedFileSystem | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745471/HDFS-8778.02.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / edcaae4 | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11716/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11716/console | This message was automatically generated. TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8785) TestDistributedFileSystem is failing in trunk
Arpit Agarwal created HDFS-8785: --- Summary: TestDistributedFileSystem is failing in trunk Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8716) introduce a new config specifically for safe mode block count
[ https://issues.apache.org/jira/browse/HDFS-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628601#comment-14628601 ] Chang Li commented on HDFS-8716: those test failures are not related to my change. I have applied the latest patch to trunk and run all the unit tests, and they pass. [~kihwal], could you please help review the latest patch? Thanks! introduce a new config specifically for safe mode block count - Key: HDFS-8716 URL: https://issues.apache.org/jira/browse/HDFS-8716 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li Attachments: HDFS-8716.1.patch, HDFS-8716.2.patch, HDFS-8716.3.patch, HDFS-8716.4.patch, HDFS-8716.5.patch, HDFS-8716.6.patch, HDFS-8716.7.patch, HDFS-8716.7.patch During start up, the namenode waits for n replicas of each block to be reported by datanodes before exiting safe mode. Currently n is tied to the min replicas config. We could set min replicas to more than one, but we might want to exit safe mode as soon as each block has one replica reported. This can be addressed by introducing a new config variable for the safe mode block count. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
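The proposal can be illustrated with a toy safe-mode check (a hypothetical method and parameter names, not the actual FSNamesystem code): a block counts as "safe" once it has the new, decoupled safe-mode replica minimum, regardless of dfs.namenode.replication.min.

```java
public class SafeModeThresholdSketch {
    // Count a block as safe once it has safeModeMinReplicas reported
    // replicas (the proposed new config), and leave safe mode when the
    // safe fraction reaches the threshold percentage.
    public static boolean canLeaveSafeMode(int[] reportedReplicasPerBlock,
                                           int safeModeMinReplicas,
                                           double thresholdPct) {
        int safe = 0;
        for (int replicas : reportedReplicasPerBlock) {
            if (replicas >= safeModeMinReplicas) {
                safe++;
            }
        }
        return safe >= thresholdPct * reportedReplicasPerBlock.length;
    }
}
```

With the minimum decoupled, an operator can keep min replication at 2 for durability while still leaving safe mode as soon as every block has one reported replica.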
[jira] [Commented] (HDFS-8433) blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil
[ https://issues.apache.org/jira/browse/HDFS-8433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628883#comment-14628883 ] Zhe Zhang commented on HDFS-8433: - Thanks for clarifying, I missed the {{BlockManager}} change. blockToken is not set in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil -- Key: HDFS-8433 URL: https://issues.apache.org/jira/browse/HDFS-8433 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Walter Su Attachments: HDFS-8433-HDFS-7285.02.patch, HDFS-8433.00.patch, HDFS-8433.01.patch, HDFS-8433.03.PoC.patch The blockToken provided in LocatedStripedBlock is not used to create LocatedBlock in constructInternalBlock and parseStripedBlockGroup in StripedBlockUtil. We should also add ec tests with security on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628903#comment-14628903 ] Hadoop QA commented on HDFS-8767: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 19m 3s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 8m 43s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 55s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 18s | The applied patch generated 1 new checkstyle issues (total was 21, now 21). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 25s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 36s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 7s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | common tests | 24m 1s | Tests passed in hadoop-common. 
| | | | 68m 35s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12745529/HDFS-8767.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3ec0a04 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/artifact/patchprocess/diffcheckstylehadoop-common.txt | | hadoop-common test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/artifact/patchprocess/testrun_hadoop-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11721/console | This message was automatically generated. RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch, HDFS-8767.003.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8787) Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk
[ https://issues.apache.org/jira/browse/HDFS-8787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628917#comment-14628917 ] Jing Zhao commented on HDFS-8787: - +1 pending Jenkins. Erasure coding: rename BlockInfoContiguousUC and BlockInfoStripedUC to be consistent with trunk --- Key: HDFS-8787 URL: https://issues.apache.org/jira/browse/HDFS-8787 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8787-HDFS-7285.00.patch As Nicholas suggested under HDFS-8728, we should split the patch on {{BlockInfo}} structure into smaller pieces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7483) Display information per tier on the Namenode UI
[ https://issues.apache.org/jira/browse/HDFS-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628406#comment-14628406 ] Benoy Antony commented on HDFS-7483: To add on, I had tried that approach while working on the patch. _math_ is a helper whereas fmt_percentage is a filter. We cannot do something like helper | filter. Some helpers support a filters attribute, but the math helper does not. So I could not reuse the math helper together with the fmt_percentage filter; that's why I wrote a new percentage helper. Display information per tier on the Namenode UI --- Key: HDFS-7483 URL: https://issues.apache.org/jira/browse/HDFS-7483 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7483-001.patch, HDFS-7483-002.patch, overview.png, storagetypes.png, storagetypes_withnostorage.png, withOneStorageType.png, withTwoStorageType.png If the cluster has different types of storage, it is useful to display the storage information per type. The information will be available via JMX (HDFS-7390) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8783) enable socket timeout for balancer's target connection
Chang Li created HDFS-8783: -- Summary: enable socket timeout for balancer's target connection Key: HDFS-8783 URL: https://issues.apache.org/jira/browse/HDFS-8783 Project: Hadoop HDFS Issue Type: Bug Reporter: Chang Li Assignee: Chang Li We have hit a real case where the balancer connected to a black-hole target datanode that accepted the connection but never sent any response back, and the balancer hung -- This message was sent by Atlassian JIRA (v6.3.4#6332)
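The fix direction the summary suggests can be sketched as follows. This is my own hypothetical demo, not the Balancer's actual code (class and parameter names are mine): a connect timeout alone does not protect against a peer that accepts the connection and then goes silent, so SO_TIMEOUT must be set as well to bound each blocking read.

```java
// Hypothetical sketch, not Balancer code: connect with a bounded connect
// timeout, then set SO_TIMEOUT so a "black hole" peer that accepts the
// connection but never responds makes reads fail with SocketTimeoutException
// instead of hanging forever.
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

class TimedSocketFactory {
    static Socket connect(String host, int port,
                          int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket s = new Socket();
        s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
        s.setSoTimeout(readTimeoutMs); // bounds every blocking read on the socket
        return s;
    }

    // Self-check: connect to a local listener and report the applied SO_TIMEOUT.
    static int demoSoTimeout() {
        try (ServerSocket server = new ServerSocket(0);
             Socket s = connect("127.0.0.1", server.getLocalPort(), 1000, 2000)) {
            return s.getSoTimeout();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demoSoTimeout()); // 2000
    }
}
```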
[jira] [Updated] (HDFS-8697) Refactor DecommissionManager: more generic method names and misc cleanup
[ https://issues.apache.org/jira/browse/HDFS-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8697: Attachment: HDFS-8697.01.patch Thanks Andrew for pointing out the issue. Uploading new patch to revise {{replicated}} and {{stored}} related naming (generalizing them with {{redundancy}}). I filed HDFS-8786 as a follow-on to avoid reconstruction after decomm. The change could be large because it breaks the implicit assumption that each internal block should have only 1 replica in common cases. Refactor DecommissionManager: more generic method names and misc cleanup Key: HDFS-8697 URL: https://issues.apache.org/jira/browse/HDFS-8697 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 2.7.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8697.00.patch, HDFS-8697.01.patch This JIRA merges the changes in {{DecommissionManager}} from the HDFS-7285 branch, including changing a few method names to be more generic ({{replicated}} - {{stored}}), and some cleanups. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628707#comment-14628707 ] Andrew Wang commented on HDFS-8785: --- Possibly related to HDFS-7608? [~cmccabe] thoughts? TestDistributedFileSystem is failing in trunk - Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8778: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Target Version/s: (was: 2.7.2) Status: Resolved (was: Patch Available) Thanks for the review Andrew. Committed for 2.8.0. TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
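The queueing behavior behind this deadlock can be shown in isolation. A minimal, self-contained sketch (my own demo code, not HDFS's lock implementation): once a writer is parked in a fair read-write lock's queue, a fresh reader from another thread waits behind it even though the lock is only read-held, so a read-lock holder that blocks waiting on another would-be reader can never be released.

```java
// Demonstrates the deadlock ingredient: with a writer queued on a fair
// ReentrantReadWriteLock, a new reader (on a thread that holds nothing)
// cannot acquire the read lock even though only readers hold it.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ReadLockQueueDemo {
    static boolean secondReaderGetsLock() {
        final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(true); // fair
        rwl.readLock().lock();                      // reader A (this thread)
        Thread writer = new Thread(() -> {
            rwl.writeLock().lock();                 // parks behind reader A
            rwl.writeLock().unlock();
        });
        writer.start();
        try {
            while (!rwl.hasQueuedThreads()) {       // wait until the writer parks
                Thread.sleep(10);
            }
            final boolean[] got = new boolean[1];
            Thread reader = new Thread(() -> {
                try {
                    // Reader B: times out, because the writer is ahead in the queue.
                    got[0] = rwl.readLock().tryLock(500, TimeUnit.MILLISECONDS);
                    if (got[0]) {
                        rwl.readLock().unlock();
                    }
                } catch (InterruptedException ignored) {
                }
            });
            reader.start();
            reader.join();
            return got[0];
        } catch (InterruptedException e) {
            throw new AssertionError(e);
        } finally {
            rwl.readLock().unlock();                // release A so the writer finishes
        }
    }

    public static void main(String[] args) {
        System.out.println(secondReaderGetsLock()); // false
    }
}
```

In the JIRA, reader A is `requestBlockReportLease` holding the NameSystem read lock, and the registration it waits on is reader B.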
[jira] [Commented] (HDFS-8767) RawLocalFileSystem.listStatus() returns null for UNIX pipefile
[ https://issues.apache.org/jira/browse/HDFS-8767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628756#comment-14628756 ] Haohui Mai commented on HDFS-8767: -- Thanks for the work. The fix looks good. bq. The format looks fine in eclipse. Fixing this will reduce the readability Readability is subjective. It might make more sense to fix it to avoid the checkstyle warnings. {code}
+  @Test
+  public void testFileStatusPipeFile() throws Exception {
+    Assume.assumeTrue(SystemUtils.IS_OS_UNIX);
+    String path = TEST_ROOT_DIR + "/testfifofile";
+    new File(path).delete();
+    File fifoFile = new File(path);
+    fifoFile.getParentFile().mkdirs();
+    String fullPath = fifoFile.getAbsolutePath();
+    Process process = Runtime.getRuntime().exec("mkfifo " + fullPath);
+    process.waitFor();
+
+    String input = org.apache.commons.io.IOUtils.toString(process
+        .getInputStream());
+    String errors = org.apache.commons.io.IOUtils.toString(process
+        .getErrorStream());
+    assertTrue("Expected empty but got " + input, "".equals(input));
+    assertTrue("Expected empty but got " + errors, "".equals(errors));
+
+    fifoFile = new File(fullPath);
+    assertTrue("FIFO file should present", fifoFile.exists());
+    assertFalse(fifoFile.isFile());
+    assertFalse(fifoFile.isDirectory());
+
+    Path fsPath = new Path(path);
+    FileSystem fs = fileSys.getRawFileSystem();
+    assertTrue(fs.exists(fsPath));
+    assertNotNull(fs.listStatus(fsPath));
+    fifoFile.delete();
+  }
 }
{code} To me it seems that it makes more sense to test it through mockito instead of creating a real pipe file. I'll upload a patch later to demonstrate the proposed approach.
RawLocalFileSystem.listStatus() returns null for UNIX pipefile -- Key: HDFS-8767 URL: https://issues.apache.org/jira/browse/HDFS-8767 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: kanaka kumar avvaru Priority: Critical Attachments: HDFS-8767-00.patch, HDFS-8767-01.patch, HDFS-8767-02.patch Calling FileSystem.listStatus() on a UNIX pipe file returns null instead of the file. The bug breaks Hive when Hive loads data from UNIX pipe file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-8785: Assignee: Xiaoyu Yao TestDistributedFileSystem is failing in trunk - Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8785) TestDistributedFileSystem is failing in trunk
[ https://issues.apache.org/jira/browse/HDFS-8785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628654#comment-14628654 ] Xiaoyu Yao commented on HDFS-8785: -- Thanks [~arpitagarwal] for reporting this, I will take a look at it. TestDistributedFileSystem is failing in trunk - Key: HDFS-8785 URL: https://issues.apache.org/jira/browse/HDFS-8785 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.8.0 Reporter: Arpit Agarwal Assignee: Xiaoyu Yao A newly added test case {{TestDistributedFileSystem#testDFSClientPeerWriteTimeout}} is failing in trunk. e.g. run https://builds.apache.org/job/PreCommit-HDFS-Build/11716/testReport/org.apache.hadoop.hdfs/TestDistributedFileSystem/testDFSClientPeerWriteTimeout/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628703#comment-14628703 ] Andrew Wang commented on HDFS-8778: --- LGTM +1, thanks Arpit for finding and fixing. Test failure looks unrelated. TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8666) speedup TestMover
[ https://issues.apache.org/jira/browse/HDFS-8666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628705#comment-14628705 ] Tsz Wo Nicholas Sze commented on HDFS-8666: --- With the patch, the time reduces from 5m24s to 49s per local test. The result is great. Thanks! speedup TestMover - Key: HDFS-8666 URL: https://issues.apache.org/jira/browse/HDFS-8666 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Walter Su Assignee: Walter Su Fix For: 2.8.0 Attachments: HDFS-8666.01.patch TestMover is one of the most time consuming tests.(See [TestReport#1|https://builds.apache.org/job/PreCommit-HDFS-Build/11450/testReport/] ) It often timeout. (See [TestReport#2|https://issues.apache.org/jira/browse/HDFS-8652?focusedCommentId=14598394page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14598394] ) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8779) WebUI can't display randomly generated block ID
[ https://issues.apache.org/jira/browse/HDFS-8779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628709#comment-14628709 ] Andrew Wang commented on HDFS-8779: --- Hi [~walter.k.su], is it possible to add a test for this? Otherwise looks good :) WebUI can't display randomly generated block ID --- Key: HDFS-8779 URL: https://issues.apache.org/jira/browse/HDFS-8779 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8779.01.patch, HDFS-8779.02.patch Old releases used randomly generated block IDs (HDFS-4645). The max value of a Long in Java is 2^63-1, while the max safe number value in Javascript is 2^53-1. (See [Link|https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/MAX_SAFE_INTEGER]) This means almost every randomly generated block ID exceeds MAX_SAFE_INTEGER, and an integer which exceeds MAX_SAFE_INTEGER cannot be represented exactly in Javascript. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
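The precision loss described above can be demonstrated without a browser. This is hypothetical demo code, not from the patch: a JavaScript Number is an IEEE-754 double, so a Java long displays correctly in the WebUI only if it survives a round trip through double.

```java
// Hypothetical demo, not HDFS code: a long renders exactly in JavaScript
// only if casting it to double and back is lossless.
class BlockIdPrecisionDemo {
    static final long MAX_SAFE_INTEGER = (1L << 53) - 1; // 9007199254740991

    static boolean safeInJavascript(long id) {
        return (long) (double) id == id;
    }

    public static void main(String[] args) {
        long sequentialId = 1073741825L;      // a small, sequentially assigned id
        long randomId = MAX_SAFE_INTEGER + 2; // 2^53 + 1, like most random ids
        System.out.println(safeInJavascript(sequentialId)); // true
        System.out.println(safeInJavascript(randomId));     // false
    }
}
```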
[jira] [Commented] (HDFS-8778) TestBlockReportRateLimiting#testLeaseExpiration can deadlock
[ https://issues.apache.org/jira/browse/HDFS-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628747#comment-14628747 ] Hudson commented on HDFS-8778: -- FAILURE: Integrated in Hadoop-trunk-Commit #8169 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8169/]) HDFS-8778. TestBlockReportRateLimiting#testLeaseExpiration can deadlock. (Contributed by Arpit Agarwal) (arp: rev 3ec0a0444f75c8743289ec7c8645d4bdf51fc45a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockReportRateLimiting.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBlockReportRateLimiting#testLeaseExpiration can deadlock Key: HDFS-8778 URL: https://issues.apache.org/jira/browse/HDFS-8778 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.1 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.8.0 Attachments: HDFS-8778.01.patch, HDFS-8778.02.patch {{requestBlockReportLease}} blocks on DataNode registration while holding the NameSystem read lock. DataNode registration can block on the NameSystem read lock if a writer gets in the queue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8078) HDFS client gets errors trying to to connect to IPv6 DataNode
[ https://issues.apache.org/jira/browse/HDFS-8078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628763#comment-14628763 ] Elliott Clark commented on HDFS-8078: - PING? HDFS client gets errors trying to to connect to IPv6 DataNode - Key: HDFS-8078 URL: https://issues.apache.org/jira/browse/HDFS-8078 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Nate Edel Assignee: Nate Edel Labels: BB2015-05-TBR, ipv6 Attachments: HDFS-8078.10.patch, HDFS-8078.9.patch 1st exception, on put: 15/03/23 18:43:18 WARN hdfs.DFSClient: DataStreamer Exception java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 2401:db00:1010:70ba:face:0:8:0:50010 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:212) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:153) at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1607) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1408) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1361) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:588) Appears to actually stem from code in DataNodeID which assumes it's safe to append together (ipaddr + ":" + port) -- which is OK for IPv4 and not OK for IPv6. NetUtils.createSocketAddr() assembles a Java URI object, which requires the format proto://[2401:db00:1010:70ba:face:0:8:0]:50010 Currently using InetAddress.getByName() to validate IPv6 (guava InetAddresses.forString has been flaky) but could also use our own parsing.
(From logging this, it seems like a low-enough frequency call that the extra object creation shouldn't be problematic, and for me the slight risk of passing in bad input that is not actually an IPv4 or IPv6 address and thus calling an external DNS lookup is outweighed by getting the address normalized and avoiding rewriting parsing.) Alternatively, sun.net.util.IPAddressUtil.isIPv6LiteralAddress() --- 2nd exception (on datanode) 15/04/13 13:18:07 ERROR datanode.DataNode: dev1903.prn1.facebook.com:50010:DataXceiver error processing unknown operation src: /2401:db00:20:7013:face:0:7:0:54152 dst: /2401:db00:11:d010:face:0:2f:0:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226) at java.lang.Thread.run(Thread.java:745) Which also comes as client error -get: 2401 is not an IP string literal. This one has existing parsing logic which needs to shift to the last colon rather than the first. Should also be a tiny bit faster by using lastIndexOf rather than split. Could alternatively use the techniques above. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
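The two fixes the report describes can be sketched together. This is a hypothetical illustration, not the DatanodeID or NetUtils API (class and method names are mine): bracket IPv6 literals when composing host:port, and split on the last colon (lastIndexOf rather than split) when parsing.

```java
// Hypothetical sketch, not HDFS code: compose and parse host:port strings so
// the colons inside an IPv6 literal do not break parsing.
class HostPortUtil {
    // Wrap IPv6 literals in brackets, as URI syntax requires:
    // proto://[2401:db00::1]:50010
    static String toHostPort(String ipAddr, int port) {
        if (ipAddr.indexOf(':') >= 0) {
            return "[" + ipAddr + "]:" + port;
        }
        return ipAddr + ":" + port;
    }

    // Split on the *last* colon so IPv6 literals stay intact.
    static String hostOf(String hostPort) {
        String host = hostPort.substring(0, hostPort.lastIndexOf(':'));
        if (host.startsWith("[") && host.endsWith("]")) {
            host = host.substring(1, host.length() - 1); // strip brackets
        }
        return host;
    }

    static int portOf(String hostPort) {
        return Integer.parseInt(hostPort.substring(hostPort.lastIndexOf(':') + 1));
    }

    public static void main(String[] args) {
        System.out.println(toHostPort("2401:db00:1010:70ba:face:0:8:0", 50010));
        System.out.println(hostOf("[2401:db00:1010:70ba:face:0:8:0]:50010"));
        System.out.println(portOf("[2401:db00:1010:70ba:face:0:8:0]:50010"));
    }
}
```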
[jira] [Created] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned
Zhe Zhang created HDFS-8786: --- Summary: Erasure coding: DataNode should transfer striped blocks before being decommissioned Key: HDFS-8786 URL: https://issues.apache.org/jira/browse/HDFS-8786 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Per [discussion | https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] under HDFS-8697, it's too expensive to reconstruct block groups for decomm purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8738) Limit Exceptions thrown by DataNode when a client makes socket connection and sends an empty message
[ https://issues.apache.org/jira/browse/HDFS-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Kartha reassigned HDFS-8738: --- Assignee: Rajesh Kartha Limit Exceptions thrown by DataNode when a client makes socket connection and sends an empty message Key: HDFS-8738 URL: https://issues.apache.org/jira/browse/HDFS-8738 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.7.1 Reporter: Rajesh Kartha Assignee: Rajesh Kartha Priority: Minor When a client creates a socket connection to the Datanode and sends an empty message, the datanode logs have exceptions like these: 2015-07-08 20:00:55,427 ERROR datanode.DataNode (DataXceiver.java:run(278)) - bidev17.rtp.ibm.com:50010:DataXceiver error processing unknown operation src: /127.0.0.1:41508 dst: /127.0.0.1:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227) at java.lang.Thread.run(Thread.java:745) 2015-07-08 20:00:56,671 ERROR datanode.DataNode (DataXceiver.java:run(278)) - bidev17.rtp.ibm.com:50010:DataXceiver error processing unknown operation src: /127.0.0.1:41509 dst: /127.0.0.1:50010 java.io.EOFException at java.io.DataInputStream.readShort(DataInputStream.java:315) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:227) at java.lang.Thread.run(Thread.java:745) These can fill up the logs and was recently noticed with an Ambari 2.1 based install which tries to check if the datanode is up. 
Can be easily reproduced with a simple Java client creating a Socket connection:
{code}
public static void main(String[] args) {
  Socket DNClient;
  try {
    DNClient = new Socket("127.0.0.1", 50010);
    DataOutputStream os = new DataOutputStream(DNClient.getOutputStream());
    os.writeBytes("");
    os.close();
  } catch (UnknownHostException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
  } catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-8728: Status: Open (was: Patch Available) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8728) Erasure coding: revisit and simplify BlockInfoStriped and INodeFile
[ https://issues.apache.org/jira/browse/HDFS-8728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628965#comment-14628965 ] Zhe Zhang commented on HDFS-8728: - Yes it does. Erasure coding: revisit and simplify BlockInfoStriped and INodeFile --- Key: HDFS-8728 URL: https://issues.apache.org/jira/browse/HDFS-8728 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8728-HDFS-7285.00.patch, HDFS-8728-HDFS-7285.01.patch, HDFS-8728-HDFS-7285.02.patch, HDFS-8728-HDFS-7285.03.patch, HDFS-8728.00.patch, HDFS-8728.01.patch, HDFS-8728.02.patch, Merge-1-codec.patch, Merge-2-ecZones.patch, Merge-3-blockInfo.patch, Merge-4-blockmanagement.patch, Merge-5-blockPlacementPolicies.patch, Merge-6-locatedStripedBlock.patch, Merge-7-replicationMonitor.patch, Merge-8-inodeFile.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8788) Implement unit tests for remote block reader in libhdfspp
[ https://issues.apache.org/jira/browse/HDFS-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8788: - Attachment: HDFS-8788.000.patch Implement unit tests for remote block reader in libhdfspp - Key: HDFS-8788 URL: https://issues.apache.org/jira/browse/HDFS-8788 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8788.000.patch This jira proposes to implement unit tests for the remote block reader in gmock. -- This message was sent by Atlassian JIRA (v6.3.4#6332)