[jira] [Commented] (HDFS-8332) DistributedFileSystem listCacheDirectives() and listCachePools() API calls should check filesystem closed
[ https://issues.apache.org/jira/browse/HDFS-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532097#comment-14532097 ] Hadoop QA commented on HDFS-8332: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 52s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. | | {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 11s | The applied patch generated 14 new checkstyle issues (total was 734, now 735). | | {color:green}+1{color} | whitespace | 0m 5s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 3s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 17s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 167m 33s | Tests failed in hadoop-hdfs. 
| | | | 210m 49s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.tracing.TestTraceAdmin | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731054/HDFS-8332-002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4c7b9b6 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10840/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10840/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10840/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10840/console | This message was automatically generated. DistributedFileSystem listCacheDirectives() and listCachePools() API calls should check filesystem closed - Key: HDFS-8332 URL: https://issues.apache.org/jira/browse/HDFS-8332 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-TBR Attachments: HDFS-8332-000.patch, HDFS-8332-001.patch, HDFS-8332-002.patch I can see that the {{listCacheDirectives()}} and {{listCachePools()}} APIs can be called even after the filesystem is closed. Instead, these calls should do {{checkOpen}} and throw: {code} java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:464) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
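The behaviour requested in HDFS-8332 can be illustrated with a minimal sketch of a fail-fast guard. This is not the DFSClient code or the attached patch: the class ClosableClientSketch, its field and its listCachePools() method are hypothetical stand-ins, and only the exception message is taken from the report above.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a filesystem client; not the real DFSClient.
class ClosableClientSketch {
    private volatile boolean running = true;

    // Fail fast once the client has been closed.
    void checkOpen() throws IOException {
        if (!running) {
            throw new IOException("Filesystem closed");
        }
    }

    void close() {
        running = false;
    }

    // Hypothetical listing call: the guard runs before any other work.
    List<String> listCachePools() throws IOException {
        checkOpen();
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ClosableClientSketch client = new ClosableClientSketch();
        client.close();
        try {
            client.listCachePools();
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Putting the guard first in every listing call means a closed client never reaches the remote call at all, which is the effect the issue asks for.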
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Status: Patch Available (was: In Progress) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
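The top frame of the trace in HDFS-8220 is LinkedBlockingQueue.offer, which rejects null elements with a NullPointerException. One plausible reading, stated here as an assumption rather than a confirmed diagnosis, is that a missing block location was offered to the queue. The sketch below shows that queue behaviour and a hypothetical guard that validates the located entries against an expected group size first. BlockGroupOfferSketch, offerValidated and the String element type are illustrative only and are not the StripedDataStreamer code.

```java
import java.io.IOException;
import java.util.concurrent.LinkedBlockingQueue;

class BlockGroupOfferSketch {
    // Returns true when offering null raises NullPointerException,
    // as LinkedBlockingQueue documents for null elements.
    static boolean offerNullThrows() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            queue.offer(null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Hypothetical guard: compare the located entries with the expected
    // group size before handing any of them to the queue.
    static void offerValidated(LinkedBlockingQueue<String> queue,
                               String[] located, int groupSize) throws IOException {
        if (located.length < groupSize) {
            throw new IOException("Expected " + groupSize
                + " block locations but got " + located.length);
        }
        for (String location : located) {
            if (location == null) {
                throw new IOException("Missing block location");
            }
        }
        for (String location : located) {
            queue.offer(location);
        }
    }
}
```

With a guard of this kind the caller sees a descriptive IOException instead of a NullPointerException from inside the queue.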
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Attachment: HDFS-8220-004.patch Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8325) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/HDFS-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-8325: Resolution: Fixed Fix Version/s: 2.8.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk and branch-2. Thanks [~brahmareddy] for contribution. Misspelling of threshold in log4j.properties for tests --- Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-8325.patch log4j.properties file for test contains misspelling log4j.threshhold. We should use log4j.threshold correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
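Why the misspelling in HDFS-8325 matters can be shown with plain java.util.Properties: a value stored under a misspelled key is simply not found under the correct one. The sketch below is illustrative only; ThresholdKeySketch and its lookup helper are hypothetical names, and the value ALL is an assumed example, not the content of the actual test log4j.properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class ThresholdKeySketch {
    // Loads properties text and returns the value stored under the given key,
    // or null when no such key exists.
    static String lookup(String fileContents, String key) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(fileContents));
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // The misspelled key is parsed without error but never matches.
        System.out.println(lookup("log4j.threshhold=ALL\n", "log4j.threshold"));
        System.out.println(lookup("log4j.threshold=ALL\n", "log4j.threshold"));
    }
}
```

Because an unknown key is not an error in a properties file, a typo like this goes unnoticed unless someone reads the file closely.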
[jira] [Updated] (HDFS-8325) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/HDFS-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-8325: Priority: Minor (was: Major) Affects Version/s: 2.7.0 Labels: (was: BB2015-05-TBR) Misspelling of threshold in log4j.properties for tests --- Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8325.patch log4j.properties file for test contains misspelling log4j.threshhold. We should use log4j.threshold correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8203: - Attachment: HDFS-8203.003.patch Thanks Jing for the review! I updated the patch to address all your comments, and all related tests run successfully in my local env. In the last patch, I changed {{readOneStripe}} so that it does not need to fetch all cells of the striped group and just starts from the cell that contains the target pos. But I didn't modify {{StripeRange#offsetInBlock}}; it was still {{n \* stripeLen}}, which is not correct if the *pos* is in the middle of a stripe cell group after a {{seekToNewSource}} or {{seek}} and we seek back (this addresses your #1 comment). In this patch, I make a further change in {{readOneStripe}}: {{StripeRange#offsetInBlock}} becomes {{offsetInBlockGroup}}. Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch, HDFS-8203.003.patch In HDFS-7782 and HDFS-8033, we handle pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}}, zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
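The offset arithmetic behind the HDFS-8203 comment can be sketched as follows, assuming a simple round-robin layout in which each stripe holds dataBlocks cells of cellSize bytes. StripeOffsetSketch and its method names are hypothetical, and the numbers in main are illustrative; this is not the DFSStripedInputStream implementation. The point is that a position in the middle of a stripe differs from the stripe start {{n \* stripeLen}}.

```java
class StripeOffsetSketch {
    final int cellSize;    // bytes per cell (assumed parameter)
    final int dataBlocks;  // data cells per stripe (assumed parameter)

    StripeOffsetSketch(int cellSize, int dataBlocks) {
        this.cellSize = cellSize;
        this.dataBlocks = dataBlocks;
    }

    long stripeLen() {
        return (long) cellSize * dataBlocks;
    }

    // Start of the stripe containing pos, i.e. n * stripeLen.
    long stripeStart(long pos) {
        return (pos / stripeLen()) * stripeLen();
    }

    // Index of the cell within the stripe that contains pos.
    int cellIndex(long pos) {
        return (int) ((pos % stripeLen()) / cellSize);
    }

    // Offset of pos inside that cell.
    int offsetInCell(long pos) {
        return (int) ((pos % stripeLen()) % cellSize);
    }

    public static void main(String[] args) {
        StripeOffsetSketch layout = new StripeOffsetSketch(4, 3);
        long pos = 29;
        System.out.println(layout.stripeStart(pos));  // stripe start, not pos itself
        System.out.println(layout.cellIndex(pos));
        System.out.println(layout.offsetInCell(pos));
    }
}
```

A reader that records only the stripe start loses the cell index and in-cell offset, which is why tracking the offset within the whole block group is the safer choice after a seek.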
[jira] [Updated] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8203: - Attachment: (was: HDFS-8203.003.patch) Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch In HDFS-7782 and HDFS-8033, we handle pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}}, zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8325) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/HDFS-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532095#comment-14532095 ] Hudson commented on HDFS-8325: -- FAILURE: Integrated in Hadoop-trunk-Commit #7755 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7755/]) HDFS-8325. Misspelling of threshold in log4j.properties for tests. Contributed by Brahma Reddy Battula. (aajisaka: rev 449e4426a5cc1382eef0cbaa9bd4eb2221c89da1) * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/log4j.properties * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Misspelling of threshold in log4j.properties for tests --- Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8325.patch log4j.properties file for test contains misspelling log4j.threshhold. We should use log4j.threshold correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3384) DataStreamer thread should be closed immediatly when failed to setup a PipelineForAppendOrRecovery
[ https://issues.apache.org/jira/browse/HDFS-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532137#comment-14532137 ] Vinayakumar B commented on HDFS-3384: - Hi [~umamaheswararao], Thanks for taking up this old issue. You might want to rebase the last HDFS-3384_2.patch, the one with the test and the fixed review comments. DataStreamer thread should be closed immediatly when failed to setup a PipelineForAppendOrRecovery -- Key: HDFS-3384 URL: https://issues.apache.org/jira/browse/HDFS-3384 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.0-alpha Reporter: Brahma Reddy Battula Assignee: amith Labels: BB2015-05-TBR Attachments: HDFS-3384-3.patch, HDFS-3384.patch, HDFS-3384_2.patch, HDFS-3384_2.patch, HDFS-3384_2.patch Scenario: write a file; corrupt a block manually; call append. {noformat} 2012-04-19 09:33:10,776 INFO hdfs.DFSClient (DFSOutputStream.java:createBlockOutputStream(1059)) - Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1039) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:939) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-19 09:33:10,807 WARN hdfs.DFSClient (DFSOutputStream.java:run(549)) - DataStreamer Exception java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:510) 2012-04-19 09:33:10,807 WARN hdfs.DFSClient (DFSOutputStream.java:hflush(1511)) - Error while syncing java.io.IOException: All datanodes 10.18.40.20:50010 are bad. Aborting... 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:908) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) java.io.IOException: All datanodes 10.18.40.20:50010 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:908) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532157#comment-14532157 ] Hadoop QA commented on HDFS-8220: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731100/HDFS-8220-004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 449e442 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10842/console | This message was automatically generated. Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. 
Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532195#comment-14532195 ] Uma Maheswara Rao G commented on HDFS-8129: --- I thought about that too, but did not do it, basically because we already keep those proto definitions separately in erasurecode.proto. It is OK; I can rename them too if there are no objections. Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-8129-0.patch Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This jira is to correct the places where we named things differently so that the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8203: - Attachment: HDFS-8203.003.patch Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch, HDFS-8203.003.patch In HDFS-7782 and HDFS-8033, we handle pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}}, zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8220: --- Attachment: HDFS-8220-HDFS-7285.005.patch Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch, HDFS-8220-HDFS-7285.005.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at 
org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-8329) Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion.
[ https://issues.apache.org/jira/browse/HDFS-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8329 started by Yi Liu. Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion. --- Key: HDFS-8329 URL: https://issues.apache.org/jira/browse/HDFS-8329 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu In both NN and DN, we sometimes use striped block recovery and sometimes use reconstruction. The term striped block recovery is easily confused with block recovery; we should use striped block reconstruction to eliminate the confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-8129: -- Attachment: HDFS-8129-0.patch Attached the initial patch. It just renames the EC* classes to ErasureCoding*. I did not rename the classes that are already under the erasurecode package. Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-8129-0.patch Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This jira is to correct the places where we named things differently so that the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532187#comment-14532187 ] Vinayakumar B commented on HDFS-8129: - It would be better to rename {{ECInfoProto}} and {{ECZoneInfoProto}} too, even though they are in {{ErasureCodingProtos}}; it would look good in the PBHelper methods during conversion. Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-8129-0.patch Currently I see some classes named ErasureCode* and some named EC*. I feel we should maintain consistent naming across the project. This jira is to correct the places where we named things differently so that the naming is uniform, and also to discuss which naming we should follow from now on when we create new classes. ErasureCoding* should be fine IMO. Let's discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8209) Support different number of datanode directories in MiniDFSCluster.
[ https://issues.apache.org/jira/browse/HDFS-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532221#comment-14532221 ] Hadoop QA commented on HDFS-8209: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 17s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 19s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 20s | The applied patch generated 19 new checkstyle issues (total was 734, now 735). | | {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 3m 7s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 1m 20s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 82m 37s | Tests failed in hadoop-hdfs. 
| | | | 104m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.hdfs.TestDataTransferKeepalive | | | hadoop.hdfs.server.datanode.TestDatanodeStartupOptions | | Timed out tests | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.server.datanode.TestBlockReplacement | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731087/HDFS-8209_1.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / 449e442 | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10841/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10841/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10841/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10841/console | This message was automatically generated. Support different number of datanode directories in MiniDFSCluster. --- Key: HDFS-8209 URL: https://issues.apache.org/jira/browse/HDFS-8209 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8209.patch, HDFS-8209_1.patch I want to create MiniDFSCluster with 2 datanode and for each datanode I want to set different number of StorageTypes, but in this case I am getting ArrayIndexOutOfBoundsException. My cluster schema is like this. 
{code}
final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
    .numDataNodes(2)
    .storageTypes(new StorageType[][] {
        { StorageType.DISK, StorageType.ARCHIVE },
        { StorageType.DISK } })
    .build();
{code}
*Exception* :
{code}
java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hdfs.MiniDFSCluster.makeDataNodeDirs(MiniDFSCluster.java:1218)
    at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1402)
    at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:832)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
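The exception in HDFS-8209 is what a jagged array produces when every row is indexed with the same fixed width. Whether MiniDFSCluster does exactly this is an assumption here, not something the report states. The sketch below uses hypothetical names (JaggedStorageSketch, Kind) and plain Java arrays only; it contrasts fixed-width indexing with iterating each row by its own length.

```java
class JaggedStorageSketch {
    // Hypothetical stand-in for the storage type enum used in the report.
    enum Kind { DISK, ARCHIVE }

    // Assumes every node has dirsPerNode entries; fails on shorter rows.
    static int countWithFixedWidth(Kind[][] perNode, int dirsPerNode) {
        int count = 0;
        for (Kind[] node : perNode) {
            for (int j = 0; j < dirsPerNode; j++) {
                if (node[j] != null) {
                    count++;
                }
            }
        }
        return count;
    }

    // Uses each row's own length, so rows may differ in size.
    static int countWithRowLength(Kind[][] perNode) {
        int count = 0;
        for (Kind[] node : perNode) {
            count += node.length;
        }
        return count;
    }
}
```

Called with the two-node layout from the report (one node with DISK and ARCHIVE, one with DISK only), the fixed-width variant with a width of 2 throws ArrayIndexOutOfBoundsException on the second node, while the per-row variant counts three directories.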
[jira] [Created] (HDFS-8338) SimulatedFSDataset support multiple storages/volumes
Walter Su created HDFS-8338: --- Summary: SimulatedFSDataset support multiple storages/volumes Key: HDFS-8338 URL: https://issues.apache.org/jira/browse/HDFS-8338 Project: Hadoop HDFS Issue Type: Improvement Reporter: Walter Su Assignee: Walter Su Priority: Minor Some tests (like Mover/Balancer) move blocks among storages. These tests use FsDatasetImpl because SimulatedFSDataset doesn't support multiple storages. The tests could be faster if they used SimulatedFSDataset, but that requires SimulatedFSDataset to support multiple storages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-3384) DataStreamer thread should be closed immediatly when failed to setup a PipelineForAppendOrRecovery
[ https://issues.apache.org/jira/browse/HDFS-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532170#comment-14532170 ] Uma Maheswara Rao G commented on HDFS-3384: --- Oh yes, you are right. I have to rebase the latest one. Thanks. I will do that some time today. DataStreamer thread should be closed immediatly when failed to setup a PipelineForAppendOrRecovery -- Key: HDFS-3384 URL: https://issues.apache.org/jira/browse/HDFS-3384 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.0.0-alpha Reporter: Brahma Reddy Battula Assignee: amith Labels: BB2015-05-TBR Attachments: HDFS-3384-3.patch, HDFS-3384.patch, HDFS-3384_2.patch, HDFS-3384_2.patch, HDFS-3384_2.patch Scenario: write a file; corrupt a block manually; call append. {noformat} 2012-04-19 09:33:10,776 INFO hdfs.DFSClient (DFSOutputStream.java:createBlockOutputStream(1059)) - Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1039) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:939) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) 2012-04-19 09:33:10,807 WARN hdfs.DFSClient (DFSOutputStream.java:run(549)) - DataStreamer Exception java.lang.NullPointerException at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:510) 2012-04-19 09:33:10,807 WARN hdfs.DFSClient (DFSOutputStream.java:hflush(1511)) - Error while syncing java.io.IOException: All datanodes 10.18.40.20:50010 are bad. Aborting... 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:908) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) java.io.IOException: All datanodes 10.18.40.20:50010 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:908) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8289) Erasure Coding: add ECSchema to HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8289: Summary: Erasure Coding: add ECSchema to HdfsFileStatus (was: DFSStripedOutputStream uses an additional rpc all to getErasureCodingInfo) Erasure Coding: add ECSchema to HdfsFileStatus -- Key: HDFS-8289 URL: https://issues.apache.org/jira/browse/HDFS-8289 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Yong Zhang Attachments: HDFS-8289.000.patch, HDFS-8289.001.patch, HDFS-8289.002.patch, HDFS-8289.003.patch {code} // ECInfo is restored from NN just before writing striped files. ecInfo = dfsClient.getErasureCodingInfo(src); {code} The rpc call above can be avoided by adding ECSchema to HdfsFileStatus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8289) Erasure Coding: add ECSchema to HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-8289: Attachment: HDFS-8289.004.patch Thanks for updating the patch, Yong. The 003 patch looks good to me except it needs some rebase. +1. Since the change is trivial I just rebased the patch and will commit it shortly. Erasure Coding: add ECSchema to HdfsFileStatus -- Key: HDFS-8289 URL: https://issues.apache.org/jira/browse/HDFS-8289 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Yong Zhang Attachments: HDFS-8289.000.patch, HDFS-8289.001.patch, HDFS-8289.002.patch, HDFS-8289.003.patch, HDFS-8289.004.patch {code} // ECInfo is restored from NN just before writing striped files. ecInfo = dfsClient.getErasureCodingInfo(src); {code} The rpc call above can be avoided by adding ECSchema to HdfsFileStatus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533279#comment-14533279 ] Aaron T. Myers commented on HDFS-6440: -- Hey Jesse, Thanks a lot for working through my feedback, responses below. bq. I'm not sure how we would test this when needing to change the structure of the FS to support more than 2 NNs. Would you recommend (1) recognizing the old layout and then (2) transferring it into the new layout? The reason this seems silly (to me) is that the layout is only enforced by the way the minicluster is used/setup, rather than the way things would actually be run. By moving things into the appropriate directories per-nn, but keeping everything else below that the same, I think we keep the same upgrade properties but don't need to do the above contrived/synthetic upgrade. I'm specifically thinking about just expanding {{TestRollingUpgrade}} with some tests that exercise the 2 NN scenario, e.g. amending or expanding {{testRollingUpgradeWithQJM}}. bq. Maybe some salesforce terminology leak here.snip Cool, that's what I figured. The new comment looks good to me. bq. Yes, it's for when there is an error and you want to run the exact sequence of failovers again in the test. Minor helper, but can be useful when trying to track down ordering dependency issues (which there shouldn't be, but sometimes these things can creep in). Sorry, maybe I wasn't clear. I get the point of using the random seed in the first place, but I'm specifically talking about the fact that in {{doWriteOverFailoverTest}} we change the value of that variable, log the value, and then never read it again. Doesn't seem like that's doing anything. bq. It can either be an InterruptedException or an IOException when transferring the checkpoint. Interrupted (ie) thrown if we are interrupted while waiting for any checkpoint to complete. 
IOE if there is an execution exception when doing the checkpoint.snip Right, I get that, but what I was pointing out was just that in the previous version of the patch the variable {{ie}} was never being assigned to anything but {{null}}. Here was the code in that patch, note the 4th-to-last line: {code} +InterruptedException ie = null; +IOException ioe= null; +int i = 0; +boolean success = false; +for (; i < uploads.size(); i++) { + Future<TransferFsImage.TransferResult> upload = uploads.get(i); + try { +// TODO should there be some smarts here about retries nodes that are not the active NN? +if (upload.get() == TransferFsImage.TransferResult.SUCCESS) { + success = true; + //avoid getting the rest of the results - we don't care since we had a successful upload + break; +} + + } catch (ExecutionException e) { +ioe = new IOException("Exception during image upload: " + e.getMessage(), +e.getCause()); +break; + } catch (InterruptedException e) { +ie = null; +break; + } +} {code} That's fixed in the latest version of the patch, where the variable {{ie}} is assigned to {{e}} when an {{InterruptedException}} occurs, so I think we're good. bq. There is {{TestFailoverWithBlockTokensEnabled}}snip Ah, my bad. Yes indeed, that looks good to me. The overlapping range issue is exactly what I wanted to see tested. Support more than 2 NameNodes - Key: HDFS-6440 URL: https://issues.apache.org/jira/browse/HDFS-6440 Project: Hadoop HDFS Issue Type: New Feature Components: auto-failover, ha, namenode Affects Versions: 2.4.0 Reporter: Jesse Yates Assignee: Jesse Yates Fix For: 3.0.0 Attachments: Multiple-Standby-NameNodes_V1.pdf, hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-multiple-snn-trunk-v0.patch Most of the work is already done to support more than 2 NameNodes (one active, one standby). This would be the last bit to support running multiple _standby_ NameNodes; one of the standbys should be available for fail-over. 
Mostly, this is a matter of updating how we parse configurations, some complexity around managing the checkpointing, and updating a whole lot of tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
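For readers following the {{ie}} discussion in the HDFS-6440 comment above, here is a minimal standalone sketch of the pattern being reviewed: wait on a list of futures, stop at the first success, and keep any InterruptedException so it can be rethrown. The class name, the Result enum, and awaitFirstSuccess are hypothetical stand-ins chosen for illustration; this is not the code from the patch.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FirstSuccessfulUpload {
  // Hypothetical stand-in for an upload result type.
  enum Result { SUCCESS, FAILED }

  /** Returns true if any upload succeeded; rethrows a recorded failure otherwise. */
  static boolean awaitFirstSuccess(List<Future<Result>> uploads)
      throws IOException, InterruptedException {
    InterruptedException ie = null;
    IOException ioe = null;
    boolean success = false;
    for (Future<Result> upload : uploads) {
      try {
        if (upload.get() == Result.SUCCESS) {
          success = true;
          break; // one successful upload is enough
        }
      } catch (ExecutionException e) {
        ioe = new IOException("Exception during upload: " + e.getMessage(), e.getCause());
        break;
      } catch (InterruptedException e) {
        ie = e; // keep the exception; assigning null here would silently drop it
        break;
      }
    }
    if (ie != null) {
      throw ie;
    }
    if (ioe != null) {
      throw ioe;
    }
    return success;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Callable<Result> failing = () -> Result.FAILED;
    Callable<Result> succeeding = () -> Result.SUCCESS;
    List<Future<Result>> uploads = new ArrayList<>();
    uploads.add(pool.submit(failing));
    uploads.add(pool.submit(succeeding));
    System.out.println(awaitFirstSuccess(uploads));
    pool.shutdown();
  }
}
```

If the catch block assigned null instead of e, the later null check could never fire and the interruption would be lost, which is the defect the review comment points at.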
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533118#comment-14533118 ] Hadoop QA commented on HDFS-8220: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 44s | Pre-patch HDFS-7285 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 1m 57s | The applied patch generated 40 release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 42s | The applied patch generated 277 new checkstyle issues (total was 0, now 275). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 14s | The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 14s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 190m 5s | Tests failed in hadoop-hdfs. 
| | | | 233m 27s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time Unsynchronized access at DFSOutputStream.java:89% of time Unsynchronized access at DFSOutputStream.java:[line 146] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:[line 206] | | | Unread field:field be static? At ErasureCodingWorker.java:[line 251] | | | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:inner class? At ErasureCodingWorker.java:[lines 910-912] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema):in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() At ErasureCodingZoneManager.java:[line 117] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath):in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) At ErasureCodingZoneManager.java:[line 81] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 84] | | | Result of integer multiplication cast to long 
in org.apache.hadoop.hdfs.util.StripedBlockUtil.planReadPortions(int, int, long, int, int) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.planReadPortions(int, int, long, int, int) At StripedBlockUtil.java:[line 204] | | Failed unit tests | hadoop.hdfs.server.blockmanagement.TestDatanodeManager | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.TestRecoverStripedFile | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731181/HDFS-8220-HDFS-7285.006.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / c61c9c8 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/10848/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10848/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/10848/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10848/artifact/patchprocess/testrun_hadoop-hdfs.txt
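The FindBugs rows above that read "Result of integer multiplication cast to long" describe a general Java pitfall. The sketch below illustrates it with made-up values; the class, method, and parameter names are hypothetical and are not taken from StripedBlockUtil.

```java
public class MultiplyCastDemo {
  // Multiplies in 32-bit int arithmetic, then widens: the product may already have overflowed.
  static long overflowing(int cellSize, int cells) {
    return (long) (cellSize * cells);
  }

  // Widens one operand first, so the multiplication is done in 64-bit long arithmetic.
  static long widened(int cellSize, int cells) {
    return (long) cellSize * cells;
  }

  public static void main(String[] args) {
    int cellSize = 64 * 1024 * 1024; // 2^26, an illustrative value
    int cells = 64;                  // 2^6, so the true product is 2^32
    System.out.println(overflowing(cellSize, cells)); // prints 0
    System.out.println(widened(cellSize, cells));     // prints 4294967296
  }
}
```

Casting one operand before multiplying is the usual way to address this warning when the product can exceed the int range.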
[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533167#comment-14533167 ] Hudson commented on HDFS-7980: -- FAILURE: Integrated in Hadoop-trunk-Commit #7760 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7760/]) HDFS-7980. Incremental BlockReport will dramatically slow down namenode startup. Contributed by Walter Su (szetszwo: rev f9427f1760cce7e0befc3e066cebd0912652a411) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: BB2015-05-TBR Fix For: 2.7.1 Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method that is a IncrementalBlockReport before calling the bpNamenode.blockReport() method. So in a large(several thousands of datanodes) and busy cluster it will slow down(more than one hour) the startup of namenode. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. 
reportReceivedDeletedBlocks(); lastDeletedReport = startTime; . // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8345) Storage policy APIs must be exposed via the FileSystem interface
[ https://issues.apache.org/jira/browse/HDFS-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-8345: Summary: Storage policy APIs must be exposed via the FileSystem interface (was: Storage policy APIs must be exposed via FileSystem API) Storage policy APIs must be exposed via the FileSystem interface Key: HDFS-8345 URL: https://issues.apache.org/jira/browse/HDFS-8345 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal The storage policy APIs are not exposed via FileSystem. Since DistributedFileSystem is tagged as LimitedPrivate we should expose the APIs through FileSystem for use by other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533116#comment-14533116 ] Ray Chiang commented on HDFS-7559: -- Link to MAPREDUCE equivalent. Create unit test to automatically compare HDFS related classes and hdfs-default.xml --- Key: HDFS-7559 URL: https://issues.apache.org/jira/browse/HDFS-7559 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: BB2015-05-TBR, supportability Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, HDFS-7559.003.patch, HDFS-7559.004.patch Create a unit test that will automatically compare the fields in the various HDFS related classes and hdfs-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7980: -- Resolution: Fixed Fix Version/s: 2.7.1 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for explaining it. +1 patch looks good. I have committed this. Thanks, Walter! Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: BB2015-05-TBR Fix For: 2.7.1 Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method that is a IncrementalBlockReport before calling the bpNamenode.blockReport() method. So in a large(several thousands of datanodes) and busy cluster it will slow down(more than one hour) the startup of namenode. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. reportReceivedDeletedBlocks(); lastDeletedReport = startTime; . // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. 
DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-8203. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed I've committed this to the feature branch. Thanks Yi for the contribution! Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Fix For: HDFS-7285 Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch, HDFS-8203.003.patch In HDFS-7782 and HDFS-8033, we handle pread and stateful read for {{DFSStripedInputStream}}, we also need handle other operations, such as {{seek}}, zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8289) Erasure Coding: add ECSchema to HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao resolved HDFS-8289. - Resolution: Fixed Fix Version/s: HDFS-7285 Hadoop Flags: Reviewed I've committed this to the feature branch. Thanks Yong for the contribution! Erasure Coding: add ECSchema to HdfsFileStatus -- Key: HDFS-8289 URL: https://issues.apache.org/jira/browse/HDFS-8289 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Yong Zhang Fix For: HDFS-7285 Attachments: HDFS-8289.000.patch, HDFS-8289.001.patch, HDFS-8289.002.patch, HDFS-8289.003.patch, HDFS-8289.004.patch {code} // ECInfo is restored from NN just before writing striped files. ecInfo = dfsClient.getErasureCodingInfo(src); {code} The rpc call above can be avoided by adding ECSchema to HdfsFileStatus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-8345) Storage policy APIs must be exposed via FileSystem API
[ https://issues.apache.org/jira/browse/HDFS-8345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal reassigned HDFS-8345: --- Assignee: Arpit Agarwal Storage policy APIs must be exposed via FileSystem API -- Key: HDFS-8345 URL: https://issues.apache.org/jira/browse/HDFS-8345 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal The storage policy APIs are not exposed via FileSystem. Since DistributedFileSystem is tagged as LimitedPrivate we should expose the APIs through FileSystem for use by other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8345) Storage policy APIs must be exposed via FileSystem API
Arpit Agarwal created HDFS-8345: --- Summary: Storage policy APIs must be exposed via FileSystem API Key: HDFS-8345 URL: https://issues.apache.org/jira/browse/HDFS-8345 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.0 Reporter: Arpit Agarwal The storage policy APIs are not exposed via FileSystem. Since DistributedFileSystem is tagged as LimitedPrivate we should expose the APIs through FileSystem for use by other applications. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533104#comment-14533104 ] Jing Zhao commented on HDFS-8203: - Thanks for updating the patch, Yi! The 003 patch looks pretty good to me. +1. There are two unnecessary int conversions in the code. I will remove them while committing the patch. Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch, HDFS-8203.003.patch In HDFS-7782 and HDFS-8033, we handle pread and stateful read for {{DFSStripedInputStream}}; we also need to handle other operations, such as {{seek}}, zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7559) Create unit test to automatically compare HDFS related classes and hdfs-default.xml
[ https://issues.apache.org/jira/browse/HDFS-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533115#comment-14533115 ] Ray Chiang commented on HDFS-7559: -- Link to YARN equivalent Create unit test to automatically compare HDFS related classes and hdfs-default.xml --- Key: HDFS-7559 URL: https://issues.apache.org/jira/browse/HDFS-7559 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ray Chiang Assignee: Ray Chiang Priority: Minor Labels: BB2015-05-TBR, supportability Attachments: HDFS-7559.001.patch, HDFS-7559.002.patch, HDFS-7559.003.patch, HDFS-7559.004.patch Create a unit test that will automatically compare the fields in the various HDFS related classes and hdfs-default.xml. It should throw an error if a property is missing in either the class or the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8346) libwebhdfs build fails during link due to unresolved external symbols.
Chris Nauroth created HDFS-8346: --- Summary: libwebhdfs build fails during link due to unresolved external symbols. Key: HDFS-8346 URL: https://issues.apache.org/jira/browse/HDFS-8346 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth The libwebhdfs build is currently broken due to various unresolved external symbols during link. Multiple patches have introduced a few different forms of this breakage. See comments for full details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8346) libwebhdfs build fails during link due to unresolved external symbols.
[ https://issues.apache.org/jira/browse/HDFS-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533438#comment-14533438 ] Chris Nauroth commented on HDFS-8346: - * HDFS-573 implemented Windows compatibility for libhdfs. As part of this effort, we refactored OS-specific system calls into OS-specific source files. We also introduced htable.c for a platform-neutral hash table implementation. libwebhdfs includes jni_helper.c from libhdfs in its own build. jni_helper.c depends on the OS-specific sources and the hash table, but since the libwebhdfs build was not updated to include these files, this causes unresolved external symbols. * HADOOP-11403 introduced the {{terror}} function in exception.c as a cross-platform thread-safe error reporting function. libwebhdfs code was changed to call this function, but libwebhdfs doesn't use the exception.c from hadoop-common. Instead, it uses the exception.c from libhdfs, which does not have this function. libwebhdfs build fails during link due to unresolved external symbols. -- Key: HDFS-8346 URL: https://issues.apache.org/jira/browse/HDFS-8346 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth The libwebhdfs build is currently broken due to various unresolved external symbols during link. Multiple patches have introduced a few different forms of this breakage. See comments for full details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8291) Modify NN WebUI to display correct unit
[ https://issues.apache.org/jira/browse/HDFS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533650#comment-14533650 ] John George commented on HDFS-8291: --- [~benoyantony] Good point - I agree that TB is more widely used. It makes sense that we change the calculation to display the units as TB. As an added credit, [~zhongyi-altiscale] if you can display 1TB=10 ^12^ bytes on the NN Web UI that would be great as well. Modify NN WebUI to display correct unit Key: HDFS-8291 URL: https://issues.apache.org/jira/browse/HDFS-8291 Project: Hadoop HDFS Issue Type: Improvement Reporter: Zhongyi Xie Assignee: Zhongyi Xie Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8291.001.patch, HDFS-8291.002.patch NN Web UI displays its capacity and usage in TB, but it is actually TiB. We should either change the unit name or the calculation to ensure it follows standards. http://en.wikipedia.org/wiki/Tebibyte http://en.wikipedia.org/wiki/Terabyte -- This message was sent by Atlassian JIRA (v6.3.4#6332)
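To make the TB versus TiB distinction in HDFS-8291 concrete, here is a small sketch of the arithmetic. The class and method names are hypothetical and are not taken from the NN Web UI code; only the unit definitions (1 TB = 10^12 bytes, 1 TiB = 2^40 bytes) are standard.

```java
import java.util.Locale;

public class CapacityUnits {
  static final long TB = 1_000_000_000_000L; // 10^12 bytes (decimal terabyte)
  static final long TIB = 1L << 40;          // 2^40 bytes (binary tebibyte)

  // Formats a byte count using the decimal unit.
  static String formatTB(long bytes) {
    return String.format(Locale.ROOT, "%.2f TB", (double) bytes / TB);
  }

  // Formats a byte count using the binary unit.
  static String formatTiB(long bytes) {
    return String.format(Locale.ROOT, "%.2f TiB", (double) bytes / TIB);
  }

  public static void main(String[] args) {
    long capacity = 1L << 40; // exactly one tebibyte
    System.out.println(formatTB(capacity));  // prints 1.10 TB
    System.out.println(formatTiB(capacity)); // prints 1.00 TiB
  }
}
```

The same byte count differs by roughly 10% between the two units, which is why a value computed with a 2^40 divisor but labelled "TB" is misleading.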
[jira] [Updated] (HDFS-4777) File creation with overwrite flag set to true results in logSync holding namesystem lock
[ https://issues.apache.org/jira/browse/HDFS-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Wagner updated HDFS-4777: -- Resolution: Duplicate Status: Resolved (was: Patch Available) This was resolved by HDFS-6886 and duplicates HDFS-6871. File creation with overwrite flag set to true results in logSync holding namesystem lock Key: HDFS-4777 URL: https://issues.apache.org/jira/browse/HDFS-4777 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0, 2.0.0-alpha Reporter: Suresh Srinivas Assignee: Suresh Srinivas Labels: BB2015-05-TBR Attachments: HDFS-4777.patch FSNamesystem#startFileInternal calls delete. Delete method releases the write lock, making parts of startFileInternal code unintentionally executed without write lock being held. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533639#comment-14533639 ] Hudson commented on HDFS-7980: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #178 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/178/]) HDFS-7980. Incremental BlockReport will dramatically slow down namenode startup. Contributed by Walter Su (szetszwo: rev f9427f1760cce7e0befc3e066cebd0912652a411) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: BB2015-05-TBR Fix For: 2.7.1 Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method that is a IncrementalBlockReport before calling the bpNamenode.blockReport() method. So in a large(several thousands of datanodes) and busy cluster it will slow down(more than one hour) the startup of namenode. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. 
reportReceivedDeletedBlocks(); lastDeletedReport = startTime; . // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8346) libwebhdfs build fails during link due to unresolved external symbols.
[ https://issues.apache.org/jira/browse/HDFS-8346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533692#comment-14533692 ] Hadoop QA commented on HDFS-8346: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 5m 19s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | release audit | 0m 20s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | whitespace | 0m 4s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 31s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | native | 1m 11s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 165m 40s | Tests failed in hadoop-hdfs. 
| | | | 182m 21s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.tracing.TestTraceAdmin | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731283/HDFS-8346.001.patch | | Optional Tests | javac unit | | git revision | trunk / b88700d | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10851/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10851/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10851/console | This message was automatically generated. libwebhdfs build fails during link due to unresolved external symbols. -- Key: HDFS-8346 URL: https://issues.apache.org/jira/browse/HDFS-8346 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Labels: BB2015-05-RFC Attachments: HDFS-8346.001.patch The libwebhdfs build is currently broken due to various unresolved external symbols during link. Multiple patches have introduced a few different forms of this breakage. See comments for full details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8343) Erasure Coding: test failed in TestDFSStripedInputStream.testStatefulRead() when use ByteBuffer
[ https://issues.apache.org/jira/browse/HDFS-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su resolved HDFS-8343. - Resolution: Cannot Reproduce Erasure Coding: test failed in TestDFSStripedInputStream.testStatefulRead() when use ByteBuffer --- Key: HDFS-8343 URL: https://issues.apache.org/jira/browse/HDFS-8343 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Walter Su Assignee: Walter Su It's failed because of last commit {code} commit c61c9c855e7cd1d20f654c061ff16341ce2d9936 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6184) Capture NN's thread dump when it fails over
[ https://issues.apache.org/jira/browse/HDFS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6184: -- Attachment: HDFS-6184-3.patch Capture NN's thread dump when it fails over --- Key: HDFS-6184 URL: https://issues.apache.org/jira/browse/HDFS-6184 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Ming Ma Assignee: Ming Ma Labels: BB2015-05-TBR Attachments: HDFS-6184-2.patch, HDFS-6184-3.patch, HDFS-6184.patch We have seen several false positives in terms of when ZKFC considers NN to be unhealthy. Some of these trigger unnecessary failover. Examples: 1. An SBN checkpoint caused ZKFC's RPC call into NN to time out. The consequence isn't bad; just that SBN will quit ZK membership and rejoin it later. But it is unnecessary. The reason is that the checkpoint acquires the NN global write lock and all rpc requests are blocked. Even though HAServiceProtocol.monitorHealth doesn't need to acquire the NN lock, it still needs to use the service rpc queue. 2. When ANN is busy, sometimes the global lock can block other requests. ZKFC's RPC call times out. This will trigger failover. The question is that even after the failover, the new ANN might run into a similar issue. We can increase the ZKFC to NN timeout value to mitigate this to some degree. If ZKFC can be more accurate in judging whether NN is healthy and can predict whether the failover will help, that will be useful. For example, we can: 1. Have ZKFC make its decision based on the NN thread dump. 2. Have a dedicated rpc pool for ZKFC to NN. Given that the health check doesn't need to acquire the NN global lock, it can go through even if NN is doing checkpointing or is very busy. Any comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7678) Erasure coding: DFSInputStream with decode functionality (pread)
[ https://issues.apache.org/jira/browse/HDFS-7678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7678: Attachment: HDFS-7678-HDFS-7285.11.patch I originally planned to consolidate striping terminologies under HDFS-8320 but it seems necessary to do some of the consolidation now. 011 patch updates the definition of a {{Stripe}}, which covers an arbitrary range from all internal blocks (same coverage for each internal block), a {{StripingCell}}, a {{StripingChunk}}. Using these new tools {{DFSStripedInputStream}} has a much simpler logic to read a {{Stripe}}, which in turn reads individual {{StripingChunk}}'s. Right now the patch is not completed but posting here for some feedback on the direction. With this structure, all complexities are migrated to abstract number-crunching in {{StripedBlockUtil}}, which can be easily and extensively unit-tested. Erasure coding: DFSInputStream with decode functionality (pread) Key: HDFS-7678 URL: https://issues.apache.org/jira/browse/HDFS-7678 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-7285 Reporter: Li Bo Assignee: Zhe Zhang Attachments: BlockGroupReader.patch, HDFS-7678-HDFS-7285.002.patch, HDFS-7678-HDFS-7285.003.patch, HDFS-7678-HDFS-7285.004.patch, HDFS-7678-HDFS-7285.005.patch, HDFS-7678-HDFS-7285.006.patch, HDFS-7678-HDFS-7285.007.patch, HDFS-7678-HDFS-7285.008.patch, HDFS-7678-HDFS-7285.009.patch, HDFS-7678-HDFS-7285.010.patch, HDFS-7678-HDFS-7285.11.patch, HDFS-7678.000.patch, HDFS-7678.001.patch A block group reader will read data from BlockGroup no matter in striping layout or contiguous layout. The corrupt blocks can be known before reading(told by namenode), or just be found during reading. The block group reader needs to do decoding work when some blocks are found corrupt. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8203) Erasure Coding: Seek and other Ops in DFSStripedInputStream.
[ https://issues.apache.org/jira/browse/HDFS-8203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533688#comment-14533688 ] Yi Liu commented on HDFS-8203: -- Thanks a lot for the review and commit, Jing! :) Erasure Coding: Seek and other Ops in DFSStripedInputStream. Key: HDFS-8203 URL: https://issues.apache.org/jira/browse/HDFS-8203 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Fix For: HDFS-7285 Attachments: HDFS-8203.001.patch, HDFS-8203.002.patch, HDFS-8203.003.patch In HDFS-7782 and HDFS-8033, we handle pread and stateful read for {{DFSStripedInputStream}}. We also need to handle other operations, such as {{seek}}, zerocopy read ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6184) Capture NN's thread dump when it fails over
[ https://issues.apache.org/jira/browse/HDFS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-6184: -- Labels: BB2015-05-RFC (was: BB2015-05-TBR) Capture NN's thread dump when it fails over --- Key: HDFS-6184 URL: https://issues.apache.org/jira/browse/HDFS-6184 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Ming Ma Assignee: Ming Ma Labels: BB2015-05-RFC Attachments: HDFS-6184-2.patch, HDFS-6184-3.patch, HDFS-6184.patch We have seen several false positives in terms of when ZKFC considers NN to be unhealthy. Some of these trigger unnecessary failover. Examples: 1. An SBN checkpoint caused ZKFC's RPC call into NN to time out. The consequence isn't bad; just that SBN will quit ZK membership and rejoin it later. But it is unnecessary. The reason is that the checkpoint acquires the NN global write lock and all rpc requests are blocked. Even though HAServiceProtocol.monitorHealth doesn't need to acquire the NN lock, it still needs to use the service rpc queue. 2. When ANN is busy, sometimes the global lock can block other requests. ZKFC's RPC call times out. This will trigger failover. The question is that even after the failover, the new ANN might run into a similar issue. We can increase the ZKFC to NN timeout value to mitigate this to some degree. If ZKFC can be more accurate in judging whether NN is healthy and can predict whether the failover will help, that will be useful. For example, we can: 1. Have ZKFC make its decision based on the NN thread dump. 2. Have a dedicated rpc pool for ZKFC to NN. Given that the health check doesn't need to acquire the NN global lock, it can go through even if NN is doing checkpointing or is very busy. Any comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8150) Make getFileChecksum fail for blocks under construction
[ https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-8150: Status: Open (was: Patch Available) Make getFileChecksum fail for blocks under construction --- Key: HDFS-8150 URL: https://issues.apache.org/jira/browse/HDFS-8150 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: J.Andreina Priority: Critical Labels: BB2015-05-TBR Attachments: HDFS-8150.1.patch, HDFS-8150.2.patch We have seen cases where a data copy was validated using a checksum and the content of the target then changed. It turns out the target wasn't closed successfully, so it was still under construction. One hour later, a lease recovery kicked in and truncated the block. Although this can be prevented in many ways, if there is no valid use case for getting a file checksum from under-construction blocks, can it be disabled? E.g. the Datanode can throw an exception if the replica is not in the finalized state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8150) Make getFileChecksum fail for blocks under construction
[ https://issues.apache.org/jira/browse/HDFS-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533744#comment-14533744 ] Akira AJISAKA commented on HDFS-8150: - Thanks [~andreina] for taking this issue. I agree with you that DFSClient should throw an exception if the file is under construction. Some comments: {code} + if (blockLocations.isUnderConstruction()) { + throw new IOException("Fail to get block MD5, since file " + src + " is under construction "); + } {code} 1. Would you throw the exception when refreshing block locations as well? 2. For block MD5, I'm thinking checksum is sufficient. We can get block MD5 checksum from finalized blocks even if the file is under construction. 3. nit: Would you remove the unnecessary whitespace after construction? 4. nit: The line is longer than 80 characters. {code} -import static org.junit.Assert.assertEquals; -import static org.junit.Assert.assertThat; +import static org.junit.Assert.*; {code} 5. nit: Would you please avoid using * for import? Make getFileChecksum fail for blocks under construction --- Key: HDFS-8150 URL: https://issues.apache.org/jira/browse/HDFS-8150 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: J.Andreina Priority: Critical Labels: BB2015-05-TBR Attachments: HDFS-8150.1.patch, HDFS-8150.2.patch We have seen cases where a data copy was validated using a checksum and the content of the target then changed. It turns out the target wasn't closed successfully, so it was still under construction. One hour later, a lease recovery kicked in and truncated the block. Although this can be prevented in many ways, if there is no valid use case for getting a file checksum from under-construction blocks, can it be disabled? E.g. the Datanode can throw an exception if the replica is not in the finalized state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8294) Erasure Coding: Fix Findbug warnings present in erasure coding
[ https://issues.apache.org/jira/browse/HDFS-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533364#comment-14533364 ] Hadoop QA commented on HDFS-8294: -
\\ \\
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch HDFS-7285 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. |
| {color:red}-1{color} | release audit | 3m 55s | The applied patch generated 76 release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 37s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 3s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 36s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 11s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | native | 3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 177m 1s | Tests failed in hadoop-hdfs. |
| | | 222m 49s | |
\\ \\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.TestRecoverStripedFile |
| | hadoop.hdfs.server.namenode.TestAuditLogs |
| | hadoop.hdfs.server.namenode.TestFileTruncate |
| | hadoop.hdfs.server.namenode.TestTransferFsImage |
| | hadoop.hdfs.util.TestByteArrayManager |
\\ \\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731220/HDFS-8294-HDFS-7285.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7285 / c61c9c8 |
| Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/10849/artifact/patchprocess/patchReleaseAuditProblems.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/10849/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/10849/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/10849/console |
This message was automatically generated.
Erasure Coding: Fix Findbug warnings present in erasure coding -- Key: HDFS-8294 URL: https://issues.apache.org/jira/browse/HDFS-8294 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-TBR Attachments: HDFS-8294-HDFS-7285.00.patch, HDFS-8294-HDFS-7285.01.patch, HDFS-8294-HDFS-7285.02.patch, HDFS-8294-HDFS-7285.03.patch Following are the findbug warnings :- # Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) {code} Bug type NP_NULL_ON_SOME_PATH (click for details) In class org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction In method org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Value loaded from arr$ Dereferenced at BlockInfoStripedUnderConstruction.java:[line 206] Known null at BlockInfoStripedUnderConstruction.java:[line 200] {code} # Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) {code} Bug type DM_DEFAULT_ENCODING (click for details) In class org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager In method org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema) Called method String.getBytes() At ErasureCodingZoneManager.java:[line 116] Bug type DM_DEFAULT_ENCODING (click for details) In class org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager In method org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath)
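For readers unfamiliar with the DM_DEFAULT_ENCODING warnings listed above: Findbugs raises them when {{String.getBytes()}} or {{new String(byte[])}} is called without a charset, because the result then depends on the platform default. A common remedy is to name the charset explicitly. The following is only a minimal sketch of that idea, not the code from {{ErasureCodingZoneManager}} or from the attached patches; the class name {{EncodingSketch}}, its methods, and the sample value "RS-6-3" are hypothetical, and the choice of UTF-8 is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from ErasureCodingZoneManager.
class EncodingSketch {
    // getBytes() without an argument uses the platform default charset,
    // which is what Findbugs flags. Naming the charset makes the bytes
    // identical on every JVM.
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // The matching decode also names the charset explicitly.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = encode("RS-6-3"); // placeholder value
        System.out.println(decode(raw));
    }
}
```

Encoding and decoding with the same named charset keeps the round trip stable regardless of the {{file.encoding}} setting of the JVM that wrote or reads the value.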
[jira] [Commented] (HDFS-7980) Incremental BlockReport will dramatically slow down the startup of a namenode
[ https://issues.apache.org/jira/browse/HDFS-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533399#comment-14533399 ] Hudson commented on HDFS-7980: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #188 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/188/]) HDFS-7980. Incremental BlockReport will dramatically slow down namenode startup. Contributed by Walter Su (szetszwo: rev f9427f1760cce7e0befc3e066cebd0912652a411) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java Incremental BlockReport will dramatically slow down the startup of a namenode -- Key: HDFS-7980 URL: https://issues.apache.org/jira/browse/HDFS-7980 Project: Hadoop HDFS Issue Type: Bug Reporter: Hui Zheng Assignee: Walter Su Labels: BB2015-05-TBR Fix For: 2.7.1 Attachments: HDFS-7980.001.patch, HDFS-7980.002.patch, HDFS-7980.003.patch, HDFS-7980.004.patch, HDFS-7980.004.repost.patch In the current implementation the datanode will call the reportReceivedDeletedBlocks() method, which is an IncrementalBlockReport, before calling the bpNamenode.blockReport() method. So in a large (several thousands of datanodes) and busy cluster it will slow down (by more than one hour) the startup of the namenode. {code} List<DatanodeCommand> blockReport() throws IOException { // send block report if timer has expired. final long startTime = now(); if (startTime - lastBlockReport <= dnConf.blockReportInterval) { return null; } final ArrayList<DatanodeCommand> cmds = new ArrayList<DatanodeCommand>(); // Flush any block information that precedes the block report. Otherwise // we have a chance that we will miss the delHint information // or we will report an RBW replica after the BlockReport already reports // a FINALIZED one. 
reportReceivedDeletedBlocks(); lastDeletedReport = startTime; . // Send the reports to the NN. int numReportsSent = 0; int numRPCs = 0; boolean success = false; long brSendStartTime = now(); try { if (totalBlockCount < dnConf.blockReportSplitThreshold) { // Below split threshold, send all reports in a single message. DatanodeCommand cmd = bpNamenode.blockReport( bpRegistration, bpos.getBlockPoolId(), reports); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-6757) Simplify lease manager with INodeID
[ https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533475#comment-14533475 ] Haohui Mai edited comment on HDFS-6757 at 5/7/15 10:20 PM: --- Thanks Jing for the reviews. The v14 patch addresses all comments except 7 and 11. {quote} 7. One question is, after we record INode id in lease manager, do we still want to record full paths of UC files in FSImage? {quote} It's a good proposal. I'll address it in a separate jira. {quote} 11. Maybe we can add extra tests for the delete and rename operations when UC files are involved. Specifically, a test on the internal lease recovery for renamed UC files may be useful. {quote} It looks like {{TestLease}} has covered this case. was (Author: wheat9): Thanks Jing for the reviews. The v14 patch addresses all comments except 7 and 11. {quote} 7. One question is, after we record INode id in lease manager, do we still want to record full paths of UC files in FSImage? {quote} It's a good proposal. This has been discussed in this jira (https://issues.apache.org/jira/browse/HDFS-6757?focusedCommentId=14094487&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14094487). I prefer addressing it in a separate jira since we don't have consensus yet. {quote} 11. Maybe we can add extra tests for the delete and rename operations when UC files are involved. Specifically, a test on the internal lease recovery for renamed UC files may be useful. {quote} It looks like {{TestLease}} has covered this case. 
Simplify lease manager with INodeID --- Key: HDFS-6757 URL: https://issues.apache.org/jira/browse/HDFS-6757 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Labels: BB2015-05-TBR Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch, HDFS-6757.005.patch, HDFS-6757.006.patch, HDFS-6757.007.patch, HDFS-6757.008.patch, HDFS-6757.009.patch, HDFS-6757.010.patch, HDFS-6757.011.patch, HDFS-6757.012.patch, HDFS-6757.013.patch, HDFS-6757.014.patch Currently the lease manager records leases based on path instead of inode ids. Therefore, the lease manager needs to carefully keep track of the path of active leases during renames and deletes. This can be a non-trivial task. This jira proposes to simplify the logic by tracking leases using inodeids instead of paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6757) Simplify lease manager with INodeID
[ https://issues.apache.org/jira/browse/HDFS-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6757: - Attachment: HDFS-6757.015.patch Simplify lease manager with INodeID --- Key: HDFS-6757 URL: https://issues.apache.org/jira/browse/HDFS-6757 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Labels: BB2015-05-TBR Attachments: HDFS-6757.000.patch, HDFS-6757.001.patch, HDFS-6757.002.patch, HDFS-6757.003.patch, HDFS-6757.004.patch, HDFS-6757.005.patch, HDFS-6757.006.patch, HDFS-6757.007.patch, HDFS-6757.008.patch, HDFS-6757.009.patch, HDFS-6757.010.patch, HDFS-6757.011.patch, HDFS-6757.012.patch, HDFS-6757.013.patch, HDFS-6757.014.patch, HDFS-6757.015.patch Currently the lease manager records leases based on path instead of inode ids. Therefore, the lease manager needs to carefully keep track of the path of active leases during renames and deletes. This can be a non-trivial task. This jira proposes to simplify the logic by tracking leases using inodeids instead of paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
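To illustrate why keying leases by inode id simplifies renames, here is a small self-contained sketch. It is a toy model under stated assumptions, not the real LeaseManager and not the attached patches: the class {{LeaseByInodeSketch}}, its two maps, and all method names are hypothetical, and the namespace is reduced to a path-to-id map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; the real LeaseManager and namespace are far richer.
class LeaseByInodeSketch {
    // inode id -> lease holder. The key never changes when a file is renamed.
    private final Map<Long, String> leaseHolderByInodeId = new HashMap<>();
    // path -> inode id. Stand-in for the namespace.
    private final Map<String, Long> inodeIdByPath = new HashMap<>();

    void create(String path, long inodeId, String holder) {
        inodeIdByPath.put(path, inodeId);
        leaseHolderByInodeId.put(inodeId, holder);
    }

    // A rename only touches the namespace map; the lease map needs no update.
    void rename(String src, String dst) {
        Long id = inodeIdByPath.remove(src);
        if (id != null) {
            inodeIdByPath.put(dst, id);
        }
    }

    // Resolve the path to an inode id first, then look up the lease.
    String holderOf(String path) {
        Long id = inodeIdByPath.get(path);
        return id == null ? null : leaseHolderByInodeId.get(id);
    }

    public static void main(String[] args) {
        LeaseByInodeSketch ns = new LeaseByInodeSketch();
        ns.create("/tmp/a", 16387L, "client-1");
        ns.rename("/tmp/a", "/tmp/b");
        System.out.println(ns.holderOf("/tmp/b"));
    }
}
```

With a path-keyed lease map, {{rename}} would also have to rewrite every affected lease entry (including entries under a renamed directory); with an id-keyed map that bookkeeping disappears.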
[jira] [Created] (HDFS-8337) httpfs doesn't work with creates from a jar with kerberos
Yongjun Zhang created HDFS-8337: --- Summary: httpfs doesn't work with creates from a jar with kerberos Key: HDFS-8337 URL: https://issues.apache.org/jira/browse/HDFS-8337 Project: Hadoop HDFS Issue Type: Bug Components: HDFS, hdfs-client Reporter: Yongjun Zhang Assignee: Yongjun Zhang In a secure cluster, running a simple program: {code} import org.apache.hadoop.conf.*; import org.apache.hadoop.fs.*; import org.apache.hadoop.security.*; class Foo { public static void main(String args[]) throws Exception { FileSystem fs = FileSystem.get(new java.net.URI("webhdfs://host:14000/"), new Configuration()); System.out.println(fs.listStatus(new Path("/"))[0]); java.io.OutputStream os = fs.create(new Path("/tmp/foo")); os.write('a'); os.close(); } } {code} This basically accesses httpfs via webhdfs, and the following exception is thrown: {code} [systest@yj52s ~]$ /usr/java/jdk1.7.0_67-cloudera/bin/java -cp $(hadoop classpath):. Foo 15/05/06 23:51:38 WARN ssl.FileBasedKeyStoresFactory: The property 'ssl.client.truststore.location' has not been set, no TrustStore will be loaded Exception in thread "main" org.apache.hadoop.ipc.RemoteException(com.sun.jersey.api.ParamException$QueryParamException): java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.fs.http.client.HttpFSFileSystem.Operation.GETDELEGATIONTOKEN at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:163) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:354) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:608) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:1299) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:237) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getAuthParameters(WebHdfsFileSystem.java:423) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toUrl(WebHdfsFileSystem.java:444) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractFsPathRunner.getUrl(WebHdfsFileSystem.java:691) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:603) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.listStatus(WebHdfsFileSystem.java:1277) at Foo.main(Foo.java:7) {code} Thanks [~qwertymaniac] and [~caseyjbrotherton] for reporting the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532158#comment-14532158 ] Rakesh R commented on HDFS-8220: [~libo-intel] I've attached the patch on latest code base. If you agree with the previous comment, we can use this patch to do the changes. Could you please take a look at this when you get a chance. Thanks! Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize --- Key: HDFS-8220 URL: https://issues.apache.org/jira/browse/HDFS-8220 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Attachments: HDFS-8220-001.patch, HDFS-8220-002.patch, HDFS-8220-003.patch, HDFS-8220-004.patch During write operations {{StripedDataStreamer#locateFollowingBlock}} fails to validate the available datanodes against the {{BlockGroupSize}}. Please see the exception to understand more: {code} 2015-04-22 14:56:11,313 WARN hdfs.DFSClient (DataStreamer.java:run(538)) - DataStreamer Exception java.lang.NullPointerException at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) 2015-04-22 14:56:11,313 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1718)) - Shutting down the Mini HDFS Cluster 2015-04-22 14:56:11,313 ERROR hdfs.DFSClient (DFSClient.java:closeAllFilesBeingWritten(608)) - Failed to close inode 16387 java.io.IOException: DataStreamer Exception: at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:544) at org.apache.hadoop.hdfs.StripedDataStreamer.run(StripedDataStreamer.java:1) Caused by: java.lang.NullPointerException at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:374) at org.apache.hadoop.hdfs.StripedDataStreamer.locateFollowingBlock(StripedDataStreamer.java:157) at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1332) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:424) ... 1 more {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
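The trace above shows the NullPointerException coming out of {{LinkedBlockingQueue.offer}}, which is documented to reject null elements. That is consistent with a null entry being offered when fewer block locations are available than expected, although the trace alone does not prove where the null came from. The following is a minimal sketch of a guard against that failure mode; it is not the code from the attached patches, and the class {{NullOfferSketch}}, the helper {{offerIfPresent}}, and the sample element "blk_1" are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration; not taken from StripedDataStreamer.
class NullOfferSketch {
    // Returns false for a missing item instead of letting offer(null) throw,
    // so the caller can report the shortfall with a meaningful error.
    static <T> boolean offerIfPresent(LinkedBlockingQueue<T> queue, T item) {
        if (item == null) {
            return false;
        }
        return queue.offer(item);
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            queue.offer(null); // LinkedBlockingQueue does not permit null elements
        } catch (NullPointerException e) {
            System.out.println("offer(null) threw NullPointerException");
        }
        System.out.println(offerIfPresent(queue, null));
        System.out.println(offerIfPresent(queue, "blk_1"));
    }
}
```

In the real streamer the more useful response is probably to validate the number of returned locations up front and fail with an IOException that names the shortfall, rather than silently skipping entries; the guard here only demonstrates where the NPE originates.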
[jira] [Updated] (HDFS-8289) DFSStripedOutputStream uses an additional rpc all to getErasureCodingInfo
[ https://issues.apache.org/jira/browse/HDFS-8289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8289: - Attachment: HDFS-8289.003.patch DFSStripedOutputStream uses an additional rpc all to getErasureCodingInfo - Key: HDFS-8289 URL: https://issues.apache.org/jira/browse/HDFS-8289 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Tsz Wo Nicholas Sze Assignee: Yong Zhang Attachments: HDFS-8289.000.patch, HDFS-8289.001.patch, HDFS-8289.002.patch, HDFS-8289.003.patch {code} // ECInfo is restored from NN just before writing striped files. ecInfo = dfsClient.getErasureCodingInfo(src); {code} The rpc call above can be avoided by adding ECSchema to HdfsFileStatus. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8310) Fix TestCLI.testAll 'help: help for find' on Windows
[ https://issues.apache.org/jira/browse/HDFS-8310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532422#comment-14532422 ] Hudson commented on HDFS-8310: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) HDFS-8310. Fix TestCLI.testAll 'help: help for find' on Windows. (Kiran Kumar M R via Xiaoyu Yao) (xyao: rev 7a26d174aff9535f7a60711bee586e225891b383) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/cli/util/RegexpAcrossOutputComparator.java Fix TestCLI.testAll 'help: help for find' on Windows Key: HDFS-8310 URL: https://issues.apache.org/jira/browse/HDFS-8310 Project: Hadoop HDFS Issue Type: Sub-task Components: test Affects Versions: 2.7.0 Reporter: Xiaoyu Yao Assignee: Kiran Kumar M R Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-8310-001.patch, HDFS-8310-002.patch The test uses RegexpAcrossOutputComparator with a single regex, which does not match on Windows as shown below. 
{code} 2015-04-30 01:14:01,737 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(155)) - --- 2015-04-30 01:14:01,737 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - Test ID: [31] 2015-04-30 01:14:01,737 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(157)) -Test Description: [help: help for find] 2015-04-30 01:14:01,737 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(158)) - 2015-04-30 01:14:01,738 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(162)) - Test Commands: [-help find] 2015-04-30 01:14:01,738 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(166)) - 2015-04-30 01:14:01,738 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(173)) - 2015-04-30 01:14:01,738 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(177)) - Comparator: [RegexpAcrossOutputComparator] 2015-04-30 01:14:01,738 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(179)) - Comparision result: [fail] 2015-04-30 01:14:01,739 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(181)) - Expected output: [-find path \.\.\. expression \.\.\. : Finds all files that match the specified expression and applies selected actions to them\. If no path is specified then defaults to the current working directory\. If no expression is specified then defaults to -print\. The following primary expressions are recognised: -name pattern -iname pattern Evaluates as true if the basename of the file matches the pattern using standard file system globbing\. If -iname is used then the match is case insensitive\. -print -print0 Always evaluates to true. Causes the current pathname to be written to standard output followed by a newline. If the -print0 expression is used then an ASCII NULL character is appended rather than a newline. The following operators are recognised: expression -a expression expression -and expression expression expression Logical AND operator for joining two expressions\. 
Returns true if both child expressions return true\. Implied by the juxtaposition of two expressions and so does not need to be explicitly specified\. The second expression will not be applied if the first fails\. ] 2015-04-30 01:14:01,739 INFO cli.CLITestHelper (CLITestHelper.java:displayResults(183)) - Actual output: [-find path ... expression ... : Finds all files that match the specified expression and applies selected actions to them. If no path is specified then defaults to the current working directory. If no expression is specified then defaults to -print. The following primary expressions are recognised: -name pattern -iname pattern Evaluates as true if the basename of the file matches the pattern using standard file system globbing. If -iname is used then the match is case insensitive. -print -print0 Always evaluates to true. Causes the current pathname to be written to standard output followed by a newline. If the -print0 expression is used then an ASCII NULL character is appended rather than a newline. The following operators are recognised: expression -a expression expression -and expression expression expression Logical AND operator for joining two expressions. Returns true
[jira] [Updated] (HDFS-8274) NFS configuration nfs.dump.dir not working
[ https://issues.apache.org/jira/browse/HDFS-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8274: -- Attachment: HDFS-8274.patch NFS configuration nfs.dump.dir not working -- Key: HDFS-8274 URL: https://issues.apache.org/jira/browse/HDFS-8274 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Attachments: HDFS-8274.patch As per the document http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html we can configure {quote} nfs.dump.dir {quote} as nfs file dump directory, but using this configuration in *hdfs-site.xml* doesn't work and when nfs gateway is started, default location is used i.e \tmp\.hdfs-nfs The reason being the key expected in *NfsConfigKeys.java* {code} public static final String DFS_NFS_FILE_DUMP_DIR_KEY = "nfs.file.dump.dir"; {code} we can change it to *nfs.dump.dir* instead -- This message was sent by Atlassian JIRA (v6.3.4#6332)
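The key mismatch reported in HDFS-8274 can be illustrated with a minimal sketch. It uses java.util.Properties as a stand-in for Hadoop's configuration object, and the class name, method name and the /data/nfs-dump value are hypothetical; only the two key strings and the default location come from the report (the default is written here with forward slashes). It is not the gateway's actual lookup code.

```java
import java.util.Properties;

// Hypothetical stand-in for the NFS gateway's config lookup; not Hadoop code.
class DumpDirLookup {
    // Key the code reads, as quoted in the report from NfsConfigKeys.java.
    static final String KEY_READ_BY_CODE = "nfs.file.dump.dir";
    // Key the documentation tells users to set.
    static final String KEY_IN_DOCS = "nfs.dump.dir";
    // Default location named in the report (forward-slash spelling assumed).
    static final String DEFAULT_DUMP_DIR = "/tmp/.hdfs-nfs";

    // Returns the dump directory a lookup under the given key would yield.
    static String resolveDumpDir(Properties conf, String key) {
        return conf.getProperty(key, DEFAULT_DUMP_DIR);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The user follows the documentation and sets the documented key.
        conf.setProperty(KEY_IN_DOCS, "/data/nfs-dump");

        // Lookup with the key the code expects: the setting is silently ignored.
        System.out.println(resolveDumpDir(conf, KEY_READ_BY_CODE));
        // Lookup with the documented key: the setting is honoured.
        System.out.println(resolveDumpDir(conf, KEY_IN_DOCS));
    }
}
```

Because a missing key falls back to the default without any warning, the misconfiguration only shows up as files appearing in the default directory.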
[jira] [Updated] (HDFS-8333) Create EC zone should not need superuser privilege
[ https://issues.apache.org/jira/browse/HDFS-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8333: - Status: Patch Available (was: Open) Create EC zone should not need superuser privilege -- Key: HDFS-8333 URL: https://issues.apache.org/jira/browse/HDFS-8333 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8333-HDFS-7285.000.patch create EC zone should not need superuser privilege, for example, in multiple tenant scenario, common users only manage their own directory and subdirectory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8333) Create EC zone should not need superuser privilege
[ https://issues.apache.org/jira/browse/HDFS-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8333: - Attachment: HDFS-8333-HDFS-7285.000.patch Create EC zone should not need superuser privilege -- Key: HDFS-8333 URL: https://issues.apache.org/jira/browse/HDFS-8333 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yong Zhang Assignee: Yong Zhang Attachments: HDFS-8333-HDFS-7285.000.patch create EC zone should not need superuser privilege, for example, in multiple tenant scenario, common users only manage their own directory and subdirectory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8329) Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion.
[ https://issues.apache.org/jira/browse/HDFS-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu resolved HDFS-8329. -- Resolution: Duplicate Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion. --- Key: HDFS-8329 URL: https://issues.apache.org/jira/browse/HDFS-8329 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Both in NN and DN, we use striped block recovery and sometime use reconstruction. The striped block recovery make people confused with block recovery, we should use striped block reconstruction to eliminate confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8329) Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion.
[ https://issues.apache.org/jira/browse/HDFS-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532537#comment-14532537 ] Yi Liu commented on HDFS-8329: -- Just found this was duplicated with HDFS-7955. Erasure coding: Rename Striped block recovery to reconstruction to eliminate confusion. --- Key: HDFS-8329 URL: https://issues.apache.org/jira/browse/HDFS-8329 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yi Liu Assignee: Yi Liu Both in NN and DN, we use striped block recovery and sometime use reconstruction. The striped block recovery make people confused with block recovery, we should use striped block reconstruction to eliminate confusion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8129) Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding
[ https://issues.apache.org/jira/browse/HDFS-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532404#comment-14532404 ] Vinayakumar B commented on HDFS-8129: - Patch looks good to me. +1 Erasure Coding: Maintain consistent naming for Erasure Coding related classes - EC/ErasureCoding Key: HDFS-8129 URL: https://issues.apache.org/jira/browse/HDFS-8129 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-8129-0.patch, HDFS-8129-1.patch Currently I see some classes named as ErasureCode* and some are with EC* I feel we should maintain consistent naming across project. This jira to correct the places where we named differently to be a unique. And also to discuss which naming we can follow from now onwards when we create new classes. ErasureCoding* should be fine IMO. Lets discuss what others feel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2484) checkLease should throw FileNotFoundException when file does not exist
[ https://issues.apache.org/jira/browse/HDFS-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532444#comment-14532444 ] Brahma Reddy Battula commented on HDFS-2484: [~shv] Please have a look at the following link: https://wiki.apache.org/hadoop/2015MayBugBash checkLease should throw FileNotFoundException when file does not exist -- Key: HDFS-2484 URL: https://issues.apache.org/jira/browse/HDFS-2484 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.22.0, 2.0.0-alpha Reporter: Konstantin Shvachko Assignee: Rakesh R Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-2484.00.patch, HDFS-2484.01.patch, HDFS-2484.02.patch When a file is deleted during its creation, {{FSNamesystem.checkLease(String src, String holder)}} throws {{LeaseExpiredException}}. It would be more informative if it threw {{FileNotFoundException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
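The behaviour requested in HDFS-2484 can be sketched roughly as follows. This is a simplified, hypothetical model: the class, the in-memory lease map and the nested LeaseExpiredException are stand-ins invented for illustration, and the real FSNamesystem logic is considerably more involved. The only point shown is that a missing file is reported as FileNotFoundException before any lease comparison happens.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a lease check; not the FSNamesystem code.
class LeaseCheckSketch {
    // Stand-in for Hadoop's LeaseExpiredException (assumed to be an IOException).
    static class LeaseExpiredException extends IOException {
        LeaseExpiredException(String msg) { super(msg); }
    }

    // Maps a file path to the client currently holding its lease.
    private final Map<String, String> leaseHolders = new HashMap<>();

    void create(String src, String holder) { leaseHolders.put(src, holder); }

    void delete(String src) { leaseHolders.remove(src); }

    void checkLease(String src, String holder) throws IOException {
        String current = leaseHolders.get(src);
        if (current == null) {
            // The file is gone: say so directly instead of raising a lease error.
            throw new FileNotFoundException("File does not exist: " + src);
        }
        if (!current.equals(holder)) {
            throw new LeaseExpiredException(
                "Lease mismatch on " + src + ": held by " + current);
        }
    }

    // Returns the simple class name of the exception checkLease raises, or "none".
    String outcome(String src, String holder) {
        try {
            checkLease(src, holder);
            return "none";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

A client that loses a race with a delete then sees an error naming the real cause, while a genuine holder mismatch still surfaces as a lease error.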
[jira] [Work started] (HDFS-8019) Erasure Coding: erasure coding chunk buffer allocation and management
[ https://issues.apache.org/jira/browse/HDFS-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-8019 started by Vinayakumar B. --- Erasure Coding: erasure coding chunk buffer allocation and management - Key: HDFS-8019 URL: https://issues.apache.org/jira/browse/HDFS-8019 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Vinayakumar B Attachments: HDFS-8019-HDFS-7285-01.patch As a task of HDFS-7344, this is to come up a chunk buffer pool allocating and managing coding chunk buffers, either based on on-heap or off-heap. Note this assumes some DataNodes are powerful in computing and performing EC coding work, so better to have this dedicated buffer pool and management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.
[ https://issues.apache.org/jira/browse/HDFS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532426#comment-14532426 ] Hudson commented on HDFS-7833: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) HDFS-7833. DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated. Contributed by Lei (Eddy) Xu. (cnauroth: rev 6633a8474d7e92fa028ede8fd6c6e41b6c5887f5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated. --- Key: HDFS-7833 URL: https://issues.apache.org/jira/browse/HDFS-7833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Lei (Eddy) Xu Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-7833.000.patch, HDFS-7833.001.patch, HDFS-7833.002.patch, HDFS-7833.003.patch DataNode reconfiguration never recalculates {{FsDatasetImpl#validVolsRequired}}. This may cause incorrect behavior of the {{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration causes the DataNode to run with a different total number of volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
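The effect described in HDFS-7833 can be made concrete with a small sketch. It assumes that the required number of valid volumes is derived as configured volumes minus tolerated failures; that formula, the class and all method names are illustrative assumptions, and only the names validVolsRequired and dfs.datanode.failed.volumes.tolerated come from the report.

```java
// Hypothetical model of the bookkeeping; names only echo the report.
class VolumeRequirementSketch {
    private final int failedVolumesTolerated; // dfs.datanode.failed.volumes.tolerated
    private int configuredVolumes;
    private int validVolsRequired;

    VolumeRequirementSketch(int configuredVolumes, int failedVolumesTolerated) {
        this.failedVolumesTolerated = failedVolumesTolerated;
        this.configuredVolumes = configuredVolumes;
        // Assumed derivation: every configured volume except the tolerated failures.
        this.validVolsRequired = configuredVolumes - failedVolumesTolerated;
    }

    // Reconfiguration that forgets to refresh the derived value (the reported behaviour).
    void reconfigureStale(int newConfiguredVolumes) {
        configuredVolumes = newConfiguredVolumes;
    }

    // Reconfiguration that recomputes the derived value.
    void reconfigureAndRecalculate(int newConfiguredVolumes) {
        configuredVolumes = newConfiguredVolumes;
        validVolsRequired = configuredVolumes - failedVolumesTolerated;
    }

    int validVolsRequired() { return validVolsRequired; }

    // True when enough healthy volumes remain to keep running.
    boolean hasEnoughVolumes(int healthyVolumes) {
        return healthyVolumes >= validVolsRequired;
    }
}
```

Under these assumptions, a node started with 4 volumes and 1 tolerated failure requires 3 valid volumes. If it is reconfigured down to 2 volumes without recalculation, the requirement stays at 3 and two healthy volumes look insufficient; with recalculation the requirement drops to 1.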
[jira] [Commented] (HDFS-8325) Misspelling of threshold in log4j.properties for tests
[ https://issues.apache.org/jira/browse/HDFS-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532423#comment-14532423 ] Hudson commented on HDFS-8325: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) HDFS-8325. Misspelling of threshold in log4j.properties for tests. Contributed by Brahma Reddy Battula. (aajisaka: rev 449e4426a5cc1382eef0cbaa9bd4eb2221c89da1) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/log4j.properties Misspelling of threshold in log4j.properties for tests --- Key: HDFS-8325 URL: https://issues.apache.org/jira/browse/HDFS-8325 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.7.0 Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8325.patch log4j.properties file for test contains misspelling log4j.threshhold. We should use log4j.threshold correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2484) checkLease should throw FileNotFoundException when file does not exist
[ https://issues.apache.org/jira/browse/HDFS-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532428#comment-14532428 ] Hudson commented on HDFS-2484: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #187 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/187/]) HDFS-2484. checkLease should throw FileNotFoundException when file does not exist. Contributed by Rakesh R. (shv: rev c75cfa29cfc527242837d80962688aa53c111e72) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestLease.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt checkLease should throw FileNotFoundException when file does not exist -- Key: HDFS-2484 URL: https://issues.apache.org/jira/browse/HDFS-2484 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.22.0, 2.0.0-alpha Reporter: Konstantin Shvachko Assignee: Rakesh R Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: HDFS-2484.00.patch, HDFS-2484.01.patch, HDFS-2484.02.patch When a file is deleted during its creation, {{FSNamesystem.checkLease(String src, String holder)}} throws {{LeaseExpiredException}}. It would be more informative if it threw {{FileNotFoundException}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7900) hbase keeps too many deleted file descriptor
[ https://issues.apache.org/jira/browse/HDFS-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7900: --- Assignee: (was: Brahma Reddy Battula) hbase keeps too many deleted file descriptor -- Key: HDFS-7900 URL: https://issues.apache.org/jira/browse/HDFS-7900 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Environment: hadoop cdh5 2.3.0 hbase 0.98 Reporter: zhangshilong Priority: Critical I delete some hbase's files manually or use rm -rf blk_ to delete the blockfile directly, but hbase keeps the file descriptor very long time. I found these file descriptor may be kept in shortcircuitcache replicaMap, but could not find when the file descriptor will be removed. replicaMap has no limits size for putting. run: lsof -p pid |grep deleted part of result: lk_1102309377_28571078.meta (deleted) java8430 hbase 8537r REG 8,145 536870912 806553760 /search/hadoop08/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir61/blk_1102541663 (deleted) java8430 hbase 8540r REG 8,113 4194311 812434001 /search/hadoop06/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir62/subdir21/blk_1102524193_28785917.meta (deleted) java8430 hbase 8541r REG 8,65 536870912 813718517 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618 (deleted) java8430 hbase 8542r REG 8,65 4194311 813718518 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir31/subdir14/blk_1102523618_28785342.meta (deleted) java8430 hbase 8543r REG 8,193 536870912 1886733815 /search/hadoop12/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir20/subdir22/blk_1102533549 (deleted) java8430 hbase 8544r REG 8,65 4194311 814828988 /search/hadoop03/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir49/blk_1102676585_28938309.meta (deleted) 
java8430 hbase 8545r REG 8,17 4194311 812962137 /search/hadoop10/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir53/blk_1102597493_28859217.meta (deleted) java8430 hbase 8546r REG 8,97 4194311 810468992 /search/hadoop05/data/current/BP-715213703-10.141.46.46-1418959337587/current/finalized/subdir4/subdir46/blk_1102524567_28786291.meta (deleted) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8209) Support different number of datanode directories in MiniDFSCluster.
[ https://issues.apache.org/jira/browse/HDFS-8209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532507#comment-14532507 ] surendra singh lilhore commented on HDFS-8209: -- Failed test case and checkstyle is not related to this patch. Support different number of datanode directories in MiniDFSCluster. --- Key: HDFS-8209 URL: https://issues.apache.org/jira/browse/HDFS-8209 Project: Hadoop HDFS Issue Type: Improvement Components: test Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8209.patch, HDFS-8209_1.patch I want to create MiniDFSCluster with 2 datanode and for each datanode I want to set different number of StorageTypes, but in this case I am getting ArrayIndexOutOfBoundsException. My cluster schema is like this. {code} final MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf) .numDataNodes(2) .storageTypes(new StorageType[][] {{ StorageType.DISK, StorageType.ARCHIVE },{ StorageType.DISK } }) .build(); {code} *Exception* : {code} java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hdfs.MiniDFSCluster.makeDataNodeDirs(MiniDFSCluster.java:1218) at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:1402) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:832) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
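One plausible way to hit the ArrayIndexOutOfBoundsException reported in HDFS-8209 is to walk a jagged two-dimensional array with a single, uniform per-datanode count. The sketch below illustrates that failure mode and a per-row alternative. It is not the MiniDFSCluster implementation: the StorageType enum here is a local stand-in, and the method names and directory strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; this is not the MiniDFSCluster implementation.
class JaggedStorageSketch {
    // Local stand-in for Hadoop's StorageType (only the two values used in the report).
    enum StorageType { DISK, ARCHIVE }

    // Assumes every datanode has the same number of storages: breaks on jagged input.
    static List<String> dirsWithFixedCount(StorageType[][] types, int storagesPerDatanode) {
        List<String> dirs = new ArrayList<>();
        for (int dn = 0; dn < types.length; dn++) {
            for (int j = 0; j < storagesPerDatanode; j++) {
                // Indexes past the end of a row that is shorter than the fixed count.
                dirs.add("dn" + dn + "/" + types[dn][j]);
            }
        }
        return dirs;
    }

    // Uses each row's own length, so datanodes may differ in storage count.
    static List<String> dirsPerDatanode(StorageType[][] types) {
        List<String> dirs = new ArrayList<>();
        for (int dn = 0; dn < types.length; dn++) {
            for (int j = 0; j < types[dn].length; j++) {
                dirs.add("dn" + dn + "/" + types[dn][j]);
            }
        }
        return dirs;
    }
}
```

With the schema from the report (two storage types for the first datanode, one for the second), the fixed-count variant fails on index 1 of the second row, while the per-row variant yields three directories.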
[jira] [Commented] (HDFS-8220) Erasure Coding: StripedDataStreamer fails to handle the blocklocations which doesn't satisfy BlockGroupSize
[ https://issues.apache.org/jira/browse/HDFS-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532529#comment-14532529 ] Hadoop QA commented on HDFS-8220: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 43s | Pre-patch HDFS-7285 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 30s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 38s | There were no new javadoc warning messages. | | {color:red}-1{color} | release audit | 3m 56s | The applied patch generated 83 release audit warnings. | | {color:red}-1{color} | checkstyle | 2m 17s | The applied patch generated 1178 new checkstyle issues (total was 3, now 591). | | {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 3m 14s | The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 192m 23s | Tests failed in hadoop-hdfs. 
| | | | 239m 7s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | | Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 89% of time Unsynchronized access at DFSOutputStream.java:89% of time Unsynchronized access at DFSOutputStream.java:[line 146] | | | Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Dereferenced at BlockInfoStripedUnderConstruction.java:[line 206] | | | Unread field:field be static? At ErasureCodingWorker.java:[line 251] | | | Should org.apache.hadoop.hdfs.server.datanode.erasurecode.ErasureCodingWorker$StripedReader be a _static_ inner class? At ErasureCodingWorker.java:inner class? At ErasureCodingWorker.java:[lines 910-912] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema):in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() At ErasureCodingZoneManager.java:[line 117] | | | Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath):in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) At ErasureCodingZoneManager.java:[line 81] | | | Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 84] | | | Result of integer multiplication cast to long 
in org.apache.hadoop.hdfs.util.StripedBlockUtil.planReadPortions(int, int, long, int, int) At StripedBlockUtil.java:to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.planReadPortions(int, int, long, int, int) At StripedBlockUtil.java:[line 204] | | Failed unit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.server.namenode.TestAuditLogs | | | hadoop.hdfs.server.namenode.TestFileTruncate | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicy | | | hadoop.hdfs.TestRecoverStripedFile | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12731118/HDFS-8220-HDFS-7285.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | HDFS-7285 / 2a89e1d | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/10843/artifact/patchprocess/patchReleaseAuditProblems.txt | | checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/10843/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/10843/artifact/patchprocess/whitespace.txt | | Findbugs warnings |
[jira] [Updated] (HDFS-8333) Create EC zone should not need superuser privilege
[ https://issues.apache.org/jira/browse/HDFS-8333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yong Zhang updated HDFS-8333: - Attachment: (was: HDFS-8333.000.patch) Create EC zone should not need superuser privilege -- Key: HDFS-8333 URL: https://issues.apache.org/jira/browse/HDFS-8333 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Yong Zhang Assignee: Yong Zhang create EC zone should not need superuser privilege, for example, in multiple tenant scenario, common users only manage their own directory and subdirectory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8274) NFS configuration nfs.dump.dir not working
[ https://issues.apache.org/jira/browse/HDFS-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8274: -- Labels: BB2015-05-TBR (was: ) NFS configuration nfs.dump.dir not working -- Key: HDFS-8274 URL: https://issues.apache.org/jira/browse/HDFS-8274 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Labels: BB2015-05-TBR Attachments: HDFS-8274.patch As per the document http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html we can configure {quote} nfs.dump.dir {quote} as nfs file dump directory, but using this configuration in *hdfs-site.xml* doesn't work and when nfs gateway is started, default location is used i.e \tmp\.hdfs-nfs The reason being the key expected in *NfsConfigKeys.java* {code} public static final String DFS_NFS_FILE_DUMP_DIR_KEY = "nfs.file.dump.dir"; {code} we can change it to *nfs.dump.dir* instead -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
Ajith S created HDFS-8340: - Summary: NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8340: -- Component/s: nfs NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8340: -- Labels: (was: nfs) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8340: -- Labels: BB2015-05-TBR (was: ) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8340.patch According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8019) Erasure Coding: erasure coding chunk buffer allocation and management
[ https://issues.apache.org/jira/browse/HDFS-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-8019: Attachment: HDFS-8019-HDFS-7285-01.patch Draft implementation of ByteBufferManager. Almost same logic as ByteArrayManager, except, this manages ByteBuffers and have option to specify whether to manage native or heap buffers. As of now separate managers need to be created for native and heap buffers. This patch is just for initial review and feedback. Erasure Coding: erasure coding chunk buffer allocation and management - Key: HDFS-8019 URL: https://issues.apache.org/jira/browse/HDFS-8019 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kai Zheng Assignee: Vinayakumar B Attachments: HDFS-8019-HDFS-7285-01.patch As a task of HDFS-7344, this is to come up a chunk buffer pool allocating and managing coding chunk buffers, either based on on-heap or off-heap. Note this assumes some DataNodes are powerful in computing and performing EC coding work, so better to have this dedicated buffer pool and management. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
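The idea of a manager that serves either heap or native buffers, as described in the HDFS-8019 comment, can be sketched with a very small pool. This is not the ByteBufferManager from the attached patch and does not reproduce ByteArrayManager's logic; the class name, the fixed buffer size and the reuse policy are assumptions made for illustration. As in the comment, one pool instance handles exactly one kind of buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size buffer pool; not the ByteBufferManager from the patch.
class ChunkBufferPoolSketch {
    private final boolean direct;   // true: off-heap (native), false: on-heap
    private final int bufferSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();

    ChunkBufferPoolSketch(boolean direct, int bufferSize) {
        this.direct = direct;
        this.bufferSize = bufferSize;
    }

    // Hands out a pooled buffer if one is free, otherwise allocates a new one.
    synchronized ByteBuffer take() {
        ByteBuffer b = free.pollFirst();
        if (b == null) {
            b = direct ? ByteBuffer.allocateDirect(bufferSize)
                       : ByteBuffer.allocate(bufferSize);
        }
        b.clear(); // resets position and limit; the contents are not zeroed
        return b;
    }

    // Returns a buffer to the pool; buffers of the wrong kind or size are rejected.
    synchronized void release(ByteBuffer b) {
        if (b.isDirect() != direct || b.capacity() != bufferSize) {
            throw new IllegalArgumentException("Buffer does not belong to this pool");
        }
        free.addFirst(b);
    }

    synchronized int freeCount() {
        return free.size();
    }
}
```

Keeping heap and direct buffers in separate pools keeps the release check trivial; a combined manager would need to key its free lists by buffer kind as well as by size.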
[jira] [Updated] (HDFS-8274) NFS configuration nfs.dump.dir not working
[ https://issues.apache.org/jira/browse/HDFS-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8274: -- Status: Patch Available (was: Open) Please review the patch. As per the analysis, corrected as per expected property NFS configuration nfs.dump.dir not working -- Key: HDFS-8274 URL: https://issues.apache.org/jira/browse/HDFS-8274 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Attachments: HDFS-8274.patch As per the document http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html we can configure {quote} nfs.dump.dir {quote} as nfs file dump directory, but using this configuration in *hdfs-site.xml* doesn't work and when nfs gateway is started, default location is used i.e \tmp\.hdfs-nfs The reason being the key expected in *NfsConfigKeys.java* {code} public static final String DFS_NFS_FILE_DUMP_DIR_KEY = "nfs.file.dump.dir"; {code} we can change it to *nfs.dump.dir* instead -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8340: -- Status: Patch Available (was: Open) Fixed as per analysis. Please review and commit NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor Attachments: HDFS-8340.patch According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8340: -- Attachment: HDFS-8340.patch NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor Attachments: HDFS-8340.patch According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8340: -- Component/s: documentation NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: documentation, nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor Labels: BB2015-05-TBR Attachments: HDFS-8340.patch According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8347) Using chunkSize to perform erasure decoding in stripping blocks recovering
[ https://issues.apache.org/jira/browse/HDFS-8347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-8347: Description: While investigating a test failure in {{TestRecoverStripedFile}}, found one issue. An extra configurable buffer size instead of the chunkSize defined in the schema is used to perform the decoding, which is incorrect and will cause a decoding failure as below. This is exposed by the latest change in the erasure coder. {noformat} 2015-05-08 18:50:06,607 WARN datanode.DataNode (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets. 2015-05-08 18:50:06,608 WARN datanode.DataNode (ErasureCodingWorker.java:run(399)) - Failed to recover striped block: BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001 2015-05-08 18:50:06,609 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:745) {noformat} was: While investigating a test failure in 
{{TestRecoverStripedFile}}, found two issues: * An extra buffer size, instead of the chunkSize defined in the schema, is used to perform the decoding, which is incorrect and causes a decoding failure as shown below. This was exposed by the latest change in the erasure coder. {noformat} 2015-05-08 18:50:06,607 WARN datanode.DataNode (ErasureCodingWorker.java:run(386)) - Transfer failed for all targets. 2015-05-08 18:50:06,608 WARN datanode.DataNode (ErasureCodingWorker.java:run(399)) - Failed to recover striped block: BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775792_1001 2015-05-08 18:50:06,609 INFO datanode.DataNode (BlockReceiver.java:receiveBlock(826)) - Exception for BP-1597876081-10.239.12.51-1431082199073:blk_-9223372036854775784_1001 java.io.IOException: Premature EOF from inputStream at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134) at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:472) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:787) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:803) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:250) at java.lang.Thread.run(Thread.java:745) {noformat} * In the raw erasure coder, there is a bad optimization in the code below. It assumes that the readable or writable region of a heap buffer always starts at index zero of the backing byte array and spans the whole array. 
{code} protected static byte[][] toArrays(ByteBuffer[] buffers) { byte[][] bytesArr = new byte[buffers.length][]; ByteBuffer buffer; for (int i = 0; i < buffers.length; i++) { buffer = buffers[i]; if (buffer == null) { bytesArr[i] = null; continue; } if (buffer.hasArray()) { bytesArr[i] = buffer.array(); } else { throw new IllegalArgumentException("Invalid ByteBuffer passed, " + "expecting heap buffer"); } } return bytesArr; } {code} Will attach a patch soon to fix the two issues. Using chunkSize to perform erasure decoding in stripping blocks recovering -- Key: HDFS-8347 URL:
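The assumption called out for the quoted toArrays helper can be illustrated with a small sketch. This is not the patch attached to HDFS-8347; the class name HeapBufferWindow and the method readableBytes are hypothetical, and the sketch only shows why ByteBuffer.array() alone is not enough when a heap buffer is a slice or has a non-zero position.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, for illustration only (not taken from the HDFS-8347 patch).
final class HeapBufferWindow {

  /** Copies only the readable window [position, limit) of a heap buffer. */
  static byte[] readableBytes(ByteBuffer buffer) {
    if (!buffer.hasArray()) {
      throw new IllegalArgumentException("Invalid ByteBuffer passed, expecting heap buffer");
    }
    // array() returns the whole backing array; the readable data starts at
    // arrayOffset() + position() and is remaining() bytes long.
    int start = buffer.arrayOffset() + buffer.position();
    return Arrays.copyOfRange(buffer.array(), start, start + buffer.remaining());
  }

  public static void main(String[] args) {
    byte[] backing = {10, 11, 12, 13, 14, 15, 16, 17};
    ByteBuffer whole = ByteBuffer.wrap(backing);
    whole.position(2);
    whole.limit(6);
    ByteBuffer slice = whole.slice(); // window over backing[2..6)

    System.out.println(slice.array().length);          // length of the full backing array
    System.out.println(readableBytes(slice).length);   // length of the readable window only
  }
}
```

A coder that consumes slice.array() directly would read bytes outside the intended chunk, which matches the failure mode described in the issue.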
[jira] [Commented] (HDFS-8332) DFS client API calls should check filesystem closed
[ https://issues.apache.org/jira/browse/HDFS-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533877#comment-14533877 ] Rakesh R commented on HDFS-8332: Jenkins complains about few checkstyle issues but those are unrelated to my patch. Kindly review. Thanks! {code} ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:711:41: 'blocks' hides a field. ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:717: Line is longer than 80 characters (found 85). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:711:41: 'blocks' hides a field. ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:717: Line is longer than 80 characters (found 85). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1: File length is 3,218 lines (max allowed is 2,000). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:711:41: 'blocks' hides a field. ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:717: Line is longer than 80 characters (found 85). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1: File length is 3,218 lines (max allowed is 2,000). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1: File length is 3,241 lines (max allowed is 2,000). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:711:41: 'blocks' hides a field. ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeFile.java:717: Line is longer than 80 characters (found 85). 
./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1: File length is 3,218 lines (max allowed is 2,000). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1: File length is 3,241 lines (max allowed is 2,000). ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:1: File length is 3,241 lines (max allowed is 2,000). {code} DFS client API calls should check filesystem closed --- Key: HDFS-8332 URL: https://issues.apache.org/jira/browse/HDFS-8332 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-8332-000.patch, HDFS-8332-001.patch, HDFS-8332-002.patch I could see {{listCacheDirectives()}} and {{listCachePools()}} APIs can be called even after the filesystem close. Instead these calls should do {{checkOpen}} and throws: {code} java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:464) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8332) DFS client API calls should check filesystem closed
[ https://issues.apache.org/jira/browse/HDFS-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8332: --- Labels: BB2015-05-RFC (was: BB2015-05-TBR) DFS client API calls should check filesystem closed --- Key: HDFS-8332 URL: https://issues.apache.org/jira/browse/HDFS-8332 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-8332-000.patch, HDFS-8332-001.patch, HDFS-8332-002.patch I could see {{listCacheDirectives()}} and {{listCachePools()}} APIs can be called even after the filesystem close. Instead these calls should do {{checkOpen}} and throws: {code} java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:464) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
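For readers following HDFS-8332, the requested behaviour can be sketched as a guard that every client call runs first. The class ClosableClientSketch and its field names are hypothetical stand-ins, not the DFSClient code or the attached patch; only the exception message is taken from the issue description.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a client; illustrates the checkOpen guard only.
final class ClosableClientSketch implements Closeable {
  private volatile boolean clientRunning = true;

  /** Fails fast once close() has been called. */
  private void checkOpen() throws IOException {
    if (!clientRunning) {
      throw new IOException("Filesystem closed");
    }
  }

  /** Placeholder for a listing call such as listCachePools(). */
  String listCachePools() throws IOException {
    checkOpen(); // guard before touching any remote state
    return "pools";
  }

  @Override
  public void close() {
    clientRunning = false;
  }
}
```

With the guard in place, a call made after close() fails with the same "Filesystem closed" IOException that other client calls already raise.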
[jira] [Commented] (HDFS-8340) NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY)
[ https://issues.apache.org/jira/browse/HDFS-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533898#comment-14533898 ] Ajith S commented on HDFS-8340: --- The whitespace warning is not related to the patch. NFS transfer size configuration should include nfs.wtmax(DFS_NFS_MAX_WRITE_TRANSFER_SIZE_KEY) - Key: HDFS-8340 URL: https://issues.apache.org/jira/browse/HDFS-8340 Project: Hadoop HDFS Issue Type: Bug Components: documentation, nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-8340.patch According to documentation http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html bq. For larger data transfer size, one needs to update “nfs.rtmax” and “nfs.rtmax” in hdfs-site.xml. nfs.rtmax is mentioned twice, instead it should be “nfs.rtmax” and “nfs.wtmax” -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-6291: --- Labels: BB2015-05-RFC (was: BB2015-05-TBR) FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Ted Yu Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-6291.2.patch, HDFS-6291.patch At around line 203: {code} if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) { return ERR_CODE_LOGS_UNAVAILABLE; } {code} If we return following the above check, image is not closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
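One common way to avoid the leak described in HDFS-6291 is to close the image in a finally block so that the early return also releases it. The sketch below is an assumption about the shape of such a fix, not the content of HDFS-6291.2.patch; the Image class, the logsAvailable flag, and the numeric value of ERR_CODE_LOGS_UNAVAILABLE are hypothetical.

```java
// Hypothetical sketch of closing a resource on every exit path.
final class CloseOnEarlyReturn {
  static final int ERR_CODE_LOGS_UNAVAILABLE = 6; // placeholder value, assumed for illustration

  /** Stand-in for FSImage; records whether close() was called. */
  static final class Image implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
      closed = true;
    }
  }

  static int doRun(Image image, boolean logsAvailable) {
    try {
      if (!logsAvailable) {
        return ERR_CODE_LOGS_UNAVAILABLE; // early return, still runs finally
      }
      return 0;
    } finally {
      image.close(); // executed for both the early and the normal return
    }
  }
}
```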
[jira] [Updated] (HDFS-8274) NFS configuration nfs.dump.dir not working
[ https://issues.apache.org/jira/browse/HDFS-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated HDFS-8274: -- Labels: BB2015-05-RFC (was: BB2015-05-TBR) NFS configuration nfs.dump.dir not working -- Key: HDFS-8274 URL: https://issues.apache.org/jira/browse/HDFS-8274 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.7.0 Reporter: Ajith S Assignee: Ajith S Labels: BB2015-05-RFC Attachments: HDFS-8274.patch As per the document http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html we can configure {quote} nfs.dump.dir {quote} as nfs file dump directory, but using this configuration in *hdfs-site.xml* doesn't work and when nfs gateway is started, default location is used i.e \tmp\.hdfs-nfs The reason being the key expected in *NfsConfigKeys.java* {code} public static final String DFS_NFS_FILE_DUMP_DIR_KEY = "nfs.file.dump.dir"; {code} we can change it to *nfs.dump.dir* instead -- This message was sent by Atlassian JIRA (v6.3.4#6332)
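The mismatch reported in HDFS-8274 can be reproduced in miniature with a plain key lookup. This sketch uses java.util.Properties rather than Hadoop's Configuration class, and the class DumpDirLookup, the sample directory, and the forward-slash form of the default path are assumptions for illustration; only the two key names come from the report.

```java
import java.util.Properties;

// Hypothetical illustration of a documented key that the code never reads.
final class DumpDirLookup {
  static final String EXPECTED_KEY = "nfs.file.dump.dir"; // key the report says the code expects
  static final String DOCUMENTED_KEY = "nfs.dump.dir";    // key named in the documentation
  static final String DEFAULT_DIR = "/tmp/.hdfs-nfs";     // default location, assumed spelling

  /** Resolves the dump directory the way a lookup on EXPECTED_KEY would. */
  static String resolve(Properties site) {
    return site.getProperty(EXPECTED_KEY, DEFAULT_DIR);
  }

  public static void main(String[] args) {
    Properties site = new Properties();
    site.setProperty(DOCUMENTED_KEY, "/data/nfs-dump"); // what a user following the docs would set
    System.out.println(resolve(site)); // falls back to the default, the user's value is ignored
  }
}
```

Because the lookup silently falls back to the default, nothing signals to the user that the documented key was ignored, which is why the report proposes aligning the key name with the documentation.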
[jira] [Commented] (HDFS-6291) FSImage may be left unclosed in BootstrapStandby#doRun()
[ https://issues.apache.org/jira/browse/HDFS-6291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533900#comment-14533900 ] Rakesh R commented on HDFS-6291: Thanks [~yunsh] for the patch. lgtm +1 (non-binding) FSImage may be left unclosed in BootstrapStandby#doRun() Key: HDFS-6291 URL: https://issues.apache.org/jira/browse/HDFS-6291 Project: Hadoop HDFS Issue Type: Bug Components: ha Reporter: Ted Yu Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-6291.2.patch, HDFS-6291.patch At around line 203: {code} if (!checkLogsAvailableForRead(image, imageTxId, curTxId)) { return ERR_CODE_LOGS_UNAVAILABLE; } {code} If we return following the above check, image is not closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4185) Add a metric for number of active leases
[ https://issues.apache.org/jira/browse/HDFS-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533897#comment-14533897 ] Vinayakumar B commented on HDFS-4185: - Thanks [~rakeshr] for taking up this issue. Source changes looks good. I think current test will just check the API, not the metric via metric system. It would be better to have a test to verify the metric via metric system. Check {{TestNameNodeMetrics}} FYR. Add a metric for number of active leases Key: HDFS-4185 URL: https://issues.apache.org/jira/browse/HDFS-4185 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 0.23.4, 2.0.2-alpha Reporter: Kihwal Lee Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-4185-001.patch, HDFS-4185-002.patch, HDFS-4185-003.patch We have seen cases of systematic open file leaks, which could have been detected if we have a metric that shows number of active leases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8067) haadmin prints out stale help messages
[ https://issues.apache.org/jira/browse/HDFS-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533914#comment-14533914 ] Hudson commented on HDFS-8067: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7769 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7769/]) HDFS-8067. haadmin prints out stale help messages (Contributed by Ajith S) (vinayakumarb: rev 66988476d09a6d04c0b81a663db1e6e5a28c37fb) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt haadmin prints out stale help messages -- Key: HDFS-8067 URL: https://issues.apache.org/jira/browse/HDFS-8067 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Ajith S Assignee: Ajith S Priority: Minor Fix For: 2.8.0 Attachments: HDFS-8067-01.patch, HDFS-8067-02.patch Scenario : Setting up multiple nameservices with HA configuration for each nameservice (manual failover) After starting the journal nodes and namenodes, both the nodes are in standby mode. all the following haadmin commands *haadmin* -transitionToActive -transitionToStandby -failover -getServiceState -checkHealth failed with exception _Illegal argument: Unable to determine the nameservice id._ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8174) Update replication count to live rep count in fsck report
[ https://issues.apache.org/jira/browse/HDFS-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533912#comment-14533912 ] Hudson commented on HDFS-8174: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7769 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7769/]) HDFS-8174. Update replication count to live rep count in fsck report. Contributed by J.Andreina (umamahesh: rev 2ea0f2fc938febd7fbbe03656a91ae3db1409c50) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Update replication count to live rep count in fsck report - Key: HDFS-8174 URL: https://issues.apache.org/jira/browse/HDFS-8174 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Labels: BB2015-05-RFC Fix For: 2.8.0 Attachments: HDFS-8174.1.patch When one of the replica is decommissioned , fetching fsck report gives repl count is one less than the total replica information displayed. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} Update the description from rep to Live_rep -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5745) Unnecessary disk check triggered when socket operation has problem.
[ https://issues.apache.org/jira/browse/HDFS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jun aoki updated HDFS-5745: --- Labels: (was: BB2015-05-TBR) Unnecessary disk check triggered when socket operation has problem. --- Key: HDFS-5745 URL: https://issues.apache.org/jira/browse/HDFS-5745 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 1.2.1 Reporter: MaoYuan Xian Assignee: jun aoki Attachments: HDFS-5745.patch When a BlockReceiver data transfer fails, SocketOutputStream reports the exception as an IOException with the message "The stream is closed": 2014-01-06 11:48:04,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run(): java.io.IOException: The stream is closed at org.apache.hadoop.net.SocketOutputStream.write at java.io.BufferedOutputStream.flushBuffer at java.io.BufferedOutputStream.flush at java.io.DataOutputStream.flush at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run at java.lang.Thread.run This causes the checkDiskError method of DataNode to be called, which triggers a disk scan. Can we modify checkDiskError as below to avoid this unnecessary disk scan? {code} --- a/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java +++ b/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java @@ -938,7 +938,8 @@ public class DataNode extends Configured || e.getMessage().startsWith("An established connection was aborted") || e.getMessage().startsWith("Broken pipe") || e.getMessage().startsWith("Connection reset") - || e.getMessage().contains("java.nio.channels.SocketChannel")) { + || e.getMessage().contains("java.nio.channels.SocketChannel") + || e.getMessage().startsWith("The stream is closed")) { LOG.info("Not checking disk as checkDiskError was called on a network" + " related exception"); return; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-5745) Unnecessary disk check triggered when socket operation has problem.
[ https://issues.apache.org/jira/browse/HDFS-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jun aoki reassigned HDFS-5745: -- Assignee: jun aoki Unnecessary disk check triggered when socket operation has problem. --- Key: HDFS-5745 URL: https://issues.apache.org/jira/browse/HDFS-5745 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 1.2.1 Reporter: MaoYuan Xian Assignee: jun aoki Attachments: HDFS-5745.patch When a BlockReceiver data transfer fails, SocketOutputStream reports the exception as an IOException with the message "The stream is closed": 2014-01-06 11:48:04,716 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver.run(): java.io.IOException: The stream is closed at org.apache.hadoop.net.SocketOutputStream.write at java.io.BufferedOutputStream.flushBuffer at java.io.BufferedOutputStream.flush at java.io.DataOutputStream.flush at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run at java.lang.Thread.run This causes the checkDiskError method of DataNode to be called, which triggers a disk scan. Can we modify checkDiskError as below to avoid this unnecessary disk scan? {code} --- a/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java +++ b/src/hdfs/org/apache/hadoop/hdfs/server/datanode/DataNode.java @@ -938,7 +938,8 @@ public class DataNode extends Configured || e.getMessage().startsWith("An established connection was aborted") || e.getMessage().startsWith("Broken pipe") || e.getMessage().startsWith("Connection reset") - || e.getMessage().contains("java.nio.channels.SocketChannel")) { + || e.getMessage().contains("java.nio.channels.SocketChannel") + || e.getMessage().startsWith("The stream is closed")) { LOG.info("Not checking disk as checkDiskError was called on a network" + " related exception"); return; {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
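The message-prefix check proposed in HDFS-5745 can be read as a standalone predicate. The sketch below restates the conditions from the quoted diff in a hypothetical helper, NetworkErrorFilter.isNetworkRelated; the null-message guard is an addition of this sketch and is not part of the diff.

```java
import java.io.IOException;

// Hypothetical helper restating the conditions from the proposed diff.
final class NetworkErrorFilter {

  /** Returns true when the exception message suggests a network problem rather than a disk problem. */
  static boolean isNetworkRelated(IOException e) {
    String msg = e.getMessage();
    if (msg == null) {
      return false; // guard added in this sketch; the quoted diff calls getMessage() directly
    }
    return msg.startsWith("An established connection was aborted")
        || msg.startsWith("Broken pipe")
        || msg.startsWith("Connection reset")
        || msg.contains("java.nio.channels.SocketChannel")
        || msg.startsWith("The stream is closed"); // the case the issue proposes to add
  }
}
```

A caller would skip the disk scan whenever the predicate returns true, so a closed socket stream no longer triggers a scan of the data directories.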
[jira] [Commented] (HDFS-6184) Capture NN's thread dump when it fails over
[ https://issues.apache.org/jira/browse/HDFS-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533803#comment-14533803 ] Akira AJISAKA commented on HDFS-6184: - Thanks [~mingma] for creating the patch. Some comments: 1. Would you remove {{@VisibleForTesting}} from ZKFailoverController#getLastHealthState? 2. Would you document the time unit of the parameter and when to get thread dump from the local NN to hdfs-default.xml? Minor formatting issues: {code} +} + + +try { {code} 1. Would you remove unnecessarily blank line? {code} +// Capture local NN thread dump when the target NN health state changes. +if (getLastHealthState() == HealthMonitor.State.SERVICE_NOT_RESPONDING || +getLastHealthState() == HealthMonitor.State.SERVICE_UNHEALTHY) +getLocalNNThreadDump(); {code} 2. Would you use braces {} for if statement? Capture NN's thread dump when it fails over --- Key: HDFS-6184 URL: https://issues.apache.org/jira/browse/HDFS-6184 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-6184-2.patch, HDFS-6184-3.patch, HDFS-6184.patch We have seen several false positives in terms of when ZKFC considers NN to be unhealthy. Some of these triggers unnecessary failover. Examples, 1. SBN checkpoint caused ZKFC's RPC call into NN timeout. The consequence isn't bad; just that SBN will quit ZK membership and rejoin it later. But it is unnecessary. The reason is checkpoint acquires NN global write lock and all rpc requests are blocked. Even though HAServiceProtocol.monitorHealth doesn't need to acquire NN lock; it still needs to user service rpc queue. 2. When ANN is busy, sometimes the global lock can block other requests. ZKFC's RPC call timeout. This will trigger failover. The question is even if after the failover, the new ANN might run into similar issue. We can increase ZKFC to NN timeout value to mitigate this to some degree. 
If ZKFC can judge more accurately whether the NN is healthy, and can predict whether a failover will help, that will be useful. For example, we can, 1. Have ZKFC make its decision based on an NN thread dump. 2. Have a dedicated rpc pool for ZKFC NN. Given that the health check doesn't need to acquire the NN global lock, it can go through even if the NN is checkpointing or very busy. Any comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
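As a rough idea of what capturing a thread dump involves, the sketch below collects a dump of the local JVM through the standard ThreadMXBean API. It is an assumption for illustration only: the HDFS-6184 patch concerns the NameNode process as seen from ZKFC and may obtain the dump by a different route, and the class name LocalThreadDump is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical sketch: dumps the threads of the JVM it runs in.
final class LocalThreadDump {

  /** Returns a textual dump of all live threads in the current JVM. */
  static String dump() {
    StringBuilder sb = new StringBuilder();
    // false, false: do not collect locked monitors or ownable synchronizers
    ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
    for (ThreadInfo info : infos) {
      if (info != null) {
        sb.append(info.toString()); // ThreadInfo.toString() prints a bounded number of frames
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(dump());
  }
}
```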
[jira] [Commented] (HDFS-8257) Namenode rollingUpgrade option is incorrect
[ https://issues.apache.org/jira/browse/HDFS-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533863#comment-14533863 ] Vinayakumar B commented on HDFS-8257: - +1. Patch LGTM. Committing soon. Namenode rollingUpgrade option is incorrect --- Key: HDFS-8257 URL: https://issues.apache.org/jira/browse/HDFS-8257 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: J.Andreina Assignee: J.Andreina Labels: BB2015-05-RFC Attachments: HDFS-8257.1.patch ./hdfs namenode -rollingUpgrade supports rollback|started operations , but it is incorrect in document {noformat} hdfs namenode [-backup] | [-checkpoint] | [-format [-clusterid cid ] [-force] [-nonInteractive] ] | [-upgrade [-clusterid cid] [-renameReservedk-v pairs] ] | [-upgradeOnly [-clusterid cid] [-renameReservedk-v pairs] ] | [-rollback] | [-rollingUpgrade downgrade |rollback ] | {noformat} {noformat} -rollingUpgrade downgrade|rollback|started See Rolling Upgrade document for the detail. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6348) Secondary namenode - RMI Thread prevents JVM from exiting after main() completes
[ https://issues.apache.org/jira/browse/HDFS-6348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533864#comment-14533864 ] Rakesh R commented on HDFS-6348: Since it involves {{System.exit()}}, I couldn't manage to add a testcase. Appreciate reviews. Thanks! Secondary namenode - RMI Thread prevents JVM from exiting after main() completes - Key: HDFS-6348 URL: https://issues.apache.org/jira/browse/HDFS-6348 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-TBR Attachments: HDFS-6348-003.patch, HDFS-6348.patch, HDFS-6348.patch, secondaryNN_threaddump_after_exit.log Secondary Namenode does not exit when a RuntimeException occurs during startup. Say I supplied a wrong configuration, so validation failed and threw a RuntimeException as shown below. But when I check the environment, the SecondaryNamenode process is still alive. On analysis, an RMI thread is still alive; since it is not a daemon thread, the JVM is not exiting. I'm attaching a thread dump to this JIRA for more details about the thread. 
{code} java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1900) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.getInstance(BlockPlacementPolicy.java:199) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.init(BlockManager.java:256) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:635) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:260) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.init(SecondaryNameNode.java:205) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:695) Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1868) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1892) ... 6 more Caused by: java.lang.ClassNotFoundException: Class com.huawei.hadoop.hdfs.server.blockmanagement.MyBlockPlacementPolicy not found at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1774) at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1866) ... 7 more 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state 2014-05-07 14:27:04,666 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for standby state 2014-05-07 14:31:04,926 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
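The behaviour behind HDFS-6348 follows from a general JVM rule: the process stays up while any non-daemon thread is alive, even after main() has returned or thrown. A minimal diagnostic sketch, with the hypothetical class name NonDaemonThreads, lists the threads that would keep the JVM from exiting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: names of live threads that block JVM exit.
final class NonDaemonThreads {

  /** Returns the names of all live non-daemon threads in the current JVM. */
  static List<String> liveNonDaemonThreadNames() {
    List<String> names = new ArrayList<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t.isAlive() && !t.isDaemon()) {
        names.add(t.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Typically includes "main" itself while main() is still running.
    System.out.println(liveNonDaemonThreadNames());
  }
}
```

Calling such a helper in the failure path shows which thread is holding the process open; the comment on this issue indicates the fix itself involves {{System.exit()}}.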
[jira] [Updated] (HDFS-8294) Erasure Coding: Fix Findbug warnings present in erasure coding
[ https://issues.apache.org/jira/browse/HDFS-8294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-8294: --- Labels: BB2015-05-RFC (was: BB2015-05-TBR) Erasure Coding: Fix Findbug warnings present in erasure coding -- Key: HDFS-8294 URL: https://issues.apache.org/jira/browse/HDFS-8294 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Rakesh R Assignee: Rakesh R Labels: BB2015-05-RFC Attachments: HDFS-8294-HDFS-7285.00.patch, HDFS-8294-HDFS-7285.01.patch, HDFS-8294-HDFS-7285.02.patch, HDFS-8294-HDFS-7285.03.patch Following are the findbug warnings :- # Possible null pointer dereference of arr$ in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) {code} Bug type NP_NULL_ON_SOME_PATH (click for details) In class org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction In method org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStripedUnderConstruction.initializeBlockRecovery(long) Value loaded from arr$ Dereferenced at BlockInfoStripedUnderConstruction.java:[line 206] Known null at BlockInfoStripedUnderConstruction.java:[line 200] {code} # Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema): String.getBytes() Found reliance on default encoding in org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath): new String(byte[]) {code} Bug type DM_DEFAULT_ENCODING (click for details) In class org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager In method org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.createErasureCodingZone(String, ECSchema) Called method String.getBytes() At ErasureCodingZoneManager.java:[line 116] Bug type DM_DEFAULT_ENCODING (click for details) In class org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager In method 
org.apache.hadoop.hdfs.server.namenode.ErasureCodingZoneManager.getECZoneInfo(INodesInPath) Called method new String(byte[]) At ErasureCodingZoneManager.java:[line 81] {code} # Inconsistent synchronization of org.apache.hadoop.hdfs.DFSOutputStream.streamer; locked 90% of time {code} Bug type IS2_INCONSISTENT_SYNC (click for details) In class org.apache.hadoop.hdfs.DFSOutputStream Field org.apache.hadoop.hdfs.DFSOutputStream.streamer Synchronized 90% of the time Unsynchronized access at DFSOutputStream.java:[line 142] Unsynchronized access at DFSOutputStream.java:[line 853] Unsynchronized access at DFSOutputStream.java:[line 617] Unsynchronized access at DFSOutputStream.java:[line 620] Unsynchronized access at DFSOutputStream.java:[line 630] Unsynchronized access at DFSOutputStream.java:[line 338] Unsynchronized access at DFSOutputStream.java:[line 734] Unsynchronized access at DFSOutputStream.java:[line 897] {code} # Dead store to offSuccess in org.apache.hadoop.hdfs.StripedDataStreamer.endBlock() {code} Bug type DLS_DEAD_LOCAL_STORE (click for details) In class org.apache.hadoop.hdfs.StripedDataStreamer In method org.apache.hadoop.hdfs.StripedDataStreamer.endBlock() Local variable named offSuccess At StripedDataStreamer.java:[line 105] {code} # Result of integer multiplication cast to long in org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.spaceConsumed() {code} Bug type ICAST_INTEGER_MULTIPLY_CAST_TO_LONG (click for details) In class org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped In method org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.spaceConsumed() At BlockInfoStriped.java:[line 208] {code} # Result of integer multiplication cast to long in org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) {code} Bug type ICAST_INTEGER_MULTIPLY_CAST_TO_LONG (click for details) In class org.apache.hadoop.hdfs.util.StripedBlockUtil In method 
org.apache.hadoop.hdfs.util.StripedBlockUtil.constructInternalBlock(LocatedStripedBlock, int, int, int, int) At StripedBlockUtil.java:[line 85] {code} # Switch statement found in org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(long, long, long, byte[], int, Map) where default case is missing {code} Bug type SF_SWITCH_NO_DEFAULT (click for details) In class org.apache.hadoop.hdfs.DFSStripedInputStream In method org.apache.hadoop.hdfs.DFSStripedInputStream.fetchBlockByteRange(long, long, long, byte[], int, Map) At DFSStripedInputStream.java:[lines 468-491] {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
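Two of the warning types listed for HDFS-8294 have well-known generic remedies, sketched below. The class FindbugsFixSketch and its method names are hypothetical and the code is not taken from the attached patches; the choice of UTF-8 as the explicit charset is also an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of ICAST_INTEGER_MULTIPLY_CAST_TO_LONG and DM_DEFAULT_ENCODING fixes.
final class FindbugsFixSketch {

  /** Flagged pattern: the product is computed in int and may overflow before widening. */
  static long multiplyThenWiden(int a, int b) {
    return a * b;
  }

  /** Usual remedy: widen one operand first so the multiplication is done in long. */
  static long widenThenMultiply(int a, int b) {
    return (long) a * b;
  }

  /** Usual remedy for default-encoding warnings: name the charset explicitly. */
  static byte[] encode(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  static String decode(byte[] bytes) {
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

For example, with a = 1 << 20 and b = 1 << 12 the int product wraps around, while the widened form yields 4294967296.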
[jira] [Commented] (HDFS-8067) haadmin prints out stale help messages
[ https://issues.apache.org/jira/browse/HDFS-8067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533878#comment-14533878 ] Vinayakumar B commented on HDFS-8067: - +1 for the latest patch. Will commit soon. haadmin prints out stale help messages -- Key: HDFS-8067 URL: https://issues.apache.org/jira/browse/HDFS-8067 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Ajith S Assignee: Ajith S Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-8067-01.patch, HDFS-8067-02.patch Scenario : Setting up multiple nameservices with HA configuration for each nameservice (manual failover) After starting the journal nodes and namenodes, both the nodes are in standby mode. all the following haadmin commands *haadmin* -transitionToActive -transitionToStandby -failover -getServiceState -checkHealth failed with exception _Illegal argument: Unable to determine the nameservice id._ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8174) Update replication count to live rep count in fsck report
[ https://issues.apache.org/jira/browse/HDFS-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533885#comment-14533885 ] Uma Maheswara Rao G commented on HDFS-8174: --- +1, I will commit it shortly. Thanks Ming Ma for the review! Update replication count to live rep count in fsck report - Key: HDFS-8174 URL: https://issues.apache.org/jira/browse/HDFS-8174 Project: Hadoop HDFS Issue Type: Bug Reporter: J.Andreina Assignee: J.Andreina Priority: Minor Labels: BB2015-05-RFC Attachments: HDFS-8174.1.patch When one of the replica is decommissioned , fetching fsck report gives repl count is one less than the total replica information displayed. {noformat} blk_x len=y repl=3 [dn1, dn2, dn3, dn4] {noformat} Update the description from rep to Live_rep -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8257) Namenode rollingUpgrade option is incorrect in document
[ https://issues.apache.org/jira/browse/HDFS-8257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533888#comment-14533888 ] Hudson commented on HDFS-8257: -- SUCCESS: Integrated in Hadoop-trunk-Commit #7768 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7768/]) HDFS-8257. Namenode rollingUpgrade option is incorrect in document (Contributed by J.Andreina) (vinayakumarb: rev c7c26a1e4aff0b89016ec838d06ba2b628a6808e) * hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Namenode rollingUpgrade option is incorrect in document --- Key: HDFS-8257 URL: https://issues.apache.org/jira/browse/HDFS-8257 Project: Hadoop HDFS Issue Type: Bug Components: documentation Reporter: J.Andreina Assignee: J.Andreina Fix For: 2.8.0 Attachments: HDFS-8257.1.patch ./hdfs namenode -rollingUpgrade supports rollback|started operations , but it is incorrect in document {noformat} hdfs namenode [-backup] | [-checkpoint] | [-format [-clusterid cid ] [-force] [-nonInteractive] ] | [-upgrade [-clusterid cid] [-renameReservedk-v pairs] ] | [-upgradeOnly [-clusterid cid] [-renameReservedk-v pairs] ] | [-rollback] | [-rollingUpgrade downgrade |rollback ] | {noformat} {noformat} -rollingUpgrade downgrade|rollback|started See Rolling Upgrade document for the detail. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7998) HDFS Federation : Command mentioned to add a NN to existing federated cluster is wrong
[ https://issues.apache.org/jira/browse/HDFS-7998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533908#comment-14533908 ]

Ajith S commented on HDFS-7998:
---
The test failure is not caused by the patch.

HDFS Federation : Command mentioned to add a NN to existing federated cluster is wrong
---
Key: HDFS-7998
URL: https://issues.apache.org/jira/browse/HDFS-7998
Project: Hadoop HDFS
Issue Type: Bug
Components: documentation
Reporter: Ajith S
Assignee: Ajith S
Priority: Minor
Labels: BB2015-05-TBR
Attachments: HDFS-7998.patch

The HDFS Federation documentation at http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/Federation.html gives the following command to add a namenode to an existing cluster:
$HADOOP_PREFIX_HOME/bin/hdfs dfadmin -refreshNameNode datanode_host_name:datanode_rpc_port
This command is incorrect. The correct command is:
$HADOOP_PREFIX_HOME/bin/hdfs dfsadmin -refreshNamenodes datanode_host_name:datanode_rpc_port
The documentation needs to be updated accordingly.
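The corrected command has to be run once per datanode. As a minimal sketch only, the following Python helper builds (but does not execute) one command line per datanode. The helper name, the host names, and the port value 50020 are illustrative assumptions and are not taken from the issue; only the dfsadmin -refreshNamenodes form comes from the report above.

```python
import shlex


def refresh_namenodes_commands(datanodes, hdfs_bin="hdfs"):
    """Build one `dfsadmin -refreshNamenodes` command line per datanode.

    `datanodes` is an iterable of (host, rpc_port) pairs. The commands are
    only returned as strings; nothing is executed here.
    """
    commands = []
    for host, rpc_port in datanodes:
        argv = [hdfs_bin, "dfsadmin", "-refreshNamenodes", f"{host}:{rpc_port}"]
        # Quote each argument so the line can be pasted into a shell safely.
        commands.append(" ".join(shlex.quote(arg) for arg in argv))
    return commands


if __name__ == "__main__":
    # Hypothetical hosts and port, used for illustration only.
    for line in refresh_namenodes_commands(
        [("dn1.example.com", 50020), ("dn2.example.com", 50020)]
    ):
        print(line)
```

Printing the commands rather than running them keeps the sketch safe to try on a machine without a Hadoop installation.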
[jira] [Updated] (HDFS-8174) Update replication count to live rep count in fsck report
[ https://issues.apache.org/jira/browse/HDFS-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uma Maheswara Rao G updated HDFS-8174:
---
Resolution: Fixed
Fix Version/s: 2.8.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Committed to branch-2 and trunk. Thanks a lot, J.Andreina.

Update replication count to live rep count in fsck report
---
Key: HDFS-8174
URL: https://issues.apache.org/jira/browse/HDFS-8174
Project: Hadoop HDFS
Issue Type: Bug
Reporter: J.Andreina
Assignee: J.Andreina
Priority: Minor
Labels: BB2015-05-RFC
Fix For: 2.8.0
Attachments: HDFS-8174.1.patch

When one of the replicas is decommissioned, the fsck report gives a repl count that is one less than the number of replicas displayed.
{noformat}
blk_x len=y repl=3 [dn1, dn2, dn3, dn4]
{noformat}
Update the description from rep to Live_rep.
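To make the mismatch in the description concrete, here is a small sketch that parses a line shaped like the placeholder sample in the issue and compares the reported repl value with the number of locations listed. The regular expression and the function name are assumptions based only on that sample; real fsck output contains more fields and may differ.

```python
import re

# Matches "repl=<n> [loc1, loc2, ...]" as in the placeholder sample from the issue.
LINE_RE = re.compile(r"repl=(\d+)\s+\[([^\]]*)\]")


def replica_summary(line):
    """Return (reported_repl, listed_locations) for one block line."""
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a block report: %r" % line)
    reported = int(match.group(1))
    locations = [loc.strip() for loc in match.group(2).split(",") if loc.strip()]
    return reported, len(locations)


if __name__ == "__main__":
    reported, listed = replica_summary("blk_x len=y repl=3 [dn1, dn2, dn3, dn4]")
    if reported != listed:
        # With a decommissioned replica, the count covers live replicas only.
        print("repl=%d but %d locations are listed" % (reported, listed))
```

For the sample line the two numbers differ by one, which is the confusion that the relabelling to a live replica count is meant to address.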