[jira] [Updated] (HDFS-6085) Improve CacheReplicationMonitor log messages a bit
[ https://issues.apache.org/jira/browse/HDFS-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6085:
---------------------------------------
    Resolution: Fixed
    Fix Version/s: 2.4.0
    Status: Resolved  (was: Patch Available)

Improve CacheReplicationMonitor log messages a bit
--------------------------------------------------
Key: HDFS-6085
URL: https://issues.apache.org/jira/browse/HDFS-6085
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Fix For: 2.4.0
Attachments: HDFS-6085.001.patch

It would be nice if the CacheReplicationMonitor logs printed out per-block information at TRACE level. We could also be more organized about the logs and include the directive ID in each message, so that it is clear which directive a given log message refers to.

--
This message was sent by Atlassian JIRA (v6.2#6252)
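The change requested above (per-block detail at TRACE level, with the directive ID carried in every message) boils down to a common logging pattern. Below is a minimal, self-contained sketch; it uses java.util.logging purely for illustration, and the class name and message format are assumptions, not Hadoop's actual CacheReplicationMonitor code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustration only: Hadoop's real code uses a different logging API.
public class DirectiveLogging {
    private static final Logger LOG = Logger.getLogger("CacheReplicationMonitor");

    // Build a log message that always carries the directive ID, so each
    // line can be correlated with the cache directive it refers to.
    static String describe(long directiveId, String detail) {
        return "Directive " + directiveId + ": " + detail;
    }

    public static void main(String[] args) {
        long directiveId = 42;
        // Guard the per-block message behind a level check so the string
        // construction only happens when TRACE-equivalent output is on.
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest(describe(directiveId, "considering blk_1073741825"));
        }
        System.out.println(describe(directiveId, "scanned 1 block"));
    }
}
```

Guarding the per-block message behind `isLoggable` keeps the string construction off the hot path when trace output is disabled, which matters in a monitor that may visit many blocks per scan.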
[jira] [Commented] (HDFS-6085) Improve CacheReplicationMonitor log messages a bit
[ https://issues.apache.org/jira/browse/HDFS-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930011#comment-13930011 ]

Colin Patrick McCabe commented on HDFS-6085:
--------------------------------------------
bq. This looks good. Do you want to try lowering the log level for CacheManager#cacheRepors in this patch as well? Right now it's pretty spammy at INFO, and I imagine it being even worse on a large cluster.

I'll roll that into HDFS-6086.

bq. Otherwise, +1 pending.

Thanks, committed.
[jira] [Updated] (HDFS-6086) Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
[ https://issues.apache.org/jira/browse/HDFS-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Colin Patrick McCabe updated HDFS-6086:
---------------------------------------
    Attachment: HDFS-6086.002.patch

* Fix compilation failure due to updating FSDatasetSpi interface
* Change log level of cache report acknowledgement log message in CacheManager

Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
-----------------------------------------------------------------------------------------------
Key: HDFS-6086
URL: https://issues.apache.org/jira/browse/HDFS-6086
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-6086.001.patch, HDFS-6086.002.patch

We need to fix a case where zero-copy or no-checksum reads are not allowed even when the block is cached: when the block is cached before the {{REQUEST_SHORT_CIRCUIT_FDS}} operation begins. In this case, {{DataXceiver}} needs to consult the {{ShortCircuitRegistry}} to see if the block is cached, rather than relying on a callback.
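The ordering problem described in this issue (the "block was cached" notification can fire before the file-descriptor request arrives) is addressed by querying current state at request time instead of depending on a callback. Here is a hedged, self-contained sketch of that pattern; the class and method names are illustrative, not the actual DataXceiver or ShortCircuitRegistry API.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of "consult the registry" instead of "wait for the callback".
// Names are hypothetical; Hadoop's real registry tracks much more state.
public class CachedBlockLookup {
    private final Set<Long> cachedBlocks = new HashSet<>();

    // Called whenever a block transitions to the cached state.
    void onBlockCached(long blockId) {
        cachedBlocks.add(blockId);
    }

    // Called when the short-circuit FD request arrives: check current
    // state directly, so a block cached *before* the request still
    // qualifies for zero-copy / no-checksum reads.
    boolean allowNoChecksumRead(long blockId) {
        return cachedBlocks.contains(blockId);
    }

    public static void main(String[] args) {
        CachedBlockLookup registry = new CachedBlockLookup();
        registry.onBlockCached(1073741825L); // cached before the request
        System.out.println(registry.allowNoChecksumRead(1073741825L)); // true
        System.out.println(registry.allowNoChecksumRead(999L));        // false
    }
}
```

The design point is that a point-in-time query cannot miss an event that happened earlier, whereas a callback-only design silently drops any caching that completed before the listener was registered.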
[jira] [Commented] (HDFS-6085) Improve CacheReplicationMonitor log messages a bit
[ https://issues.apache.org/jira/browse/HDFS-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930018#comment-13930018 ]

Hudson commented on HDFS-6085:
------------------------------
SUCCESS: Integrated in Hadoop-trunk-Commit #5304 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5304/])
HDFS-6085. Improve CacheReplicationMonitor log messages a bit (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576194)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930040#comment-13930040 ]

Hadoop QA commented on HDFS-6080:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633847/HDFS-6080.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs:
org.apache.hadoop.fs.TestHdfsNativeCodeLoader

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6369//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6369//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6369//console

This message is automatically generated.
Improve NFS gateway performance by making rtmax and wtmax configurable
----------------------------------------------------------------------
Key: HDFS-6080
URL: https://issues.apache.org/jira/browse/HDFS-6080
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs, performance
Reporter: Abin Shahab
Assignee: Abin Shahab
Attachments: HDFS-6080.patch, HDFS-6080.patch

Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server, and therefore affect read and write performance. We ran performance tests with 1MB, 100MB, and 1GB files and noticed a significant performance decline as file size increased, compared to FUSE. We realized that the issue was the hardcoded rtmax size (64k). When we increased rtmax to 1MB, we got a 10x improvement in performance.

NFS reads:
+---------------+------------+---------------+---------------+---------------+----------------+-----------------------+
| File          | Size       | Run 1         | Run 2         | Run 3         | Average        | Std. Dev.             |
+---------------+------------+---------------+---------------+---------------+----------------+-----------------------+
| testFile100Mb | 104857600  | 23.131158137  | 19.24552955   | 19.793332866  | 20.72334018435 | 1.7172094782219731    |
| testFile1Gb   | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113 | 8.14037175506561      |
| testFile1Mb   | 1048576    | 0.330546906   | 0.256391808   | 0.28730168    | 0.291413464667 | 0.030412987573361663  |
+---------------+------------+---------------+---------------+---------------+----------------+-----------------------+

Fuse reads:
+---------------+------------+--------------+--------------+--------------+----------------+-----------------------+
| File          | Size       | Run 1        | Run 2        | Run 3        | Average        | Std. Dev.             |
+---------------+------------+--------------+--------------+--------------+----------------+-----------------------+
| testFile100Mb | 104857600  | 2.394459443  | 2.695265191  | 2.50046517   | 2.530063267997 | 0.12457410127142007   |
| testFile1Gb   | 1073741824 | 25.03324924  | 24.155102554 | 24.901525525 | 24.69662577297 | 0.386672412437576     |
| testFile1Mb   | 1048576    | 0.271615094  | 0.270835986  | 0.271796438  | 0.271415839333 | 0.0004166483951065848 |
+---------------+------------+--------------+--------------+--------------+----------------+-----------------------+

(NFS read after rtmax = 1MB)
+---------------+------------+--------------+--------------+--------------+----------------+--------------------+
| File          | Size       | Run 1        | Run 2        | Run 3        | Average        | Std. Dev.          |
+---------------+------------+--------------+--------------+--------------+----------------+--------------------+
| testFile100Mb | 104857600  | 3.655261869  | 3.438676067  | 3.557464787  | 3.550467574336 | 0.0885591069882058 |
| testFile1Gb   | 1073741824 | 34.663612417 | 37.32089122  | 37.997718857 | 36.66074083135 | 1.4389615098060426 |
|
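Once rtmax and wtmax become configurable, the 1MB setting from the benchmark above would be applied in hdfs-site.xml. The property names below are an assumption for illustration (the committed patch defines the actual keys); values are in bytes.

```xml
<!-- hdfs-site.xml: hypothetical property names; the committed patch
     defines the actual keys. Values are in bytes. -->
<property>
  <name>dfs.nfs.rtmax</name>
  <!-- Maximum NFS read transfer size: 1 MB instead of the hardcoded 64k. -->
  <value>1048576</value>
</property>
<property>
  <name>dfs.nfs.wtmax</name>
  <!-- Maximum NFS write transfer size. -->
  <value>1048576</value>
</property>
```

NFS clients negotiate their transfer size against these server-side maxima at mount time, so raising them only helps if the client's rsize/wsize mount options allow it too.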
[jira] [Updated] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shinichi Yamashita updated HDFS-5196:
-------------------------------------
    Attachment: HDFS-5196-2.patch

I attach a patch file that uses only the new web UI.

Provide more snapshot information in WebUI
------------------------------------------
Key: HDFS-5196
URL: https://issues.apache.org/jira/browse/HDFS-5196
Project: Hadoop HDFS
Issue Type: Improvement
Components: snapshots
Affects Versions: 3.0.0
Reporter: Haohui Mai
Assignee: Shinichi Yamashita
Priority: Minor
Attachments: HDFS-5196-2.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png

The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and the corresponding number of snapshots (suggested in HDFS-4096).
[jira] [Commented] (HDFS-6086) Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
[ https://issues.apache.org/jira/browse/HDFS-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930106#comment-13930106 ]

Hadoop QA commented on HDFS-6086:
---------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633862/HDFS-6086.002.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6370//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6370//console

This message is automatically generated.
[jira] [Commented] (HDFS-5638) HDFS implementation of FileContext API for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930186#comment-13930186 ]

Vinayakumar B commented on HDFS-5638:
-------------------------------------
Thanks, Chris, for the review and the splitting.

HDFS implementation of FileContext API for ACLs.
------------------------------------------------
Key: HDFS-5638
URL: https://issues.apache.org/jira/browse/HDFS-5638
Project: Hadoop HDFS
Issue Type: Sub-task
Components: hdfs-client
Affects Versions: HDFS ACLs (HDFS-4685)
Reporter: Chris Nauroth
Assignee: Vinayakumar B
Attachments: HDFS-5638.2.patch, HDFS-5638.patch, HDFS-5638.patch, HDFS-5638.patch

Add new methods to {{AbstractFileSystem}} and {{FileContext}} for manipulating ACLs.
[jira] [Commented] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930194#comment-13930194 ]

Hadoop QA commented on HDFS-5196:
---------------------------------
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12633870/HDFS-5196-2.patch
against trunk revision.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6371//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6371//console

This message is automatically generated.
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930225#comment-13930225 ]

Hudson commented on HDFS-5535:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
Move HDFS-5535 to Release 2.4.0 in CHANGES.txt. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576148)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Umbrella jira for improved HDFS rolling upgrades
------------------------------------------------
Key: HDFS-5535
URL: https://issues.apache.org/jira/browse/HDFS-5535
Project: Hadoop HDFS
Issue Type: New Feature
Components: datanode, ha, hdfs-client, namenode
Affects Versions: 3.0.0, 2.2.0
Reporter: Nathan Roberts
Assignee: Tsz Wo Nicholas Sze
Fix For: 2.4.0
Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, HDFSRollingUpgradesHighLevelDesign.v2.pdf, HDFSRollingUpgradesHighLevelDesign.v3.pdf, h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch, h5535_20140221-2031.patch, h5535_20140224-1931.patch, h5535_20140225-1225.patch, h5535_20140226-1328.patch, h5535_20140226-1911.patch, h5535_20140227-1239.patch, h5535_20140228-1714.patch, h5535_20140304-1138.patch, h5535_20140304-branch-2.patch, h5535_20140310-branch-2.patch, hdfs-5535-test-plan.pdf

In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial high-level design document will be attached to this jira, and sub-jiras will itemize the individual tasks.
[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
[ https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930227#comment-13930227 ]

Hudson commented on HDFS-3405:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
Move HDFS-3405 to 2.4.0 section in CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576158)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
------------------------------------------------------------------------------------
Key: HDFS-3405
URL: https://issues.apache.org/jira/browse/HDFS-3405
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
Reporter: Aaron T. Myers
Assignee: Vinayakumar B
Fix For: 2.4.0
Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch

As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is to issue an HTTP GET request telling the target NN to issue another GET request back to the checkpointing daemon to retrieve the merged fsimage file. There's no fundamental reason the checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file, rather than the double-GET scheme.
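The proposed replacement for the double-GET handshake can be sketched as a single HTTP PUT that carries the merged fsimage in the request body. This is a hedged illustration using the JDK's HttpURLConnection; the servlet URL, helper names, and error handling are assumptions, not the committed Hadoop implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: one PUT replaces "GET to trigger a GET back".
public class FsImagePut {
    static void putImage(URL imageServlet, byte[] fsimage) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) imageServlet.openConnection();
        conn.setRequestMethod("PUT");                      // data travels with the request
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(fsimage.length);  // avoid buffering the whole image
        try (OutputStream out = conn.getOutputStream()) {
            out.write(fsimage);
        }
        if (!uploadSucceeded(conn.getResponseCode())) {
            throw new IOException("fsimage upload failed: HTTP " + conn.getResponseCode());
        }
    }

    // Small pure helper so the upload outcome is easy to check in isolation.
    static boolean uploadSucceeded(int httpStatus) {
        return httpStatus == HttpURLConnection.HTTP_OK;
    }
}
```

Streaming with setFixedLengthStreamingMode keeps the whole image out of memory, which matters because merged fsimages can be gigabytes on large namespaces.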
[jira] [Commented] (HDFS-6070) Cleanup use of ReadStatistics in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930231#comment-13930231 ]

Hudson commented on HDFS-6070:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
HDFS-6070. Cleanup use of ReadStatistics in DFSInputStream. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576047)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java

Cleanup use of ReadStatistics in DFSInputStream
-----------------------------------------------
Key: HDFS-6070
URL: https://issues.apache.org/jira/browse/HDFS-6070
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Trivial
Fix For: 2.4.0
Attachments: hdfs-6070.patch

Trivial code cleanup related to DFSInputStream#ReadStatistics: use the update methods rather than reaching into the fields directly.
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930237#comment-13930237 ]

Hudson commented on HDFS-5892:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
HDFS-5892. TestDeleteBlockPool fails in branch-2. Contributed by Ted Yu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576035)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSNNTopology.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java

TestDeleteBlockPool fails in branch-2
-------------------------------------
Key: HDFS-5892
URL: https://issues.apache.org/jira/browse/HDFS-5892
Project: Hadoop HDFS
Issue Type: Test
Reporter: Ted Yu
Priority: Minor
Fix For: 2.4.0
Attachments: HDFS-5892.patch, org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt

Running the test suite on Linux, I got:
{code}
testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool)  Time elapsed: 8.143 sec  ERROR!
java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting...
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
{code}
[jira] [Commented] (HDFS-6077) running slive with webhdfs on secure HA cluster fails with unkown host exception
[ https://issues.apache.org/jira/browse/HDFS-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930235#comment-13930235 ]

Hudson commented on HDFS-6077:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
HDFS-6077. running slive with webhdfs on secure HA cluster fails with unkown host exception. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576076)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java

running slive with webhdfs on secure HA cluster fails with unkown host exception
--------------------------------------------------------------------------------
Key: HDFS-6077
URL: https://issues.apache.org/jira/browse/HDFS-6077
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.3.0
Reporter: Arpit Gupta
Assignee: Jing Zhao
Fix For: 2.4.0
Attachments: HDFS-6077.000.patch

SliveTest fails with the following. See the comment for more logs.
{noformat}
SliveTest: Unable to run job due to error:
java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-2-secure
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
	at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:258)
	at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:299)
	...
{noformat}
[jira] [Commented] (HDFS-6085) Improve CacheReplicationMonitor log messages a bit
[ https://issues.apache.org/jira/browse/HDFS-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930236#comment-13930236 ]

Hudson commented on HDFS-6085:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
HDFS-6085. Improve CacheReplicationMonitor log messages a bit (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576194)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java
[jira] [Commented] (HDFS-6055) Change default configuration to limit file name length in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930233#comment-13930233 ]

Hudson commented on HDFS-6055:
------------------------------
SUCCESS: Integrated in Hadoop-Yarn-trunk #506 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/506/])
HDFS-6055. Change default configuration to limit file name length in HDFS. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576095)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestSymlinkHdfs.java

Change default configuration to limit file name length in HDFS
--------------------------------------------------------------
Key: HDFS-6055
URL: https://issues.apache.org/jira/browse/HDFS-6055
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 3.0.0, 2.4.0
Reporter: Suresh Srinivas
Assignee: Chris Nauroth
Fix For: 3.0.0, 2.4.0
Attachments: HDFS-6055.1.patch, HDFS-6055.2.patch

Currently the configuration dfs.namenode.fs-limits.max-component-length is set to 0, so HDFS file names have no length limit. However, we see more users run into issues where they copy files from HDFS to another file system and the copy fails because the file name is too long. I propose changing the default dfs.namenode.fs-limits.max-component-length to a reasonable value. This will be an incompatible change; however, users who need long file names can override this configuration to turn off the limit. What do folks think?
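The knob under discussion already exists in hdfs-default.xml; the patch only changes its default. An operator who needs the old unlimited behavior back would override it in hdfs-site.xml. The value 255 below is shown as a plausible limit matching common local-filesystem constraints, not a confirmed choice from the patch; 0 disables the check entirely.

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.namenode.fs-limits.max-component-length</name>
  <!-- Maximum length of each path component in an HDFS path.
       0 means no limit (the old default); 255 is illustrative. -->
  <value>255</value>
</property>
```

Note the limit applies per path component (each directory or file name segment), not to the full path length.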
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930348#comment-13930348 ]

Hudson commented on HDFS-5535:
------------------------------
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/])
Move HDFS-5535 to Release 2.4.0 in CHANGES.txt. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576148)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-6070) Cleanup use of ReadStatistics in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930355#comment-13930355 ]

Hudson commented on HDFS-6070:
------------------------------
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/])
HDFS-6070. Cleanup use of ReadStatistics in DFSInputStream. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576047)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
[jira] [Commented] (HDFS-6085) Improve CacheReplicationMonitor log messages a bit
[ https://issues.apache.org/jira/browse/HDFS-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930360#comment-13930360 ]

Hudson commented on HDFS-6085:
------------------------------
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/])
HDFS-6085. Improve CacheReplicationMonitor log messages a bit (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576194)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java
[jira] [Commented] (HDFS-6077) running slive with webhdfs on secure HA cluster fails with unkown host exception
[ https://issues.apache.org/jira/browse/HDFS-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930359#comment-13930359 ]

Hudson commented on HDFS-6077:
------------------------------
SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/])
HDFS-6077. running slive with webhdfs on secure HA cluster fails with unkown host exception. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576076)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
[jira] [Commented] (HDFS-6055) Change default configuration to limit file name length in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930357#comment-13930357 ] Hudson commented on HDFS-6055: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/]) HDFS-6055. Change default configuration to limit file name length in HDFS. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576095) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestSymlinkHdfs.java Change default configuration to limit file name length in HDFS -- Key: HDFS-6055 URL: https://issues.apache.org/jira/browse/HDFS-6055 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Suresh Srinivas Assignee: Chris Nauroth Fix For: 3.0.0, 2.4.0 Attachments: HDFS-6055.1.patch, HDFS-6055.2.patch Currently the configuration dfs.namenode.fs-limits.max-component-length is set to 0. With this, HDFS file names have no length limit. However, we see more users run into issues where they copy files from HDFS to another file system and the copy fails because the file name is too long. I propose changing the default value of dfs.namenode.fs-limits.max-component-length to something reasonable. This will be an incompatible change; however, users who need long file names can override this configuration to turn off the length limit. What do folks think? -- This message was sent by Atlassian JIRA (v6.2#6252)
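The change above caps the length of each path component via dfs.namenode.fs-limits.max-component-length, where 0 means unlimited. A minimal sketch of that kind of per-component check follows; the class and method names are invented for illustration, and this is not the actual NameNode enforcement code.

```java
// Simplified illustration of a per-component path length limit, in the spirit
// of dfs.namenode.fs-limits.max-component-length. Hypothetical names; not the
// actual Hadoop NameNode code.
public class ComponentLengthCheck {

    // Returns null when every path component is within the limit,
    // otherwise returns the first offending component.
    public static String firstTooLongComponent(String path, int maxComponentLength) {
        if (maxComponentLength <= 0) {
            return null; // 0 (the old default) means "no limit"
        }
        for (String component : path.split("/")) {
            if (component.length() > maxComponentLength) {
                return component;
            }
        }
        return null;
    }
}
```

With a nonzero limit, a create or rename of a path containing an over-long component would be rejected up front, instead of failing later when the file is copied to a stricter file system.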
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930361#comment-13930361 ] Hudson commented on HDFS-5892: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1698 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1698/]) HDFS-5892. TestDeleteBlockPool fails in branch-2. Contributed by Ted Yu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576035) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSNNTopology.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java TestDeleteBlockPool fails in branch-2 - Key: HDFS-5892 URL: https://issues.apache.org/jira/browse/HDFS-5892 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor Fix For: 2.4.0 Attachments: HDFS-5892.patch, org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt Running test suite on Linux, I got: {code} testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) Time elapsed: 8.143 sec ERROR! java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6087) Unify HDFS write/append/truncate
Guo Ruijing created HDFS-6087: - Summary: Unify HDFS write/append/truncate Key: HDFS-6087 URL: https://issues.apache.org/jira/browse/HDFS-6087 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Guo Ruijing Attachments: HDFS Design Proposal.pdf In the existing implementation, an HDFS file can be appended and an HDFS block can be reopened for append. This design introduces complexity, including lease recovery. If we design the HDFS block as immutable, append/truncate becomes very simple. The idea is that an HDFS block is immutable once it is committed to the namenode. If the block is not yet committed to the namenode, it is the HDFS client’s responsibility to re-add it with a new block ID. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6087) Unify HDFS write/append/truncate
[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guo Ruijing updated HDFS-6087: -- Attachment: HDFS Design Proposal.pdf Unify HDFS write/append/truncate Key: HDFS-6087 URL: https://issues.apache.org/jira/browse/HDFS-6087 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Guo Ruijing Attachments: HDFS Design Proposal.pdf In the existing implementation, an HDFS file can be appended and an HDFS block can be reopened for append. This design introduces complexity, including lease recovery. If we design the HDFS block as immutable, append/truncate becomes very simple. The idea is that an HDFS block is immutable once it is committed to the namenode. If the block is not yet committed to the namenode, it is the HDFS client’s responsibility to re-add it with a new block ID. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930419#comment-13930419 ] Hudson commented on HDFS-5535: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) Move HDFS-5535 to Release 2.4.0 in CHANGES.txt. (szetszwo: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576148) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Assignee: Tsz Wo Nicholas Sze Fix For: 2.4.0 Attachments: HDFSRollingUpgradesHighLevelDesign.pdf, HDFSRollingUpgradesHighLevelDesign.v2.pdf, HDFSRollingUpgradesHighLevelDesign.v3.pdf, h5535_20140219.patch, h5535_20140220-1554.patch, h5535_20140220b.patch, h5535_20140221-2031.patch, h5535_20140224-1931.patch, h5535_20140225-1225.patch, h5535_20140226-1328.patch, h5535_20140226-1911.patch, h5535_20140227-1239.patch, h5535_20140228-1714.patch, h5535_20140304-1138.patch, h5535_20140304-branch-2.patch, h5535_20140310-branch-2.patch, hdfs-5535-test-plan.pdf In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial High level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
[ https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930421#comment-13930421 ] Hudson commented on HDFS-3405: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) Move HDFS-3405 to 2.4.0 section in CHANGES.txt (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576158) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages Key: HDFS-3405 URL: https://issues.apache.org/jira/browse/HDFS-3405 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha Reporter: Aaron T. Myers Assignee: Vinayakumar B Fix For: 2.4.0 Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is to issue an HTTP get request to tell the target NN to issue another GET request back to the checkpointing daemon to retrieve the merged fsimage file. There's no fundamental reason the checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file, rather than the double-GET scheme. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6055) Change default configuration to limit file name length in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930427#comment-13930427 ] Hudson commented on HDFS-6055: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) HDFS-6055. Change default configuration to limit file name length in HDFS. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576095) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestSymlinkHdfs.java Change default configuration to limit file name length in HDFS -- Key: HDFS-6055 URL: https://issues.apache.org/jira/browse/HDFS-6055 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Suresh Srinivas Assignee: Chris Nauroth Fix For: 3.0.0, 2.4.0 Attachments: HDFS-6055.1.patch, HDFS-6055.2.patch Currently the configuration dfs.namenode.fs-limits.max-component-length is set to 0. With this, HDFS file names have no length limit. However, we see more users run into issues where they copy files from HDFS to another file system and the copy fails because the file name is too long. I propose changing the default value of dfs.namenode.fs-limits.max-component-length to something reasonable. This will be an incompatible change; however, users who need long file names can override this configuration to turn off the length limit. What do folks think? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6077) running slive with webhdfs on secure HA cluster fails with unkown host exception
[ https://issues.apache.org/jira/browse/HDFS-6077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930429#comment-13930429 ] Hudson commented on HDFS-6077: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) HDFS-6077. running slive with webhdfs on secure HA cluster fails with unkown host exception. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576076) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java running slive with webhdfs on secure HA cluster fails with unkown host exception Key: HDFS-6077 URL: https://issues.apache.org/jira/browse/HDFS-6077 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Arpit Gupta Assignee: Jing Zhao Fix For: 2.4.0 Attachments: HDFS-6077.000.patch SliveTest fails with following. See the comment for more logs. {noformat} SliveTest: Unable to run job due to error: java.lang.IllegalArgumentException: java.net.UnknownHostException: ha-2-secure at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.security.SecurityUtil.buildDTServiceName(SecurityUtil.java:258) at org.apache.hadoop.fs.FileSystem.getCanonicalServiceName(FileSystem.java:299) ... {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5892) TestDeleteBlockPool fails in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930431#comment-13930431 ] Hudson commented on HDFS-5892: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) HDFS-5892. TestDeleteBlockPool fails in branch-2. Contributed by Ted Yu. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576035) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSNNTopology.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDeleteBlockPool.java TestDeleteBlockPool fails in branch-2 - Key: HDFS-5892 URL: https://issues.apache.org/jira/browse/HDFS-5892 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor Fix For: 2.4.0 Attachments: HDFS-5892.patch, org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool-output.txt Running test suite on Linux, I got: {code} testDeleteBlockPool(org.apache.hadoop.hdfs.server.datanode.TestDeleteBlockPool) Time elapsed: 8.143 sec ERROR! java.io.IOException: All datanodes 127.0.0.1:43721 are bad. Aborting... at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6085) Improve CacheReplicationMonitor log messages a bit
[ https://issues.apache.org/jira/browse/HDFS-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930430#comment-13930430 ] Hudson commented on HDFS-6085: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) HDFS-6085. Improve CacheReplicationMonitor log messages a bit (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576194) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java Improve CacheReplicationMonitor log messages a bit -- Key: HDFS-6085 URL: https://issues.apache.org/jira/browse/HDFS-6085 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-6085.001.patch It would be nice if the CacheReplicationMonitor logs would print out information about blocks when at TRACE level. We also could be a bit more organized about logs and include the directive ID in the log, so that it was clear what each log message referred to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6070) Cleanup use of ReadStatistics in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930425#comment-13930425 ] Hudson commented on HDFS-6070: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1723 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1723/]) HDFS-6070. Cleanup use of ReadStatistics in DFSInputStream. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576047) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java Cleanup use of ReadStatistics in DFSInputStream --- Key: HDFS-6070 URL: https://issues.apache.org/jira/browse/HDFS-6070 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Trivial Fix For: 2.4.0 Attachments: hdfs-6070.patch Trivial little code cleanup related to DFSInputStream#ReadStatistics to use update methods rather than reaching in directly. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5638) HDFS implementation of FileContext API for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930535#comment-13930535 ] Hudson commented on HDFS-5638: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5305 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5305/]) HDFS-5638. HDFS implementation of FileContext API for ACLs. Contributed by Vinayakumar B. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576405) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/fs/Hdfs.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileContextAcl.java HDFS implementation of FileContext API for ACLs. Key: HDFS-5638 URL: https://issues.apache.org/jira/browse/HDFS-5638 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Vinayakumar B Attachments: HDFS-5638.2.patch, HDFS-5638.patch, HDFS-5638.patch, HDFS-5638.patch Add new methods to {{AbstractFileSystem}} and {{FileContext}} for manipulating ACLs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5638) HDFS implementation of FileContext API for ACLs.
[ https://issues.apache.org/jira/browse/HDFS-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5638: Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) I committed this to trunk, branch-2 and branch-2.4. Thanks again for the contributions, Vinay! HDFS implementation of FileContext API for ACLs. Key: HDFS-5638 URL: https://issues.apache.org/jira/browse/HDFS-5638 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Vinayakumar B Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5638.2.patch, HDFS-5638.patch, HDFS-5638.patch, HDFS-5638.patch Add new methods to {{AbstractFileSystem}} and {{FileContext}} for manipulating ACLs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6088) Add configurable maximum block count for datanode
Kihwal Lee created HDFS-6088: Summary: Add configurable maximum block count for datanode Key: HDFS-6088 URL: https://issues.apache.org/jira/browse/HDFS-6088 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Currently datanode resources are protected by the free space check and the balancer. But datanodes can run out of memory simply storing too many blocks. If the sizes of blocks are small, datanodes will appear to have plenty of space to put more blocks. I propose adding a configurable max block count to datanode. Since datanodes can have different heap configurations, it will make sense to make it datanode-level, rather than something enforced by namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6088) Add configurable maximum block count for datanode
[ https://issues.apache.org/jira/browse/HDFS-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930555#comment-13930555 ] Kihwal Lee commented on HDFS-6088: -- It will be nice though if NN knows what's going on, so that block placement policy can avoid picking the full nodes. DN could include its free block count in heartbeat. Add configurable maximum block count for datanode - Key: HDFS-6088 URL: https://issues.apache.org/jira/browse/HDFS-6088 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Currently datanode resources are protected by the free space check and the balancer. But datanodes can run out of memory simply storing too many blocks. If the sizes of blocks are small, datanodes will appear to have plenty of space to put more blocks. I propose adding a configurable max block count to datanode. Since datanodes can have different heap configurations, it will make sense to make it datanode-level, rather than something enforced by namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
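As a rough illustration of the proposal and the heartbeat idea in the comment above, a datanode-level guard might look like the sketch below. The class name, limit semantics, and the "free block count" method are assumptions made for illustration; the JIRA does not specify an implementation or a configuration key.

```java
// Hypothetical sketch of a datanode-side cap on total block count, per the
// HDFS-6088 proposal. Invented names; not actual Hadoop code.
import java.util.HashSet;
import java.util.Set;

public class BlockCountGuard {
    private final int maxBlocks;                       // datanode-level limit
    private final Set<Long> blockIds = new HashSet<>(); // blocks currently stored

    public BlockCountGuard(int maxBlocks) {
        this.maxBlocks = maxBlocks;
    }

    // Reject new blocks once the datanode-level limit is reached, so a node
    // full of tiny blocks stops accepting writes even with free disk space.
    public boolean tryAddBlock(long blockId) {
        if (blockIds.size() >= maxBlocks) {
            return false;
        }
        return blockIds.add(blockId);
    }

    // The free block count a DN could report in its heartbeat, letting the
    // NN's block placement policy avoid full nodes (as suggested above).
    public int freeBlockCount() {
        return maxBlocks - blockIds.size();
    }
}
```

Keeping the limit datanode-local matches the rationale in the description: heap sizes differ per datanode, so a single namenode-enforced number would fit some nodes and not others.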
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930614#comment-13930614 ] Yongjun Zhang commented on HDFS-5944: - Thanks [~zhaoyunjiong] for reporting the issue and the fix. Hi [~brandonli], thanks for reviewing and committing the fix. It's said to be fixed in 2.4.0 but I don't see it in branch-2.4. Would you please check? Thanks. LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 1.3.0, 2.4.0 Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. Client B deleted /XXX/20140206/04_30/ Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log Then secondaryNameNode try to do checkpoint and failed due to failed to delete lease hold by Client A when Client B deleted /XXX/20140206/04_30/. 
The reason is a bug in findLeaseWithPrefixPath:
{code}
int srclen = prefix.length();
if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) {
  entries.put(entry.getKey(), entry.getValue());
}
{code}
Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_'. The fix is simple; I'll upload a patch later. -- This message was sent by Atlassian JIRA (v6.2#6252)
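The quoted condition can be reproduced outside Hadoop. The sketch below is a simplified stand-in for LeaseManager#findLeaseWithPrefixPath (not the actual source); it shows why a prefix ending in '/' never matches its own children, and one possible fix of stripping the trailing separator first. The committed patch may well differ in detail.

```java
// Self-contained demonstration of the prefix-matching bug described in
// HDFS-5944. Simplified stand-in; not the actual LeaseManager code.
public class PrefixMatchDemo {
    static final char SEPARATOR_CHAR = '/';

    // The buggy check: when prefix already ends with '/', p.charAt(srclen)
    // is the first character of the child name ('_' in the report), so the
    // lease under the deleted directory is never found.
    public static boolean buggyMatches(String prefix, String p) {
        if (!p.startsWith(prefix)) {
            return false;
        }
        int srclen = prefix.length();
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR_CHAR;
    }

    // One possible fix: normalize away a trailing separator before the check,
    // so "/a/b/" matches "/a/b/file" exactly like "/a/b" does.
    public static boolean fixedMatches(String prefix, String p) {
        if (prefix.length() > 1 && prefix.charAt(prefix.length() - 1) == SEPARATOR_CHAR) {
            prefix = prefix.substring(0, prefix.length() - 1);
        }
        return buggyMatches(prefix, p);
    }
}
```

With the path from the report, the buggy check returns false (the lease survives the delete, and the next checkpoint hits the "but is not under construction" error), while the normalized check returns true.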
[jira] [Updated] (HDFS-6007) Update documentation about short-circuit local reads
[ https://issues.apache.org/jira/browse/HDFS-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-6007: --- Attachment: HDFS-6007-3.patch Added a description of shared memory segments. Update documentation about short-circuit local reads Key: HDFS-6007 URL: https://issues.apache.org/jira/browse/HDFS-6007 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Attachments: HDFS-6007-0.patch, HDFS-6007-1.patch, HDFS-6007-2.patch, HDFS-6007-3.patch Updating the contents of HDFS Short-Circuit Local Reads based on the changes in HDFS-4538 and HDFS-4953. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6035) TestCacheDirectives#testCacheManagerRestart is failing on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930720#comment-13930720 ] Mit Desai commented on HDFS-6035: - [~sathish.gurram], Can you let me know which branch you are testing this on? TestCacheDirectives#testCacheManagerRestart is failing on branch-2 -- Key: HDFS-6035 URL: https://issues.apache.org/jira/browse/HDFS-6035 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.4.0 Reporter: Mit Desai Assignee: sathish Attachments: HDFS-6035-0001.patch {noformat} java.io.IOException: Inconsistent checkpoint fields. LV = -51 namespaceID = 1641397469 cTime = 0 ; clusterId = testClusterID ; blockpoolId = BP-423574854-x.x.x.x-1393478669835. Expecting respectively: -51; 2; 0; testClusterID; BP-2051361571-x.x.x.x-1393478572877. at org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:133) at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:526) at org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives.testCacheManagerRestart(TestCacheDirectives.java:582) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-5274: --- Attachment: 3node_put_200mb.png Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: 3node_get_200mb.png, 3node_put_200mb.png, 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, HDFS-5274-8.patch, HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5274) Add Tracing to HDFS
[ https://issues.apache.org/jira/browse/HDFS-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930756#comment-13930756 ] Hadoop QA commented on HDFS-5274: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633965/3node_put_200mb.png against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6373//console This message is automatically generated. Add Tracing to HDFS --- Key: HDFS-5274 URL: https://issues.apache.org/jira/browse/HDFS-5274 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Affects Versions: 2.1.1-beta Reporter: Elliott Clark Assignee: Elliott Clark Attachments: 3node_get_200mb.png, 3node_put_200mb.png, 3node_put_200mb.png, HDFS-5274-0.patch, HDFS-5274-1.patch, HDFS-5274-10.patch, HDFS-5274-11.txt, HDFS-5274-12.patch, HDFS-5274-13.patch, HDFS-5274-2.patch, HDFS-5274-3.patch, HDFS-5274-4.patch, HDFS-5274-5.patch, HDFS-5274-6.patch, HDFS-5274-7.patch, HDFS-5274-8.patch, HDFS-5274-8.patch, HDFS-5274-9.patch, Zipkin Trace a06e941b0172ec73.png, Zipkin Trace d0f0d66b8a258a69.png, ss-5274v8-get.png, ss-5274v8-put.png Since Google's Dapper paper has shown the benefits of tracing for a large distributed system, it seems like a good time to add tracing to HDFS. HBase has added tracing using HTrace. I propose that the same can be done within HDFS. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-4461: - Attachment: HDFS-4461.branch-0.23.patch I thought we could wait until 2.x, but some 0.23 users are creating a lot of small files (i.e. small blocks) and DNs are running out of memory when DirectoryScanner runs. The peak heap usage can be almost 2x or even 3x of the base usage, if one dir scan garbage survives until the next scan. The patch is a straight back-port of the trunk version. The difference comes from the fact that a source file got split into multiple files in branch-2/trunk. Other than that, the core change is exactly the same. DirectoryScanner: volume path prefix takes up memory for every block that is scanned - Key: HDFS-4461 URL: https://issues.apache.org/jira/browse/HDFS-4461 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.1.0-beta Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, HDFS-4461.004.patch, HDFS-4461.branch-0.23.patch, HDFS-4661.006.patch, memory-analysis.png In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. This object contains two File objects-- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects-- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-4461: - Fix Version/s: 0.23.11 DirectoryScanner: volume path prefix takes up memory for every block that is scanned - Key: HDFS-4461 URL: https://issues.apache.org/jira/browse/HDFS-4461 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.1.0-beta, 0.23.11 Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, HDFS-4461.004.patch, HDFS-4461.branch-0.23.patch, HDFS-4661.006.patch, memory-analysis.png In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. This object contains two File objects-- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects-- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930815#comment-13930815 ] Colin Patrick McCabe commented on HDFS-4461: +1 for the backport. Note that I have not tested it, just reviewed it. DirectoryScanner: volume path prefix takes up memory for every block that is scanned - Key: HDFS-4461 URL: https://issues.apache.org/jira/browse/HDFS-4461 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.1.0-beta, 0.23.11 Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, HDFS-4461.004.patch, HDFS-4461.branch-0.23.patch, HDFS-4661.006.patch, memory-analysis.png In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. This object contains two File objects-- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects-- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement. -- This message was sent by Atlassian JIRA (v6.2#6252)
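The idea described in the JIRA, storing a short per-block suffix string and materializing File objects on demand instead of keeping two full-path File objects per scanned block, can be sketched as follows. This is a simplified illustration, not the actual DirectoryScanner/ScanInfo code.

```java
// Sketch of the HDFS-4461 memory-saving idea: share the volume-root prefix
// once per volume and keep only the short relative suffix per block.
// Simplified; not the actual Hadoop DirectoryScanner code.
import java.io.File;

public class ScanInfoSketch {
    private final String volumeRoot;  // shared prefix, stored once per volume
    private final String blockSuffix; // per-block path relative to the volume

    public ScanInfoSketch(String volumeRoot, String blockSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
    }

    // Materialize the File lazily; between scans only the short suffix lives
    // on the heap, instead of a full-path File per block.
    public File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }
}
```

With N_blocks entries, the heap cost drops from roughly N_blocks * (path_prefix + suffix) characters to one shared prefix plus N_blocks * suffix, which is exactly the saving the description estimates.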
[jira] [Commented] (HDFS-6007) Update documentation about short-circuit local reads
[ https://issues.apache.org/jira/browse/HDFS-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930877#comment-13930877 ] Hadoop QA commented on HDFS-6007: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633949/HDFS-6007-3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6372//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6372//console This message is automatically generated. Update documentation about short-circuit local reads Key: HDFS-6007 URL: https://issues.apache.org/jira/browse/HDFS-6007 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Attachments: HDFS-6007-0.patch, HDFS-6007-1.patch, HDFS-6007-2.patch, HDFS-6007-3.patch updating the contents of HDFS SHort-Circuit Local Reads based on the changes in HDFS-4538 and HDFS-4953. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930951#comment-13930951 ] Brandon Li commented on HDFS-5944: -- I forgot to append the jira number when doing the back porting. Usually I don't forget. But, sorry for this one. Here are the log entries in branch-2 and 2.4. For branch-2: r1570372 | brandonli | 2014-02-20 14:32:49 -0800 (Thu, 20 Feb 2014) | 1 line Merging change r1570366 from trunk For branch-2.4: r1570377 | brandonli | 2014-02-20 14:40:00 -0800 (Thu, 20 Feb 2014) | 1 line Merging change r1570372 from branch-2 LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 1.3.0, 2.4.0 Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. 
Client B deleted /XXX/20140206/04_30/. Client C opened /XXX/20140206/04_30/_SUCCESS.slc.log for write, then closed it. The SecondaryNameNode then tried to do a checkpoint and failed, because the lease held by Client A was not deleted when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here, when prefix is /XXX/20140206/04_30/ and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srclen) is '_', so the lease is not matched and therefore never removed. The fix is simple; I'll upload a patch later. -- This message was sent by Atlassian JIRA (v6.2#6252)
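The broken comparison above can be reproduced outside HDFS. Below is a minimal, standalone sketch of a corrected prefix test (illustrative only, not the committed HDFS-5944 patch): the trailing '/' on the prefix has to be stripped before the boundary character is checked.

```java
public class PrefixMatch {
    static final char SEPARATOR = '/';

    /** Returns true if path p lies at or under the directory named by prefix. */
    public static boolean isUnder(String prefix, String p) {
        // Normalize: drop a trailing separator so "/a/b/" behaves like "/a/b".
        int srclen = prefix.length();
        if (srclen > 1 && prefix.charAt(srclen - 1) == SEPARATOR) {
            prefix = prefix.substring(0, srclen - 1);
            srclen -= 1;
        }
        if (!p.startsWith(prefix)) {
            return false;
        }
        // Either an exact match, or the next character is a separator.
        // The original code skipped the normalization step, so for
        // prefix "/a/b/" and p "/a/b/_SUCCESS" it inspected '_' and
        // wrongly concluded the path was not under the prefix.
        return p.length() == srclen || p.charAt(srclen) == SEPARATOR;
    }
}
```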
[jira] [Created] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
Arpit Gupta created HDFS-6089: - Summary: Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. What was noticed was that sometimes the call to get the service state of nn2 got a socket timeout exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930954#comment-13930954 ] Arpit Gupta commented on HDFS-6089: --- Here is the console log {code} sudo su - -c /usr/bin/hdfs haadmin -getServiceState nn1 hdfs active exit code = 0 sudo su - -c /usr/bin/hdfs haadmin -getServiceState nn2 hdfs standby exit code = 0 ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null hostname sudo su - -c \cat /grid/0/var/run/hadoop/hdfs/hadoop-hdfs-namenode.pid | xargs kill -19\ hdfs sudo su - -c /usr/bin/hdfs haadmin -getServiceState nn1 hdfs Operation failed: Call From host1/ip to host1:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 2 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=host1/ip:35192 remote=host1/ip:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout exit code = 255 sudo su - -c /usr/bin/hdfs haadmin -getServiceState nn2 hdfs Operation failed: Call From host2/ip to host2:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 2 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=host2/ip:37640 remote=host2/68.142.247.217:8020]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout exit code = 255 {code} Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. 
What was noticed was that sometimes the call to get the service state of nn2 got a socket timeout exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Gupta updated HDFS-6089: -- Description: The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. What was noticed that some times the call to get the service state of nn2 got a socket time out exception. was: The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. What was noticed that some times the call to get the service state of nn2 got a socket time out connection. Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. What was noticed that some times the call to get the service state of nn2 got a socket time out exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6090) Use MiniDFSCluster.Builder instead of deprecated constructors
Akira AJISAKA created HDFS-6090: --- Summary: Use MiniDFSCluster.Builder instead of deprecated constructors Key: HDFS-6090 URL: https://issues.apache.org/jira/browse/HDFS-6090 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Akira AJISAKA Priority: Minor Some test classes are using deprecated constructors such as {{MiniDFSCluster(Configuration, int, boolean, String[], String[])}} for building a MiniDFSCluster. These classes should use {{MiniDFSCluster.Builder}} to reduce javac warnings and improve code readability. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930966#comment-13930966 ] Kihwal Lee commented on HDFS-4461: -- Thanks, Colin. I've checked it into branch-0.23. DirectoryScanner: volume path prefix takes up memory for every block that is scanned - Key: HDFS-4461 URL: https://issues.apache.org/jira/browse/HDFS-4461 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.3-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Minor Fix For: 2.1.0-beta, 0.23.11 Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, HDFS-4461.004.patch, HDFS-4461.branch-0.23.patch, HDFS-4661.006.patch, memory-analysis.png In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block. This object contains two File objects-- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthly path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects-- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement. -- This message was sent by Atlassian JIRA (v6.2#6252)
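The description's suggestion — store strings and build File objects only when needed — can be sketched without any Hadoop dependencies. Class and field names here are illustrative, not those of the actual HDFS-4461 patch: the key idea is one shared volume-root string per volume plus short per-block suffixes, instead of two full-path File objects per block.

```java
import java.io.File;

public class ScanInfoSketch {

    // One shared volume-root string, per-block relative suffixes,
    // and File objects materialized only on demand.
    public static class ScanInfo {
        private final String volumeRoot;   // shared by every block on the volume
        private final String blockSuffix;  // e.g. "subdir0/blk_123"
        private final String metaSuffix;   // e.g. "subdir0/blk_123_1001.meta"

        public ScanInfo(String volumeRoot, String blockSuffix, String metaSuffix) {
            this.volumeRoot = volumeRoot;
            this.blockSuffix = blockSuffix;
            this.metaSuffix = metaSuffix;
        }

        // Callers that actually need a File pay the concatenation cost lazily.
        public File getBlockFile() { return new File(volumeRoot, blockSuffix); }
        public File getMetaFile()  { return new File(volumeRoot, metaSuffix); }
    }
}
```

With this layout the long volume prefix is stored once per volume rather than twice per scanned block, which is where the N_blocks * path_prefix savings comes from.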
[jira] [Commented] (HDFS-5944) LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint
[ https://issues.apache.org/jira/browse/HDFS-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930976#comment-13930976 ] Yongjun Zhang commented on HDFS-5944: - Hi Brandon, many thanks for the clarification. LeaseManager:findLeaseWithPrefixPath can't handle path like /a/b/ right and cause SecondaryNameNode failed do checkpoint Key: HDFS-5944 URL: https://issues.apache.org/jira/browse/HDFS-5944 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 1.3.0, 2.4.0 Attachments: HDFS-5944-branch-1.2.patch, HDFS-5944.patch, HDFS-5944.test.txt, HDFS-5944.trunk.patch In our cluster, we encountered error like this: java.io.IOException: saveLeases found path /XXX/20140206/04_30/_SUCCESS.slc.log but is not under construction. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:6217) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:607) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveCurrent(FSImage.java:1004) at org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:949) What happened: Client A open file /XXX/20140206/04_30/_SUCCESS.slc.log for write. And Client A continue refresh it's lease. Client B deleted /XXX/20140206/04_30/ Client C open file /XXX/20140206/04_30/_SUCCESS.slc.log for write Client C closed the file /XXX/20140206/04_30/_SUCCESS.slc.log Then secondaryNameNode try to do checkpoint and failed due to failed to delete lease hold by Client A when Client B deleted /XXX/20140206/04_30/. The reason is a bug in findLeaseWithPrefixPath: int srclen = prefix.length(); if (p.length() == srclen || p.charAt(srclen) == Path.SEPARATOR_CHAR) { entries.put(entry.getKey(), entry.getValue()); } Here when prefix is /XXX/20140206/04_30/, and p is /XXX/20140206/04_30/_SUCCESS.slc.log, p.charAt(srcllen) is '_'. 
The fix is simple; I'll upload a patch later. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6072) Clean up dead code of FSImage
[ https://issues.apache.org/jira/browse/HDFS-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13930973#comment-13930973 ] Jing Zhao commented on HDFS-6072: - +1 for the new patch. Thanks for the cleanup, Haohui! Clean up dead code of FSImage - Key: HDFS-6072 URL: https://issues.apache.org/jira/browse/HDFS-6072 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-6072.000.patch, HDFS-6072.001.patch, HDFS-6072.002.patch After HDFS-5698, HDFS stores the FSImage in protobuf format. The old code for saving the FSImage is now dead and should be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6091) dfs.journalnode.edits.dir should accept URI
Allen Wittenauer created HDFS-6091: -- Summary: dfs.journalnode.edits.dir should accept URI Key: HDFS-6091 URL: https://issues.apache.org/jira/browse/HDFS-6091 Project: Hadoop HDFS Issue Type: Bug Components: journal-node Affects Versions: 2.2.0 Reporter: Allen Wittenauer Priority: Minor Using a URI in dfs.journalnode.edits.dir (such as file:///foo) throws a Journal dir 'file:/foo' should be an absolute path'. -- This message was sent by Atlassian JIRA (v6.2#6252)
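Accepting both a bare path and a file: URI for a directory setting is a small normalization step. A hedged sketch of what that might look like — illustrative only, not the JournalNode's actual validation code:

```java
import java.io.File;
import java.net.URI;

public class EditsDirParser {
    /** Accept both "/foo" and "file:///foo" for a local journal
     *  directory setting, returning the local path in either case. */
    public static File parseEditsDir(String value) {
        URI uri = URI.create(value);
        if (uri.getScheme() == null) {
            return new File(value);          // plain path, e.g. "/foo"
        }
        if ("file".equals(uri.getScheme())) {
            return new File(uri.getPath());  // file:///foo -> /foo
        }
        throw new IllegalArgumentException(
            "Unsupported scheme for journal dir: " + value);
    }
}
```

The reported error ("Journal dir 'file:/foo' should be an absolute path") suggests the current code treats the raw string as a path; parsing it as a URI first, as above, would avoid that.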
[jira] [Updated] (HDFS-6072) Clean up dead code of FSImage
[ https://issues.apache.org/jira/browse/HDFS-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6072: - Resolution: Fixed Fix Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk, branch-2, and branch-2.4. Thanks [~ajisakaa] and [~jingzhao] for the review. Clean up dead code of FSImage - Key: HDFS-6072 URL: https://issues.apache.org/jira/browse/HDFS-6072 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-6072.000.patch, HDFS-6072.001.patch, HDFS-6072.002.patch After HDFS-5698 HDFS store FSImage in protobuf format. The old code of saving the FSImage is now dead, which should be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6084) Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
[ https://issues.apache.org/jira/browse/HDFS-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931026#comment-13931026 ] Haohui Mai commented on HDFS-6084: -- Some users have expressed concerns on potential security issues on the external links. They are concerned that when a user clicks on the external link, the referrer header in HTTP requests might leak sensitive information (e.g., the path of a directory). I guess that we can leave the text here but remove all external links. [~tthompso], do you have any suggestions? Namenode UI - Hadoop logo link shouldn't go to hadoop homepage Key: HDFS-6084 URL: https://issues.apache.org/jira/browse/HDFS-6084 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Attachments: HDFS-6084.1.patch.txt When clicking the Hadoop title the user is taken to the Hadoop homepage, which feels unintuitive. There's already a link at the bottom where it's always been, which is reasonable. I think that the title should go to the main Namenode page, #tab-overview. Suggestions? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6072) Clean up dead code of FSImage
[ https://issues.apache.org/jira/browse/HDFS-6072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931031#comment-13931031 ] Hudson commented on HDFS-6072: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5306 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5306/]) HDFS-6072. Clean up dead code of FSImage. Contributed by Haohui Mai. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1576513) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/security/token/delegation/DelegationTokenSecretManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageSerialization.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/DirectoryWithSnapshotFeature.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiff.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/Snapshot.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotFSImageFormat.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java Clean up dead code of FSImage - Key: HDFS-6072 URL: https://issues.apache.org/jira/browse/HDFS-6072 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-6072.000.patch, HDFS-6072.001.patch, HDFS-6072.002.patch After HDFS-5698 HDFS store FSImage in protobuf format. The old code of saving the FSImage is now dead, which should be removed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931033#comment-13931033 ] Brandon Li commented on HDFS-6080: -- [~ashahab], given your test results, and since HDFS files are usually MB-sized, I think we may want to keep 1MB as the default. The user can always change it to a smaller size when needed. What do you think? Improve NFS gateway performance by making rtmax and wtmax configurable -- Key: HDFS-6080 URL: https://issues.apache.org/jira/browse/HDFS-6080 Project: Hadoop HDFS Issue Type: Improvement Components: nfs, performance Reporter: Abin Shahab Assignee: Abin Shahab Attachments: HDFS-6080.patch, HDFS-6080.patch Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server, and therefore affect read and write performance. We ran performance tests with 1MB, 100MB, and 1GB files. We noticed a significant performance decline with increasing file size when compared to FUSE. We realized that the issue was the hardcoded rtmax size (64k). When we increased rtmax to 1MB, we got a 10x improvement in performance.
NFS reads:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 23.131158137 | 19.24552955 | 19.793332866 | 20.72334018435 | 1.7172094782219731 |
| testFile1Gb | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113 | 8.14037175506561 |
| testFile1Mb | 1048576 | 0.330546906 | 0.256391808 | 0.28730168 | 0.291413464667 | 0.030412987573361663 |
FUSE reads:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 2.394459443 | 2.695265191 | 2.50046517 | 2.530063267997 | 0.12457410127142007 |
| testFile1Gb | 1073741824 | 25.03324924 | 24.155102554 | 24.901525525 | 24.69662577297 | 0.386672412437576 |
| testFile1Mb | 1048576 | 0.271615094 | 0.270835986 | 0.271796438 | 0.271415839333 | 0.0004166483951065848 |
NFS reads after rtmax = 1MB:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 3.655261869 | 3.438676067 | 3.557464787 | 3.550467574336 | 0.0885591069882058 |
| testFile1Gb | 1073741824 | 34.663612417 | 37.32089122 | 37.997718857 | 36.66074083135 | 1.4389615098060426 |
| testFile1Mb | 1048576 | 0.115602858 | 0.106826253 | 0.125229976 | 0.1158863623334 | 0.007515962395481867 |
-- This message was sent by Atlassian JIRA (v6.2#6252)
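The change being discussed boils down to reading the transfer sizes from configuration instead of constants, with 1MB as the default Brandon proposes. A minimal sketch of that pattern — the property key and default name here are hypothetical; the real names were settled in the HDFS-6080 patch itself:

```java
import java.util.Properties;

public class NfsTransferSizes {
    // Illustrative key/default; the committed patch defines the actual names.
    public static final String RTMAX_KEY = "nfs.server.rtmax";
    public static final int RTMAX_DEFAULT = 1024 * 1024; // 1MB, per the benchmark above

    /** Falls back to the default when the property is unset. */
    public static int parseRtmax(String configured) {
        return configured == null ? RTMAX_DEFAULT : Integer.parseInt(configured);
    }

    public static int getRtmax(Properties conf) {
        return parseRtmax(conf.getProperty(RTMAX_KEY));
    }
}
```

A user who needs the old behavior would simply set the property back to 65536.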
[jira] [Commented] (HDFS-6086) Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
[ https://issues.apache.org/jira/browse/HDFS-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931036#comment-13931036 ] Andrew Wang commented on HDFS-6086: --- +1 looks good, thanks Colin Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached --- Key: HDFS-6086 URL: https://issues.apache.org/jira/browse/HDFS-6086 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6086.001.patch, HDFS-6086.002.patch We need to fix a case where zero-copy or no-checksum reads are not allowed even when the block is cached. The case is when the block is cached before the {{REQUEST_SHORT_CIRCUIT_FDS}} operation begins. In this case, {{DataXceiver}} needs to consult the {{ShortCircuitRegistry}} to see if the block is cached, rather than relying on a callback. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5196) Provide more snapshot information in WebUI
[ https://issues.apache.org/jira/browse/HDFS-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931042#comment-13931042 ] Haohui Mai commented on HDFS-5196: -- The patch mostly looks good. {code} diff --git hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeHttpServer.java hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeHttpServer.java index 43952be..cb0bf79 100644 --- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeHttpServer.java +++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeHttpServer.java @@ -243,6 +243,8 @@ private static void setupServlets(HttpServer2 httpServer, Configuration conf) { FileChecksumServlets.RedirectServlet.class, false); httpServer.addInternalServlet(contentSummary, /contentSummary/*, ContentSummaryServlet.class, false); +httpServer.addInternalServlet(snapshot, +SnapshotInfoServlet.PATH_SPEC, SnapshotInfoServlet.class, false); } {code} It might be more appropriate to put the information in jmx. For example, in {{FSNamesystemState}}. {code} + public Object getSnapshottableDirs() throws IOException { +//MapString, MapString,Object info = new HashMapString, MapString, Object(); +ListMap info = new ArrayListMap(); +SnapshottableDirectoryStatus[] stats = getSnapshottableDirListing(); +if (stats == null) { + return {}; ... {code} The code should return a {{MXBean}} instead of JSON string. Otherwise it requires hacks and workarounds in the client side to parse the JSON. Please see HDFS-6013 if you have more questions. {code} +alert(err); ... - })).error(ajax_error_handler); + }), + function (url, jqxhr, text, err) { +show_err_msg('pFailed to retrieve data from ' + url + ', cause: ' + err + '/p'); + }); {code} These changes look unrelated. Can you please remove them from this patch? 
Provide more snapshot information in WebUI -- Key: HDFS-5196 URL: https://issues.apache.org/jira/browse/HDFS-5196 Project: Hadoop HDFS Issue Type: Improvement Components: snapshots Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Shinichi Yamashita Priority: Minor Attachments: HDFS-5196-2.patch, HDFS-5196.patch, HDFS-5196.patch, HDFS-5196.patch, snapshot-new-webui.png, snapshottable-directoryList.png, snapshotteddir.png The WebUI should provide more detailed information about snapshots, such as all snapshottable directories and corresponding number of snapshots (suggested in HDFS-4096). -- This message was sent by Atlassian JIRA (v6.2#6252)
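The review's suggestion — return an MXBean instead of a hand-built JSON servlet — can be sketched with only the JDK. The bean and attribute names below are illustrative, not the ones HDFS uses (HDFS registers its beans under "Hadoop:service=NameNode,..."); the point is that JMX exposes typed, open-type attributes, so clients need no JSON parsing.

```java
import java.lang.management.ManagementFactory;
import javax.management.ObjectName;

public class SnapshotStatsJmx {

    // The *MXBean naming convention makes JMX publish these as open types.
    public interface SnapshotStatsMXBean {
        int getNumSnapshottableDirs();
        String[] getSnapshottableDirs();
    }

    public static class SnapshotStats implements SnapshotStatsMXBean {
        private final String[] dirs;
        public SnapshotStats(String[] dirs) { this.dirs = dirs.clone(); }
        @Override public int getNumSnapshottableDirs() { return dirs.length; }
        @Override public String[] getSnapshottableDirs() { return dirs.clone(); }
    }

    public static void main(String[] args) throws Exception {
        SnapshotStats stats = new SnapshotStats(new String[] { "/user/alice/dir1" });
        // Illustrative bean name for this sketch.
        ObjectName name = new ObjectName("Example:service=NameNode,name=SnapshotStats");
        ManagementFactory.getPlatformMBeanServer().registerMBean(stats, name);
        // JMX clients (jconsole, the /jmx servlet) now see the attributes directly.
    }
}
```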
[jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931045#comment-13931045 ] Jing Zhao commented on HDFS-6089: - Checked the log with Arpit. Looks like the issue is like this: 1. After NN1 got suspended, NN2 started the transition. It first tried to stop the editlog tailer thread. 2. The editlog tailer thread happened to trigger NN1 to roll its editlog right before the transition, and this rpc call got stuck since NN1 was suspended. 3. It took a relatively long time (1min) for the rollEditlog rpc call to receive the connection reset exception. 4. During this time, NN2 waited for the tailer thread to die, and the fsnamesystem lock was held by the stopStandbyService call. 5. haadmin's getServiceState request could not get response (since the lock was held by the transition thread in NN2) and timeout (its default socket timeout is 20s). In summary, it is possible that the rollEditlog rpc call from the standby NN to the active NN in the editlog tailer thread may delay the NN failover. Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. What was noticed that some times the call to get the service state of nn2 got a socket time out exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931048#comment-13931048 ] Jing Zhao commented on HDFS-6089: - Since in active NN we already have a NameNodeEditLogRoller thread triggering the editlog roll, I guess the standby NN doesn't need to trigger the active namenode to roll the editlog. Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao The following scenario was tested: * Determine Active NN and suspend the process (kill -19) * Wait about 60s to let the standby transition to active * Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active. What was noticed that some times the call to get the service state of nn2 got a socket time out exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
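Independent of whether the tailer's roll request is removed, the failure mode described above — a state transition blocked behind an RPC to a hung peer — is the general case of an unbounded wait. A generic, illustrative sketch (not HDFS code) of bounding such a call:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedCall {
    /** Runs a potentially hanging call (like the tailer's rollEditLog RPC)
     *  on a separate thread and bounds how long the caller waits. */
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(call);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // abandon the hung call so failover can proceed
                throw e;
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In the scenario above, a bounded wait would have let the transition give up on the suspended NN in seconds instead of holding the namesystem lock for the full TCP timeout.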
[jira] [Created] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
Ted Yu created HDFS-6092: Summary: DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu I discovered this when working on HBASE-10717 Here is sample code to reproduce the problem: {code} Path desPath = new Path(hdfs://127.0.0.1/); FileSystem desFs = desPath.getFileSystem(conf); String s = desFs.getCanonicalServiceName(); URI uri = desFs.getUri(); {code} Canonical name string contains the default port - 8020 But uri doesn't contain port. This would result in the following exception: {code} testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR! java.lang.IllegalArgumentException: port out of range:-1 at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143) at java.net.InetSocketAddress.init(InetSocketAddress.java:224) at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
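The immediate symptom is that {{URI#getPort()}} returns -1 when the URI omits the port, while the canonical service name carries the default 8020. Until the two are made consistent, a caller can normalize the port itself; a minimal illustrative sketch (8020 is the default NameNode RPC port):

```java
import java.net.URI;

public class ServicePort {
    public static final int DEFAULT_NN_PORT = 8020; // default NameNode RPC port

    /** When a filesystem URI omits the port (hdfs://127.0.0.1/), fall back
     *  to the default instead of passing -1 to InetSocketAddress. */
    public static int effectivePort(URI uri) {
        int port = uri.getPort();
        return port == -1 ? DEFAULT_NN_PORT : port;
    }
}
```

With this normalization the "port out of range:-1" IllegalArgumentException in the stack trace above cannot be triggered by a port-less URI.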
[jira] [Updated] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
[ https://issues.apache.org/jira/browse/HDFS-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-6092: - Description:
I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem:
{code}
Path desPath = new Path("hdfs://127.0.0.1/");
FileSystem desFs = desPath.getFileSystem(conf);
String s = desFs.getCanonicalServiceName();
URI uri = desFs.getUri();
{code}
The canonical service name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception:
{code}
testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR!
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88)
{code}
Thanks to Brandon Li who helped debug this.
was: I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem: {code} Path desPath = new Path("hdfs://127.0.0.1/"); FileSystem desFs = desPath.getFileSystem(conf); String s = desFs.getCanonicalServiceName(); URI uri = desFs.getUri(); {code} The canonical service name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception: {code} testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR! java.lang.IllegalArgumentException: port out of range:-1 at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143) at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224) at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88) {code}
DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port -- Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu
I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem: {code} Path desPath = new Path("hdfs://127.0.0.1/"); FileSystem desFs = desPath.getFileSystem(conf); String s = desFs.getCanonicalServiceName(); URI uri = desFs.getUri(); {code} The canonical service name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception: {code} testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR! java.lang.IllegalArgumentException: port out of range:-1 at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143) at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224) at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88) {code} Thanks to Brandon Li who helped debug this. -- This message was sent by Atlassian JIRA (v6.2#6252)
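The URI half of the mismatch can be reproduced without a cluster: {{java.net.URI}} reports -1 when the authority carries no explicit port, which is exactly what trips {{InetSocketAddress}}. A minimal sketch of the defensive pattern a caller can use (the {{effectivePort}} helper is hypothetical, not code from the patch):

```java
import java.net.URI;

// Sketch only: effectivePort() is an illustrative helper, not part of HDFS-6092.
public class DefaultPortDemo {
    // 8020 is the HDFS default NameNode RPC port mentioned in the report.
    static final int DEFAULT_NN_PORT = 8020;

    // Substitute the default port when the URI carries none (getPort() == -1).
    static int effectivePort(URI uri) {
        return uri.getPort() == -1 ? DEFAULT_NN_PORT : uri.getPort();
    }

    public static void main(String[] args) {
        URI noPort = URI.create("hdfs://127.0.0.1/");
        URI explicit = URI.create("hdfs://127.0.0.1:8020/");
        System.out.println(noPort.getPort());         // -1: this is what InetSocketAddress rejects
        System.out.println(effectivePort(noPort));    // 8020
        System.out.println(effectivePort(explicit));  // 8020
    }
}
```

This is the inconsistency in miniature: the canonical service name is built with the default filled in, while the raw URI still reports -1.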
[jira] [Updated] (HDFS-5732) Separate memory space between BM and NN
[ https://issues.apache.org/jira/browse/HDFS-5732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Bortnikov updated HDFS-5732: --- Attachment: Remote BM.pdf Updated design of the remote BM operation. Separate memory space between BM and NN --- Key: HDFS-5732 URL: https://issues.apache.org/jira/browse/HDFS-5732 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Amir Langer Attachments: 0002-Separation-of-BM-from-NN-Step-2-Separate-memory-spac.patch, Remote BM.pdf Change created APIs to not rely on the same instance being shared in both BM and NN. Use immutable objects / keep state in sync. BM and NN will still exist in the same VM work on a new BM service as an independent process is deferred to later tasks. Also, a one to one relation between BM and NN is assumed. This task should maintain backward compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5978) Create a tool to take fsimage and expose read-only WebHDFS API
[ https://issues.apache.org/jira/browse/HDFS-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931082#comment-13931082 ] Haohui Mai commented on HDFS-5978: --
{code}
echo "oiv    apply the offline fsimage viewer to an fsimage"
+ echo "wiv    run the web fsimage viewer to an fsimage"
{code}
It might be better to put this functionality as a subtool of the offline image viewer, since it is intended to be a successor of the lsr tool. It might be cleaner to separate the logic of loading the fsimage from the logic of handling webhdfs requests in {{FSImageHandler}}. You can wrap the information into a private static class:
{code}
+ private String[] stringTable;
+ private HashMap<Long, INode> inodes = Maps.newHashMap();
+ private HashMap<Long, long[]> dirmap = Maps.newHashMap();
+ private ArrayList<INodeReferenceSection.INodeReference> refList =
+     Lists.newArrayList();
{code}
{code}
+ public ChannelPipeline getPipeline() throws Exception {
+   ChannelPipeline pipeline = Channels.pipeline();
+   pipeline.addLast("httpDecoder", new HttpRequestDecoder());
+   pipeline.addLast("requestHandler", new FSImageHandler(inputFile));
+   pipeline.addLast("stringEncoder", new StringEncoder());
+   pipeline.addLast("httpEncoder", new HttpResponseEncoder());
+   return pipeline;
+ }
+ }
{code}
You might be able to create a static pipeline instead of a pipeline factory. See {{setPipeline()}} for more details. I'm also unclear why {{StringEncoder}} is required.
{code}
+ public void testWebImageViewer() throws IOException, InterruptedException {
+   final String port = "9001";
{code}
The command line needs to accept both the listening host and the port. Otherwise it'll listen on all interfaces by default. In the unit test, it is also important to configure the port to zero to avoid intermittent failures.
{code}
+ // wait until the viewer starts
+ Thread.sleep(3000);
{code}
You can use a condition variable here instead of sleeping.
{code}
+ HttpClient client = new DefaultHttpClient();
+ HttpGet httpGet =
+     new HttpGet("http://localhost:" + port + "/?op=LISTSTATUS");
+ HttpResponse response = client.execute(httpGet);
+ assertEquals(200, response.getStatusLine().getStatusCode());
+ assertEquals("application/json",
+     response.getEntity().getContentType().getValue());
{code}
Using the built-in {{HttpURLConnection}} is sufficient, and it reduces the dependencies of the unit tests.
{code}
+ import com.google.gson.JsonArray;
+ import com.google.gson.JsonObject;
+ import com.google.gson.JsonParser;
{code}
Please use jackson instead of gson; hadoop-hdfs does not depend on gson at all, so that way there is no additional dependency.
Create a tool to take fsimage and expose read-only WebHDFS API -- Key: HDFS-5978 URL: https://issues.apache.org/jira/browse/HDFS-5978 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Assignee: Akira AJISAKA Labels: newbie Attachments: HDFS-5978.patch Suggested in HDFS-5975. Add an option that exposes the read-only version of the WebHDFS API for OfflineImageViewer. You can imagine it looks very similar to jhat. That way we can allow the operator to use the existing command-line tool, or even the web UI, to debug the fsimage. It also allows the operator to interactively browse the file system and figure out what goes wrong. -- This message was sent by Atlassian JIRA (v6.2#6252)
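The {{HttpURLConnection}} suggestion above can be sketched end to end with only JDK classes. This is an illustrative stand-in, not the actual test: the JDK's built-in {{com.sun.net.httpserver}} plays the role of the web image viewer, and the client checks status and content type without Apache HttpClient. Binding to port 0, as the review recommends, lets the OS pick a free port.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HttpUrlConnectionSketch {
    // Stub server standing in for the web image viewer (illustrative only).
    static HttpServer startStubServer() throws IOException {
        // Port 0: the OS picks a free port, avoiding intermittent test failures.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = "{\"FileStatuses\":{}}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        return server;
    }

    // Fetch with the built-in client; returns {status code, content type}.
    static String[] fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            while (in.read() != -1) { } // drain the body so the socket closes cleanly
            return new String[] {
                String.valueOf(conn.getResponseCode()), conn.getContentType() };
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startStubServer();
        try {
            String base = "http://127.0.0.1:" + server.getAddress().getPort();
            String[] result = fetch(base + "/?op=LISTSTATUS");
            System.out.println(result[0] + " " + result[1]);
        } finally {
            server.stop(0);
        }
    }
}
```

The same two assertions the quoted patch makes with HttpClient fall out of `fetch()` with no extra dependency.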
[jira] [Updated] (HDFS-6086) Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
[ https://issues.apache.org/jira/browse/HDFS-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6086: --- Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) committed, thanks! Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached --- Key: HDFS-6086 URL: https://issues.apache.org/jira/browse/HDFS-6086 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-6086.001.patch, HDFS-6086.002.patch We need to fix a case where zero-copy or no-checksum reads are not allowed even when the block is cached. The case is when the block is cached before the {{REQUEST_SHORT_CIRCUIT_FDS}} operation begins. In this case, {{DataXceiver}} needs to consult the {{ShortCircuitRegistry}} to see if the block is cached, rather than relying on a callback. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
[ https://issues.apache.org/jira/browse/HDFS-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-6092: - Attachment: hdfs-6092-v1.txt Tentative patch. DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port -- Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu Attachments: hdfs-6092-v1.txt
I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem:
{code}
Path desPath = new Path("hdfs://127.0.0.1/");
FileSystem desFs = desPath.getFileSystem(conf);
String s = desFs.getCanonicalServiceName();
URI uri = desFs.getUri();
{code}
The canonical service name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception:
{code}
testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR!
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88)
{code}
Thanks to Brandon Li who helped debug this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
[ https://issues.apache.org/jira/browse/HDFS-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-6092: - Status: Patch Available (was: Open) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port -- Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu Attachments: hdfs-6092-v1.txt
I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem:
{code}
Path desPath = new Path("hdfs://127.0.0.1/");
FileSystem desFs = desPath.getFileSystem(conf);
String s = desFs.getCanonicalServiceName();
URI uri = desFs.getUri();
{code}
The canonical service name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception:
{code}
testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR!
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88)
{code}
Thanks to Brandon Li who helped debug this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5477) Block manager as a service
[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931095#comment-13931095 ] Edward Bortnikov commented on HDFS-5477: All great questions. Our previous documentation was pretty substandard. New design PDF attached at https://issues.apache.org/jira/browse/HDFS-5732 - it clarifies many things about the remote NN operation. Regarding Todd's question specifically. Yes, it's impossible to guarantee the 100% atomicity of transactions in the face of failures. This atomicity is also not necessarily required as long as no data is lost. The distributed state must eventually converge. Our solution is to treat communication failures and process failures identically. If an RPC times out, we re-establish the NN-BM connection and re-synchronize the state. (There are many ways to optimize this process, e.g., Merkle trees). Since timeouts should be very rare in a datacenter network (in the absence of bugs), this treatment is not too harsh. Block manager as a service -- Key: HDFS-5477 URL: https://issues.apache.org/jira/browse/HDFS-5477 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf, Standalone BM.pdf, patches.tar.gz The block manager needs to evolve towards having the ability to run as a standalone service to improve NN vertical and horizontal scalability. The goal is reducing the memory footprint of the NN proper to support larger namespaces, and improve overall performance by decoupling the block manager from the namespace and its lock. Ideally, a distinct BM will be transparent to clients and DNs. -- This message was sent by Atlassian JIRA (v6.2#6252)
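The Merkle-tree remark above can be made concrete with its simplest two-level variant: hash fixed buckets of the replicated state on each side, exchange only the bucket hashes, and re-send just the buckets that disagree. A sketch under assumed types (a Long-to-String map stands in for block state; none of this is from the design doc):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BucketSync {
    static final int BUCKETS = 16; // must be a power of two for the mask below

    // Order-independent per-bucket hash of the state map.
    static long[] bucketHashes(Map<Long, String> state) {
        long[] hashes = new long[BUCKETS];
        for (Map.Entry<Long, String> e : state.entrySet()) {
            int bucket = (int) (e.getKey() & (BUCKETS - 1));
            // addition commutes, so map iteration order does not affect the result
            hashes[bucket] += 31L * e.getKey() + e.getValue().hashCode();
        }
        return hashes;
    }

    // Buckets where two replicas disagree; only these need re-synchronizing
    // after an RPC timeout forces the NN-BM connection to be re-established.
    static List<Integer> divergentBuckets(Map<Long, String> a, Map<Long, String> b) {
        long[] ha = bucketHashes(a), hb = bucketHashes(b);
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) {
            if (ha[i] != hb[i]) out.add(i);
        }
        return out;
    }
}
```

A full Merkle tree recurses this idea, letting the two sides narrow down divergence in logarithmically many round trips instead of one flat comparison.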
[jira] [Commented] (HDFS-6086) Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached
[ https://issues.apache.org/jira/browse/HDFS-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931109#comment-13931109 ] Hudson commented on HDFS-6086: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5308 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5308/]) HDFS-6086. Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached. (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576533)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitReplica.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsDatasetSpi.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetCache.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/CacheManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java
Fix a case where zero-copy or no-checksum reads were not allowed even when the block was cached --- Key: HDFS-6086 URL: https://issues.apache.org/jira/browse/HDFS-6086 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode
Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-6086.001.patch, HDFS-6086.002.patch We need to fix a case where zero-copy or no-checksum reads are not allowed even when the block is cached. The case is when the block is cached before the {{REQUEST_SHORT_CIRCUIT_FDS}} operation begins. In this case, {{DataXceiver}} needs to consult the {{ShortCircuitRegistry}} to see if the block is cached, rather than relying on a callback. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
[ https://issues.apache.org/jira/browse/HDFS-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931250#comment-13931250 ] Hadoop QA commented on HDFS-6092: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634035/hdfs-6092-v1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSClientFailover org.apache.hadoop.security.TestPermissionSymlinks org.apache.hadoop.fs.TestGlobPaths org.apache.hadoop.hdfs.TestFileAppend org.apache.hadoop.hdfs.TestReplication org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs org.apache.hadoop.hdfs.web.TestWebHDFS org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.namenode.TestNameNodeAcl org.apache.hadoop.hdfs.TestFileStatus org.apache.hadoop.hdfs.TestLease org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer org.apache.hadoop.hdfs.TestHDFSTrash org.apache.hadoop.fs.viewfs.TestViewFsDefaultValue org.apache.hadoop.fs.TestSymlinkHdfsDisable org.apache.hadoop.fs.TestHDFSFileContextMainOperations org.apache.hadoop.cli.TestAclCLI org.apache.hadoop.fs.TestUrlStreamHandler org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.fs.TestSymlinkHdfsFileSystem org.apache.hadoop.fs.viewfs.TestViewFileSystemAtHdfsRoot org.apache.hadoop.hdfs.TestDFSShell org.apache.hadoop.fs.shell.TestHdfsTextCommand org.apache.hadoop.fs.TestResolveHdfsSymlink org.apache.hadoop.hdfs.TestSnapshotCommands org.apache.hadoop.hdfs.server.namenode.TestFileContextAcl org.apache.hadoop.fs.viewfs.TestViewFsFileStatusHdfs org.apache.hadoop.hdfs.TestDistributedFileSystem org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestDFSClientRetries org.apache.hadoop.hdfs.server.balancer.TestBalancer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6375//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6375//console This message is automatically generated. 
DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port -- Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu Attachments: hdfs-6092-v1.txt
I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem:
{code}
Path desPath = new Path("hdfs://127.0.0.1/");
FileSystem desFs = desPath.getFileSystem(conf);
String s = desFs.getCanonicalServiceName();
URI uri = desFs.getUri();
{code}
The canonical service name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception: {code} testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR!
[jira] [Created] (HDFS-6093) Expose more caching information for debugging by users
Andrew Wang created HDFS-6093: - Summary: Expose more caching information for debugging by users Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work. It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
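The pending-count idea mirrors how pending replication work is tracked. A minimal sketch of such a gauge (class and method names here are hypothetical, not from the eventual patch): the count ticks up as directives schedule blocks and down as DataNode cache reports confirm them, so a stuck directive shows up as a count that never drains.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical gauge for outstanding caching/uncaching work.
public class PendingCachingMetric {
    private final AtomicLong pendingCachingBlocks = new AtomicLong();
    private final AtomicLong pendingUncachingBlocks = new AtomicLong();

    // NN-side: a directive scheduled n blocks to be cached.
    public void blocksScheduledForCaching(long n) { pendingCachingBlocks.addAndGet(n); }
    // DN cache report confirmed one block is now cached.
    public void blockConfirmedCached() { pendingCachingBlocks.decrementAndGet(); }

    public void blocksScheduledForUncaching(long n) { pendingUncachingBlocks.addAndGet(n); }
    public void blockConfirmedUncached() { pendingUncachingBlocks.decrementAndGet(); }

    // Exposed to the metrics system / web UI.
    public long getPendingCachingBlocks() { return pendingCachingBlocks.get(); }
    public long getPendingUncachingBlocks() { return pendingUncachingBlocks.get(); }
}
```

A test in this style only has to verify that the gauge ticks up and back down, which is exactly what the patch's included test checks for the real metric.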
[jira] [Updated] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6093: -- Status: Patch Available (was: Open) Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work. It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13931265#comment-13931265 ] Arpit Agarwal commented on HDFS-6093: - +1 for the idea, thanks Andrew! :-) I'll try to review this tomorrow if no one else has got to it by then. Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work. It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abin Shahab updated HDFS-6080: -- Attachment: HDFS-6080.patch set rsize=1MB Improve NFS gateway performance by making rtmax and wtmax configurable -- Key: HDFS-6080 URL: https://issues.apache.org/jira/browse/HDFS-6080 Project: Hadoop HDFS Issue Type: Improvement Components: nfs, performance Reporter: Abin Shahab Assignee: Abin Shahab Attachments: HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch
Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server, and therefore affect read and write performance. We ran performance tests with 1MB, 100MB, and 1GB files. We noticed a significant performance decline with the size increase when compared to fuse. We realized that the issue was the hardcoded rtmax size (64k). When we increased rtmax to 1MB, we got a 10x improvement in performance.
NFS reads:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 23.131158137 | 19.24552955 | 19.793332866 | 20.72334018435 | 1.7172094782219731 |
| testFile1Gb | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113 | 8.14037175506561 |
| testFile1Mb | 1048576 | 0.330546906 | 0.256391808 | 0.28730168 | 0.291413464667 | 0.030412987573361663 |
Fuse reads:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 2.394459443 | 2.695265191 | 2.50046517 | 2.530063267997 | 0.12457410127142007 |
| testFile1Gb | 1073741824 | 25.03324924 | 24.155102554 | 24.901525525 | 24.69662577297 | 0.386672412437576 |
| testFile1Mb | 1048576 | 0.271615094 | 0.270835986 | 0.271796438 | 0.271415839333 | 0.0004166483951065848 |
NFS reads after rtmax = 1MB:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 3.655261869 | 3.438676067 | 3.557464787 | 3.550467574336 | 0.0885591069882058 |
| testFile1Gb | 1073741824 | 34.663612417 | 37.32089122 | 37.997718857 | 36.66074083135 | 1.4389615098060426 |
| testFile1Mb | 1048576 | 0.115602858 | 0.106826253 | 0.125229976 | 0.1158863623334 | 0.007515962395481867 |
-- This message was sent by Atlassian JIRA (v6.2#6252)
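The shape of the fix is the standard make-a-constant-configurable pattern. A sketch with {{java.util.Properties}} standing in for Hadoop's Configuration class (the key names and defaults below are assumptions for illustration; the real property names are in the patch):

```java
import java.util.Properties;

public class NfsTransferSizes {
    // Assumed key names for illustration; check the patch for the real ones.
    static final String RTMAX_KEY = "nfs.rtmax";
    static final String WTMAX_KEY = "nfs.wtmax";
    // Previously hardcoded maximum transfer size: 64k.
    static final int DEFAULT_MAX = 64 * 1024;

    public final int rtmax; // max read transfer size advertised to clients
    public final int wtmax; // max write transfer size advertised to clients

    public NfsTransferSizes(Properties conf) {
        this.rtmax = getInt(conf, RTMAX_KEY, DEFAULT_MAX);
        this.wtmax = getInt(conf, WTMAX_KEY, DEFAULT_MAX);
    }

    private static int getInt(Properties conf, String key, int dflt) {
        String v = conf.getProperty(key);
        return (v == null || v.isEmpty()) ? dflt : Integer.parseInt(v.trim());
    }
}
```

With this pattern, an operator sets {{nfs.rtmax}} to 1048576 to get the 1MB transfer size that produced the 10x improvement, while unconfigured deployments keep the old 64k behavior.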
[jira] [Updated] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6089: Attachment: HDFS-6089.000.patch Simple patch to remove the editlog roll from SBN. Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-6089.000.patch The following scenario was tested:
* Determine the Active NN and suspend the process (kill -19)
* Wait about 60s to let the standby transition to active
* Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active
What was noticed was that sometimes the call to get the service state of nn2 got a socket timeout exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6093: -- Attachment: hdfs-6093-1.patch Patch attached. For the DFSAdmin output, it turns out that block pool used wasn't actually being sent to the client, so I added it along with the cache stuff. I verified the dfsadmin -report output manually. I didn't modify any existing lines, just added new ones (with unique strings), so existing tools shouldn't be broken. The pending cache/uncache count is now a metric and also on the webUI. The included test makes sure that the metric ticks up and down, and I checked manually on the webUI. Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work. It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6038) JournalNode hardcodes NameNodeLayoutVersion in the edit log file
[ https://issues.apache.org/jira/browse/HDFS-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931234#comment-13931234 ] Todd Lipcon commented on HDFS-6038: --- I didn't look at the code in detail, but the design approach of length-prefixing the edits seems reasonable. My only feedback might be that it would have been nicer to do that change in a JIRA labeled as such, and then make the JN change separately. But I'm not against doing it all here -- just worried that other contributors may want to review this patch as it's actually making an edit log format change, not just a protocol change for the JNs. I also noticed a spot or two where you are missing a finally { IOUtils.closeStream(...); } -- might be worth checking for that before committing. In terms of testing, it might be nice to add a QJM test which writes fake ops to a JournalNode -- ie ops with garbage data but a correct length and checksum. Perhaps you could do this by changing QJMTestUtil.writeOp() to write a garbage op? JournalNode hardcodes NameNodeLayoutVersion in the edit log file Key: HDFS-6038 URL: https://issues.apache.org/jira/browse/HDFS-6038 Project: Hadoop HDFS Issue Type: Sub-task Components: journal-node, namenode Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-6038.000.patch, HDFS-6038.001.patch, HDFS-6038.002.patch, HDFS-6038.003.patch, HDFS-6038.004.patch, HDFS-6038.005.patch, HDFS-6038.006.patch, HDFS-6038.007.patch, editsStored In HA setup, the JNs receive edit logs (blob) from the NN and write into edit log files. In order to write well-formed edit log files, the JNs prepend a header for each edit log file. The problem is that the JN hard-codes the version (i.e., {{NameNodeLayoutVersion}} in the edit log, therefore it generates incorrect edit logs when the newer release bumps the {{NameNodeLayoutVersion}} during rolling upgrade. -- This message was sent by Atlassian JIRA (v6.2#6252)
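Length-prefixing plus a checksum is what lets a JournalNode persist an op it cannot parse, and it is exactly what Todd's garbage-op test would exercise. A sketch of one possible frame layout (the [length][crc32][payload] layout below is an assumption for illustration, not HDFS's actual edit log wire format):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;

public class FramedOp {
    // Frame: [int payload length][long crc32 of payload][payload bytes].
    public static void writeOp(DataOutputStream out, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());
        out.write(payload);
    }

    // Reads one frame; the reader never needs to understand the payload itself,
    // which is what allows an older JN to relay ops from a newer layout version.
    public static byte[] readOp(DataInputStream in) throws IOException {
        int length = in.readInt();
        long expected = in.readLong();
        byte[] payload = new byte[length];
        in.readFully(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expected) {
            throw new IOException("op checksum mismatch");
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a "garbage" op: bytes the reader can frame but not parse.
        byte[] garbage = {42, 0, -1, 7, 99};
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeOp(new DataOutputStream(buf), garbage);
        byte[] back = readOp(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(java.util.Arrays.equals(garbage, back));
    }
}
```

A QJM-style test in the spirit of the review comment would write such garbage payloads with a correct length and checksum and assert the journal still accepts and returns them intact.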
[jira] [Updated] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6089: Status: Patch Available (was: Open) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-6089.000.patch The following scenario was tested:
* Determine the Active NN and suspend the process (kill -19)
* Wait about 60s to let the standby transition to active
* Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active
What was noticed was that sometimes the call to get the service state of nn2 got a socket timeout exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931122#comment-13931122 ] Hadoop QA commented on HDFS-6080: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12633847/HDFS-6080.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.fs.TestHdfsNativeCodeLoader {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6374//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6374//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6374//console This message is automatically generated. 
Improve NFS gateway performance by making rtmax and wtmax configurable -- Key: HDFS-6080 URL: https://issues.apache.org/jira/browse/HDFS-6080 Project: Hadoop HDFS Issue Type: Improvement Components: nfs, performance Reporter: Abin Shahab Assignee: Abin Shahab Attachments: HDFS-6080.patch, HDFS-6080.patch
Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server, and therefore affect read and write performance. We ran performance tests with 1MB, 100MB, and 1GB files. We noticed a significant performance decline with the size increase when compared to fuse. We realized that the issue was the hardcoded rtmax size (64k). When we increased rtmax to 1MB, we got a 10x improvement in performance.
NFS reads:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 23.131158137 | 19.24552955 | 19.793332866 | 20.72334018435 | 1.7172094782219731 |
| testFile1Gb | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113 | 8.14037175506561 |
| testFile1Mb | 1048576 | 0.330546906 | 0.256391808 | 0.28730168 | 0.291413464667 | 0.030412987573361663 |
Fuse reads:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 2.394459443 | 2.695265191 | 2.50046517 | 2.530063267997 | 0.12457410127142007 |
| testFile1Gb | 1073741824 | 25.03324924 | 24.155102554 | 24.901525525 | 24.69662577297 | 0.386672412437576 |
| testFile1Mb | 1048576 | 0.271615094 | 0.270835986 | 0.271796438 | 0.271415839333 | 0.0004166483951065848 |
NFS reads after rtmax = 1MB:
| File | Size | Run 1 | Run 2 | Run 3 | Average | Std. Dev. |
| testFile100Mb | 104857600 | 3.655261869 | 3.438676067 | 3.557464787 | 3.550467574336 | 0.0885591069882058 |
| testFile1Gb | 1073741824 | 34.663612417 | 37.32089122 | 37.997718857 | 36.66074083135 | 1.4389615098060426 |
[jira] [Commented] (HDFS-5516) WebHDFS does not require user name when anonymous http requests are disallowed.
[ https://issues.apache.org/jira/browse/HDFS-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931167#comment-13931167 ] Chris Nauroth commented on HDFS-5516: - Hi, [~miradu-msft]. The patch looks good. I think we can add unit tests in {{TestAuthFilter}} to cover the new configuration. Could you please take a look?

WebHDFS does not require user name when anonymous http requests are disallowed. --- Key: HDFS-5516 URL: https://issues.apache.org/jira/browse/HDFS-5516 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 1.2.1, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5516.patch

WebHDFS requests do not require a user name to be specified in the request URL, even when the core-site configuration sets HTTP authentication to simple and disables anonymous authentication.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931141#comment-13931141 ] Brandon Li commented on HDFS-6080: - Yes, let's update the doc too. If the user doesn't change the mount option and the NFS client uses 32 KB or 64 KB instead, the 1 MB setting will not take effect, so it hurts nothing.

Improve NFS gateway performance by making rtmax and wtmax configurable -- Key: HDFS-6080 URL: https://issues.apache.org/jira/browse/HDFS-6080 Project: Hadoop HDFS Issue Type: Improvement Components: nfs, performance Reporter: Abin Shahab Assignee: Abin Shahab Attachments: HDFS-6080.patch, HDFS-6080.patch

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6089) Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended
[ https://issues.apache.org/jira/browse/HDFS-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931275#comment-13931275 ] Hadoop QA commented on HDFS-6089: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634047/HDFS-6089.000.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestEditLogTailer
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6376//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6376//console

This message is automatically generated.
Standby NN while transitioning to active throws a connection refused error when the prior active NN process is suspended Key: HDFS-6089 URL: https://issues.apache.org/jira/browse/HDFS-6089 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Jing Zhao Attachments: HDFS-6089.000.patch

The following scenario was tested:
* Determine the active NN and suspend the process (kill -19)
* Wait about 60s to let the standby transition to active
* Get the service state for nn1 and nn2 and make sure nn2 has transitioned to active

What was noticed was that the call to get the service state of nn2 sometimes got a socket timeout exception.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931307#comment-13931307 ] Hadoop QA commented on HDFS-6080: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634050/HDFS-6080.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warning.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.fs.TestHdfsNativeCodeLoader org.apache.hadoop.hdfs.server.datanode.fsdataset.TestAvailableSpaceVolumeChoosingPolicy
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6377//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6377//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs-nfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6377//console

This message is automatically generated.
Improve NFS gateway performance by making rtmax and wtmax configurable -- Key: HDFS-6080 URL: https://issues.apache.org/jira/browse/HDFS-6080 Project: Hadoop HDFS Issue Type: Improvement Components: nfs, performance Reporter: Abin Shahab Assignee: Abin Shahab Attachments: HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch
[jira] [Created] (HDFS-6094) The same block can be counted twice towards safe mode threshold
Arpit Agarwal created HDFS-6094: --- Summary: The same block can be counted twice towards safe mode threshold Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal

{{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details in a comment.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
[ https://issues.apache.org/jira/browse/HDFS-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-6092: - Attachment: hdfs-6092-v2.txt

Patch v2 passed the following tests:
{code}
mvn test -Dtest=TestFileStatus
mvn test -Dtest=TestWebHDFS,TestViewFileSystemHdfs,TestGlobPaths
{code}
getCanonicalServiceName() is called only when the port of this.uri is -1.

DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port -- Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu Attachments: hdfs-6092-v1.txt, hdfs-6092-v2.txt

I discovered this when working on HBASE-10717. Here is sample code to reproduce the problem:
{code}
Path desPath = new Path("hdfs://127.0.0.1/");
FileSystem desFs = desPath.getFileSystem(conf);
String s = desFs.getCanonicalServiceName();
URI uri = desFs.getUri();
{code}
The canonical name string contains the default port (8020), but the URI doesn't contain a port. This results in the following exception:
{code}
testIsSameHdfs(org.apache.hadoop.hbase.util.TestFSHDFSUtils) Time elapsed: 0.001 sec ERROR!
java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at org.apache.hadoop.hbase.util.FSHDFSUtils.getNNAddresses(FSHDFSUtils.java:88)
{code}
Thanks to Brandon Li who helped debug this.

-- This message was sent by Atlassian JIRA (v6.2#6252)
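The inconsistency is easy to model outside Hadoop. The sketch below (hypothetical helper names, not Hadoop's actual implementation) mimics the two code paths: canonical service names fill in the default NameNode port, while the filesystem URI reports -1 when no port was given, which is what trips the `InetSocketAddress` range check:

```python
# Hypothetical sketch of the mismatch described above, not Hadoop code:
# the canonical service name defaults the port, the URI does not.
from urllib.parse import urlparse

DEFAULT_NN_PORT = 8020  # default NameNode RPC port

def canonical_service_name(uri_str):
    """Mimics getCanonicalServiceName(): host:port, defaulting the port."""
    u = urlparse(uri_str)
    port = u.port if u.port is not None else DEFAULT_NN_PORT
    return f"{u.hostname}:{port}"

def fs_uri_port(uri_str):
    """Mimics getUri().getPort(): -1 when the URI has no explicit port."""
    u = urlparse(uri_str)
    return u.port if u.port is not None else -1

uri = "hdfs://127.0.0.1/"
print(canonical_service_name(uri))  # 127.0.0.1:8020
print(fs_uri_port(uri))             # -1, which InetSocketAddress rejects
```

Passing -1 straight into a socket-address constructor is exactly the "port out of range:-1" failure quoted in the stack trace.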
[jira] [Updated] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6094: Description: {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. (was: {{BlockManager#addStoredBlock}} can cause the same block can be counted towards safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details in a comment.)

The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal

{{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment.

-- This message was sent by Atlassian JIRA (v6.2#6252)
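The general shape of this bug class is a counter that is bumped on every block report rather than once per distinct block. A minimal sketch (illustrative only, not BlockManager's actual logic) shows how tracking block IDs in a set makes the safe-block count idempotent under duplicate reports:

```python
# Illustrative sketch, not Hadoop's BlockManager: counting distinct block
# IDs instead of incrementing a bare counter prevents a block reported
# twice from being counted twice towards the safe mode threshold.
class SafeModeCounter:
    def __init__(self, threshold):
        self.threshold = threshold
        self.safe_blocks = set()  # each block ID counted exactly once

    def add_stored_block(self, block_id):
        # Re-adding an already-seen ID is a no-op, so duplicate reports
        # cannot inflate the count.
        self.safe_blocks.add(block_id)

    def can_leave_safe_mode(self):
        return len(self.safe_blocks) >= self.threshold

c = SafeModeCounter(threshold=2)
c.add_stored_block("blk_1")
c.add_stored_block("blk_1")  # duplicate report, e.g. from a re-registered DN
print(c.can_leave_safe_mode())  # False: only one distinct safe block so far
```

With a plain integer counter, the duplicate report above would push the count to 2 and let the NN leave safe mode prematurely.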
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931338#comment-13931338 ] Thanh Do commented on HDFS-6009: - Hi Yu, You mentioned "although the regionservers are grouped, the datanodes which store the data are not, which leads to the case that one datanode failure affects multiple applications, as we already observed in our product environment." Can you elaborate on that scenario? I thought a datanode failure would be OK, since the data is replicated. Best,

Tools based on favored node feature for isolation - Key: HDFS-6009 URL: https://issues.apache.org/jira/browse/HDFS-6009 Project: Hadoop HDFS Issue Type: Task Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor

There are scenarios, like those mentioned in HBASE-6721 and HBASE-4210, where in multi-tenant deployments of HBase we prefer to specify several groups of regionservers to serve different applications, to achieve some kind of isolation or resource allocation. However, although the regionservers are grouped, the datanodes which store the data are not, which leads to the case that one datanode failure affects multiple applications, as we have already observed in our production environment. To relieve this issue, we could make use of the favored node feature (HDFS-2576) to make a regionserver able to locate data within its group, or in other words make datanodes also grouped (passively), to form some level of isolation. In this case, or any other case that needs datanodes to be grouped, we would need a bunch of tools to maintain the groups, including:
1. Making the balancer able to balance data among specified servers, rather than the whole set
2. Setting balance bandwidth for specified servers, rather than the whole set
3. A tool to check whether a block is cross-group placed, and move it back if so

This JIRA is an umbrella for the above tools.

-- This message was sent by Atlassian JIRA (v6.2#6252)
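The core of tool #3 in the list above is a placement check. A minimal sketch (hypothetical inputs: a block-to-replica-locations map and a datanode-to-group map, both obtainable from block reports and a group config) flags blocks whose replicas span more than one group:

```python
# Sketch of the cross-group placement check proposed as tool #3 above.
# Inputs are hypothetical simplifications of NN block-location data and a
# datanode grouping config; not an actual HDFS API.
def find_cross_group_blocks(block_locations, datanode_group):
    """Return block IDs whose replicas live in more than one group."""
    cross = []
    for block, datanodes in block_locations.items():
        groups = {datanode_group[dn] for dn in datanodes}
        if len(groups) > 1:  # replicas leak outside one group's isolation
            cross.append(block)
    return cross

groups = {"dn1": "appA", "dn2": "appA", "dn3": "appB"}
blocks = {
    "blk_1": ["dn1", "dn2"],  # stays within group appA
    "blk_2": ["dn1", "dn3"],  # spans appA and appB: violates isolation
}
print(find_cross_group_blocks(blocks, groups))  # ['blk_2']
```

A mover tool would then re-replicate each flagged block onto datanodes of the owning group and drop the out-of-group replica.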
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931375#comment-13931375 ] Hadoop QA commented on HDFS-6093: -

{color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634074/hdfs-6093-1.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6378//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6378//console

This message is automatically generated.

Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch

When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work.
It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
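The aggregation asked for here is a straightforward roll-up of the existing per-DN metrics. A sketch (hypothetical field names standing in for the per-datanode cache metrics) of the cluster-level summary a dfsadmin -report line would show:

```python
# Sketch of the aggregation proposed above: summing per-datanode cache
# metrics into cluster totals. Field names are illustrative, not the
# actual DatanodeInfo/metrics names.
def aggregate_cache_report(datanode_reports):
    """Roll per-DN cache stats up into a cluster-wide summary."""
    capacity = sum(r["cache_capacity"] for r in datanode_reports)
    used = sum(r["cache_used"] for r in datanode_reports)
    return {
        "cache_capacity": capacity,
        "cache_used": used,
        "cache_remaining": capacity - used,
    }

reports = [
    {"cache_capacity": 4 * 1024**3, "cache_used": 1 * 1024**3},
    {"cache_capacity": 4 * 1024**3, "cache_used": 3 * 1024**3},
]
print(aggregate_cache_report(reports)["cache_remaining"])  # 4294967296 (4 GB)
```

A pending caching/uncaching count would sit alongside these totals, decremented as DNs confirm cached or uncached blocks, the same way pending replication work is tracked.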
[jira] [Commented] (HDFS-6092) DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t. port
[ https://issues.apache.org/jira/browse/HDFS-6092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13931415#comment-13931415 ] Hadoop QA commented on HDFS-6092: -

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634088/hdfs-6092-v2.txt against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6379//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6379//console

This message is automatically generated.

DistributedFileSystem#getCanonicalServiceName() and DistributedFileSystem#getUri() may return inconsistent results w.r.t.
port -- Key: HDFS-6092 URL: https://issues.apache.org/jira/browse/HDFS-6092 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Ted Yu Attachments: hdfs-6092-v1.txt, hdfs-6092-v2.txt

-- This message was sent by Atlassian JIRA (v6.2#6252)