[jira] [Commented] (HDFS-6010) Make balancer able to balance data among specified servers
[ https://issues.apache.org/jira/browse/HDFS-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934725#comment-13934725 ]

Devaraj Das commented on HDFS-6010:
-----------------------------------

[~sanjay.radia], could you please take a look at the proposal here?

Make balancer able to balance data among specified servers
----------------------------------------------------------

Key: HDFS-6010
URL: https://issues.apache.org/jira/browse/HDFS-6010
Project: Hadoop HDFS
Issue Type: Improvement
Components: balancer
Affects Versions: 2.3.0
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor
Attachments: HDFS-6010-trunk.patch

Currently, the balancer tool balances data among all datanodes. However, in some particular cases we need to balance data only among specified nodes instead of the whole set. This JIRA introduces a new -servers option to implement this.

--
This message was sent by Atlassian JIRA (v6.2#6252)
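In essence, the -servers option restricts the balancer's working set to an admin-supplied host list. A minimal sketch of that filtering step (the class, method, and host:port strings here are illustrative assumptions, not the actual patch code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: keep only the datanodes whose host:port appears in the
// set passed via -servers, so balancing happens within that subset only.
public class ServerFilter {
    static List<String> filter(List<String> allDatanodes, Set<String> servers) {
        List<String> picked = new ArrayList<>();
        for (String dn : allDatanodes) {
            if (servers.contains(dn)) {
                picked.add(dn);
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("dn1:50010", "dn2:50010", "dn3:50010");
        Set<String> group = new HashSet<>(Arrays.asList("dn1:50010", "dn3:50010"));
        // Only dn1 and dn3 remain candidates for balancing.
        System.out.println(filter(all, group));
    }
}
```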
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934814#comment-13934814 ]

Yu Li commented on HDFS-6009:
-----------------------------

{quote}
In particular, what caused the failure in your case? Is it a disk error, network failure, or an application is buggy?
{quote}
In our production environment we have encountered almost all of the cases listed above, and experienced a hard time comforting angry users. The buggy-application case is especially painful: the other affected users become furious about being punished for someone else's fault. So in our case isolation is necessary.

To be more specific, our service is based on HBase, so the tools supplied here are used along with the HBase regionserver group feature (HBASE-6721). If you're interested in our use case, I've given a more detailed introduction [here|https://issues.apache.org/jira/browse/HDFS-6010?focusedCommentId=13932891&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13932891] in HDFS-6010 (just allow me to save some copy-paste effort :-)).

Another thing to clarify is that this suite of tools won't persist any datanode group information into HDFS. All three tools accept a -servers option, so the admin needs to keep the group information in mind and pass it to the tools, or, as in our use case, persist it in an upper-level component like HBase.

[~thanhdo], hope this answers your question; just let me know if you have any further comments.

Tools based on favored node feature for isolation
-------------------------------------------------

Key: HDFS-6009
URL: https://issues.apache.org/jira/browse/HDFS-6009
Project: Hadoop HDFS
Issue Type: Task
Affects Versions: 2.3.0
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor

There are scenarios, like those mentioned in HBASE-6721 and HBASE-4210, where in multi-tenant deployments of HBase we prefer to assign several groups of regionservers to serve different applications, to achieve some kind of isolation or resource allocation. However, although the regionservers are grouped, the datanodes which store the data are not, so a single datanode failure affects multiple applications, as we have already observed in our production environment.

To relieve this issue, we could make use of the favored node feature (HDFS-2576) so that a regionserver locates data within its group; in other words, the datanodes also become (passively) grouped, forming some level of isolation. In this case, or any other case that needs datanodes to be grouped, we would need a set of tools to maintain the groups:

1. Make the balancer able to balance data among specified servers, rather than the whole set
2. Set the balance bandwidth for specified servers, rather than the whole set
3. A tool to check whether a block is placed across groups, and move it back if so

This JIRA is an umbrella for the above tools.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934872#comment-13934872 ]

Hudson commented on HDFS-6080:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/])
HDFS-6080. Improve NFS gateway performance by making rtmax and wtmax configurable. Contributed by Abin Shahab (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577319)
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Constant.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm

Improve NFS gateway performance by making rtmax and wtmax configurable
----------------------------------------------------------------------

Key: HDFS-6080
URL: https://issues.apache.org/jira/browse/HDFS-6080
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs, performance
Reporter: Abin Shahab
Assignee: Abin Shahab
Fix For: 2.4.0
Attachments: HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch

Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server, and therefore affect read and write performance. We ran performance tests with 1 MB, 100 MB, and 1 GB files, and noticed a significant performance decline with increasing size when compared to fuse. The issue was the hardcoded rtmax size (64 KB); when we increased rtmax to 1 MB, we got a 10x improvement in performance.

NFS reads:
| File          | Size       | Run 1         | Run 2         | Run 3         | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 23.131158137  | 19.24552955   | 19.793332866  | 20.72334018435  | 1.7172094782219731    |
| testFile1Gb   | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113  | 8.14037175506561      |
| testFile1Mb   | 1048576    | 0.330546906   | 0.256391808   | 0.28730168    | 0.291413464667  | 0.030412987573361663  |

Fuse reads:
| File          | Size       | Run 1       | Run 2        | Run 3        | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 2.394459443 | 2.695265191  | 2.50046517   | 2.530063267997  | 0.12457410127142007   |
| testFile1Gb   | 1073741824 | 25.03324924 | 24.155102554 | 24.901525525 | 24.69662577297  | 0.386672412437576     |
| testFile1Mb   | 1048576    | 0.271615094 | 0.270835986  | 0.271796438  | 0.271415839333  | 0.0004166483951065848 |

NFS reads after rtmax = 1 MB:
| File          | Size       | Run 1        | Run 2       | Run 3        | Average         | Std. Dev.            |
| testFile100Mb | 104857600  | 3.655261869  | 3.438676067 | 3.557464787  | 3.550467574336  | 0.0885591069882058   |
| testFile1Gb   | 1073741824 | 34.663612417 | 37.32089122 | 37.997718857 | 36.66074083135  | 1.4389615098060426   |
| testFile1Mb   | 1048576    | 0.115602858  | 0.106826253 | 0.125229976  | 0.1158863623334 | 0.007515962395481867 |

--
This message was sent by Atlassian JIRA (v6.2#6252)
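Since the patch turns the two hardcoded limits into configuration keys, an admin would set them in hdfs-site.xml. A sketch of such a fragment, with the caveat that the key names below are assumptions for illustration; check Nfs3Constant.java in your release for the authoritative names:

```xml
<!-- Hypothetical hdfs-site.xml fragment: raise the NFS gateway's maximum
     read/write transfer sizes to 1 MB, matching the 10x result above.
     Key names are an assumption; verify against Nfs3Constant.java. -->
<property>
  <name>dfs.nfs.rtmax</name>
  <value>1048576</value>
</property>
<property>
  <name>dfs.nfs.wtmax</name>
  <value>1048576</value>
</property>
```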
[jira] [Commented] (HDFS-6097) zero-copy reads are incorrectly disabled on file offsets above 2GB
[ https://issues.apache.org/jira/browse/HDFS-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934878#comment-13934878 ]

Hudson commented on HDFS-6097:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/])
HDFS-6097. Zero-copy reads are incorrectly disabled on file offsets above 2GB (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577350)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitReplica.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java

zero-copy reads are incorrectly disabled on file offsets above 2GB
------------------------------------------------------------------

Key: HDFS-6097
URL: https://issues.apache.org/jira/browse/HDFS-6097
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Fix For: 2.4.0
Attachments: HDFS-6097.003.patch, HDFS-6097.004.patch, HDFS-6097.005.patch

Zero-copy reads are incorrectly disabled on file offsets above 2GB. The culprit is code that is supposed to disable zero-copy reads only for offsets within a block file greater than 2GB (because MappedByteBuffer segments are limited to that size), but which ends up disabling them based on the absolute file offset as well.

--
This message was sent by Atlassian JIRA (v6.2#6252)
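The distinction at the heart of the bug fits in a few lines: what must stay under the ~2GB MappedByteBuffer ceiling is the offset inside the block file, not the absolute file offset. The guard below is a hypothetical illustration of that check, not the actual DFSInputStream code:

```java
public class ZeroCopyOffsetCheck {
    // MappedByteBuffer segments are indexed by int, so ~2GB is the ceiling.
    static final long MAX_MAP_BYTES = Integer.MAX_VALUE;

    // Hypothetical guard: zero-copy stays legal as long as the *block-relative*
    // range fits in one mapped segment, regardless of the absolute file offset.
    static boolean zeroCopyAllowed(long fileOffset, long blockStartInFile, int length) {
        long offsetInBlock = fileOffset - blockStartInFile;
        return offsetInBlock >= 0 && offsetInBlock + length <= MAX_MAP_BYTES;
    }

    public static void main(String[] args) {
        long fileOffset = 3L << 30;                 // 3 GB into the file
        long blockStart = fileOffset - (64L << 20); // this block began 64 MB earlier
        // Checking the absolute offset (> 2GB) would wrongly disable zero-copy;
        // checking the block-relative offset (64 MB) correctly allows it.
        System.out.println(zeroCopyAllowed(fileOffset, blockStart, 1 << 16));
    }
}
```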
[jira] [Commented] (HDFS-5244) TestNNStorageRetentionManager#testPurgeMultipleDirs fails
[ https://issues.apache.org/jira/browse/HDFS-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934875#comment-13934875 ]

Hudson commented on HDFS-5244:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/])
HDFS-5244. TestNNStorageRetentionManager#testPurgeMultipleDirs fails. Contributed by Jinghui Wang. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577254)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java

TestNNStorageRetentionManager#testPurgeMultipleDirs fails
---------------------------------------------------------

Key: HDFS-5244
URL: https://issues.apache.org/jira/browse/HDFS-5244
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.1.0-beta
Environment: Red Hat Enterprise 6 with Sun Java 1.7 and IBM Java 1.6
Reporter: Jinghui Wang
Assignee: Jinghui Wang
Fix For: 3.0.0, 2.1.0-beta, 2.4.0
Attachments: HDFS-5244.patch

The test o.a.h.hdfs.server.namenode.TestNNStorageRetentionManager uses a HashMap (dirRoots) to store the root storages to be mocked for the purging test, and a HashMap has no predictable iteration order. The directories that need to be purged are stored in a LinkedHashSet, which does have a predictable order. So when the directories get mocked for the test, they may already be out of the order in which they were added. Thus the order in which the directories are actually purged can differ from the order in which they were added to the LinkedHashSet, causing the test to fail.

--
This message was sent by Atlassian JIRA (v6.2#6252)
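The root cause above is the standard HashMap-vs-LinkedHashSet ordering contract, which a few lines of plain Java make concrete (the names here are illustrative, not taken from the test):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    // A LinkedHashSet round trip preserves insertion order by contract.
    static List<String> throughLinkedHashSet(List<String> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        List<String> dirRoots = Arrays.asList("root0", "root1", "root2");
        // Guaranteed: insertion order survives the LinkedHashSet.
        System.out.println(throughLinkedHashSet(dirRoots).equals(dirRoots)); // true

        // Not guaranteed: a HashMap's keySet() may iterate in any order, and
        // that order can change across JVM vendors and versions -- exactly the
        // mismatch that made the purge test flaky.
        Map<String, Boolean> roots = new HashMap<>();
        for (String d : dirRoots) roots.put(d, true);
        System.out.println(roots.keySet()); // some permutation of dirRoots
    }
}
```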
[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934880#comment-13934880 ]

Hudson commented on HDFS-6102:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/])
HDFS-6102. Lower the default maximum items per directory to fix PB fsimage loading. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java

Lower the default maximum items per directory to fix PB fsimage loading
-----------------------------------------------------------------------

Key: HDFS-6102
URL: https://issues.apache.org/jira/browse/HDFS-6102
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker
Fix For: 2.4.0
Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch

Found by [~schu] during testing. We were creating a bunch of directories in a single directory to blow up the fsimage size, and we hit this error when trying to load the resulting very large fsimage:

{noformat}
2014-03-13 13:57:03,901 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 INodes.
2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/dfs/nn/current/fsimage_00024532742, cpktTxId=00024532742)
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
	at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
	at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
	at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
	at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
	at com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.init(FsImageProto.java:9839)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.init(FsImageProto.java:9770)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
	...
{noformat}

Some further research reveals there's a 64MB max size per PB message, which seems to be what we're hitting here.

--
This message was sent by Atlassian JIRA (v6.2#6252)
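A back-of-the-envelope check shows why ~24.5M children in one directory trips protobuf's 64MB default limit; the bytes-per-entry figure below is an assumed average for a varint-encoded child id, purely for illustration:

```java
public class PbSizeEstimate {
    // Rough estimate of a DirEntry-bearing message's size: children * bytes
    // per varint-encoded child id (the per-entry cost is an assumption).
    static long estimatedBytes(long children, long bytesPerEntry) {
        return children * bytesPerEntry;
    }

    public static void main(String[] args) {
        long children = 24_523_605L;      // INode count from the log above
        long pbLimit = 64L * 1024 * 1024; // protobuf default message size limit
        long estimated = estimatedBytes(children, 5);
        // Even at ~5 bytes per entry the message dwarfs the 64MB limit,
        // which is why the fix caps items per directory instead.
        System.out.println(estimated > pbLimit); // true
    }
}
```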
[jira] [Commented] (HDFS-6084) Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
[ https://issues.apache.org/jira/browse/HDFS-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934877#comment-13934877 ]

Hudson commented on HDFS-6084:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #509 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/509/])
HDFS-6084. Namenode UI - Hadoop logo link shouldn't go to hadoop homepage. Contributed by Travis Thompson. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577401)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.html

Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
--------------------------------------------------------------

Key: HDFS-6084
URL: https://issues.apache.org/jira/browse/HDFS-6084
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.3.0
Reporter: Travis Thompson
Assignee: Travis Thompson
Priority: Minor
Fix For: 2.4.0
Attachments: HDFS-6084.1.patch.txt, HDFS-6084.2.patch.txt

When clicking the Hadoop title, the user is taken to the Hadoop homepage, which feels unintuitive. There's already a link at the bottom where it's always been, which is reasonable. I think the title should go to the main Namenode page, #tab-overview. Suggestions?

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5516) WebHDFS does not require user name when anonymous http requests are disallowed.
[ https://issues.apache.org/jira/browse/HDFS-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Miodrag Radulovic updated HDFS-5516:
------------------------------------

Attachment: HDFS-5516.patch

Fixed the encoding of the patch file.

WebHDFS does not require user name when anonymous http requests are disallowed.
-------------------------------------------------------------------------------

Key: HDFS-5516
URL: https://issues.apache.org/jira/browse/HDFS-5516
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 3.0.0, 1.2.1, 2.2.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-5516.patch, HDFS-5516.patch, HDFS-5516.patch

WebHDFS requests do not require a user name to be specified in the request URL, even when the core-site configuration sets HTTP authentication to simple and disables anonymous authentication.

--
This message was sent by Atlassian JIRA (v6.2#6252)
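For context, under simple authentication a WebHDFS request identifies the caller via the user.name query parameter in the URL; the bug is that its absence was not rejected when anonymous access is disabled. The helper below merely assembles such a URL (the host, port, and user are hypothetical):

```java
public class WebHdfsUrl {
    // Build a WebHDFS GETFILESTATUS URL carrying an explicit user.name
    // parameter, as required under simple auth with anonymous access disabled.
    static String statusUrl(String host, int port, String path, String user) {
        return "http://" + host + ":" + port + "/webhdfs/v1" + path
                + "?op=GETFILESTATUS&user.name=" + user;
    }

    public static void main(String[] args) {
        System.out.println(statusUrl("nn.example.com", 50070, "/tmp", "alice"));
    }
}
```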
[jira] [Commented] (HDFS-5516) WebHDFS does not require user name when anonymous http requests are disallowed.
[ https://issues.apache.org/jira/browse/HDFS-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934923#comment-13934923 ]

Miodrag Radulovic commented on HDFS-5516:
-----------------------------------------

OK, I will submit the patch for branch-1 later today.

WebHDFS does not require user name when anonymous http requests are disallowed.
-------------------------------------------------------------------------------

Key: HDFS-5516
URL: https://issues.apache.org/jira/browse/HDFS-5516
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 3.0.0, 1.2.1, 2.2.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Attachments: HDFS-5516.patch, HDFS-5516.patch, HDFS-5516.patch

WebHDFS requests do not require a user name to be specified in the request URL, even when the core-site configuration sets HTTP authentication to simple and disables anonymous authentication.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6087) Unify HDFS write/append/truncate
[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guo Ruijing updated HDFS-6087:
------------------------------

Attachment: HDFS Design Proposal_3_14.pdf

Unify HDFS write/append/truncate
--------------------------------

Key: HDFS-6087
URL: https://issues.apache.org/jira/browse/HDFS-6087
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client
Reporter: Guo Ruijing
Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf

In the existing implementation, an HDFS file can be appended to, and an HDFS block can be reopened for append. This design introduces complexity, including lease recovery. If we design HDFS blocks as immutable, append and truncate become very simple. The idea is that an HDFS block is immutable once it is committed to the namenode; if a block is not yet committed, it is the HDFS client's responsibility to re-add it with a new block ID.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6104) TestFsLimits#testDefaultMaxComponentLength Fails on branch-2
Mit Desai created HDFS-6104:
----------------------------

Summary: TestFsLimits#testDefaultMaxComponentLength Fails on branch-2
Key: HDFS-6104
URL: https://issues.apache.org/jira/browse/HDFS-6104
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Mit Desai
Assignee: Mit Desai

testDefaultMaxComponentLength fails intermittently with the following error:

{noformat}
java.lang.AssertionError: expected:0 but was:255
	at org.junit.Assert.fail(Assert.java:93)
	at org.junit.Assert.failNotEquals(Assert.java:647)
	at org.junit.Assert.assertEquals(Assert.java:128)
	at org.junit.Assert.assertEquals(Assert.java:472)
	at org.junit.Assert.assertEquals(Assert.java:456)
	at org.apache.hadoop.hdfs.server.namenode.TestFsLimits.testDefaultMaxComponentLength(TestFsLimits.java:90)
{noformat}

On doing some research, I found that this is actually a JDK7 issue (test methods no longer run in a predictable order). The test always fails when it runs after any test that runs the addChildWithName() method.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate
[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935017#comment-13935017 ]

Guo Ruijing commented on HDFS-6087:
-----------------------------------

Updated the document according to Konstantin's comments.

Unify HDFS write/append/truncate
--------------------------------

Key: HDFS-6087
URL: https://issues.apache.org/jira/browse/HDFS-6087
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs-client
Reporter: Guo Ruijing
Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf

In the existing implementation, an HDFS file can be appended to, and an HDFS block can be reopened for append. This design introduces complexity, including lease recovery. If we design HDFS blocks as immutable, append and truncate become very simple. The idea is that an HDFS block is immutable once it is committed to the namenode; if a block is not yet committed, it is the HDFS client's responsibility to re-add it with a new block ID.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5244) TestNNStorageRetentionManager#testPurgeMultipleDirs fails
[ https://issues.apache.org/jira/browse/HDFS-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935029#comment-13935029 ]

Hudson commented on HDFS-5244:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/])
HDFS-5244. TestNNStorageRetentionManager#testPurgeMultipleDirs fails. Contributed by Jinghui Wang. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577254)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java

TestNNStorageRetentionManager#testPurgeMultipleDirs fails
---------------------------------------------------------

Key: HDFS-5244
URL: https://issues.apache.org/jira/browse/HDFS-5244
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.1.0-beta
Environment: Red Hat Enterprise 6 with Sun Java 1.7 and IBM Java 1.6
Reporter: Jinghui Wang
Assignee: Jinghui Wang
Fix For: 3.0.0, 2.1.0-beta, 2.4.0
Attachments: HDFS-5244.patch

The test o.a.h.hdfs.server.namenode.TestNNStorageRetentionManager uses a HashMap (dirRoots) to store the root storages to be mocked for the purging test, and a HashMap has no predictable iteration order. The directories that need to be purged are stored in a LinkedHashSet, which does have a predictable order. So when the directories get mocked for the test, they may already be out of the order in which they were added. Thus the order in which the directories are actually purged can differ from the order in which they were added to the LinkedHashSet, causing the test to fail.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935026#comment-13935026 ]

Hudson commented on HDFS-6080:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/])
HDFS-6080. Improve NFS gateway performance by making rtmax and wtmax configurable. Contributed by Abin Shahab (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577319)
* /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Constant.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm

Improve NFS gateway performance by making rtmax and wtmax configurable
----------------------------------------------------------------------

Key: HDFS-6080
URL: https://issues.apache.org/jira/browse/HDFS-6080
Project: Hadoop HDFS
Issue Type: Improvement
Components: nfs, performance
Reporter: Abin Shahab
Assignee: Abin Shahab
Fix For: 2.4.0
Attachments: HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch

Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server, and therefore affect read and write performance. We ran performance tests with 1 MB, 100 MB, and 1 GB files, and noticed a significant performance decline with increasing size when compared to fuse. The issue was the hardcoded rtmax size (64 KB); when we increased rtmax to 1 MB, we got a 10x improvement in performance.

NFS reads:
| File          | Size       | Run 1         | Run 2         | Run 3         | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 23.131158137  | 19.24552955   | 19.793332866  | 20.72334018435  | 1.7172094782219731    |
| testFile1Gb   | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113  | 8.14037175506561      |
| testFile1Mb   | 1048576    | 0.330546906   | 0.256391808   | 0.28730168    | 0.291413464667  | 0.030412987573361663  |

Fuse reads:
| File          | Size       | Run 1       | Run 2        | Run 3        | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 2.394459443 | 2.695265191  | 2.50046517   | 2.530063267997  | 0.12457410127142007   |
| testFile1Gb   | 1073741824 | 25.03324924 | 24.155102554 | 24.901525525 | 24.69662577297  | 0.386672412437576     |
| testFile1Mb   | 1048576    | 0.271615094 | 0.270835986  | 0.271796438  | 0.271415839333  | 0.0004166483951065848 |

NFS reads after rtmax = 1 MB:
| File          | Size       | Run 1        | Run 2       | Run 3        | Average         | Std. Dev.            |
| testFile100Mb | 104857600  | 3.655261869  | 3.438676067 | 3.557464787  | 3.550467574336  | 0.0885591069882058   |
| testFile1Gb   | 1073741824 | 34.663612417 | 37.32089122 | 37.997718857 | 36.66074083135  | 1.4389615098060426   |
| testFile1Mb   | 1048576    | 0.115602858  | 0.106826253 | 0.125229976  | 0.1158863623334 | 0.007515962395481867 |

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6084) Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
[ https://issues.apache.org/jira/browse/HDFS-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935031#comment-13935031 ]

Hudson commented on HDFS-6084:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/])
HDFS-6084. Namenode UI - Hadoop logo link shouldn't go to hadoop homepage. Contributed by Travis Thompson. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577401)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.html

Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
--------------------------------------------------------------

Key: HDFS-6084
URL: https://issues.apache.org/jira/browse/HDFS-6084
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Affects Versions: 2.3.0
Reporter: Travis Thompson
Assignee: Travis Thompson
Priority: Minor
Fix For: 2.4.0
Attachments: HDFS-6084.1.patch.txt, HDFS-6084.2.patch.txt

When clicking the Hadoop title, the user is taken to the Hadoop homepage, which feels unintuitive. There's already a link at the bottom where it's always been, which is reasonable. I think the title should go to the main Namenode page, #tab-overview. Suggestions?

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6097) zero-copy reads are incorrectly disabled on file offsets above 2GB
[ https://issues.apache.org/jira/browse/HDFS-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935032#comment-13935032 ]

Hudson commented on HDFS-6097:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/])
HDFS-6097. Zero-copy reads are incorrectly disabled on file offsets above 2GB (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577350)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitReplica.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java

zero-copy reads are incorrectly disabled on file offsets above 2GB
------------------------------------------------------------------

Key: HDFS-6097
URL: https://issues.apache.org/jira/browse/HDFS-6097
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Fix For: 2.4.0
Attachments: HDFS-6097.003.patch, HDFS-6097.004.patch, HDFS-6097.005.patch

Zero-copy reads are incorrectly disabled on file offsets above 2GB. The culprit is code that is supposed to disable zero-copy reads only for offsets within a block file greater than 2GB (because MappedByteBuffer segments are limited to that size), but which ends up disabling them based on the absolute file offset as well.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935034#comment-13935034 ]

Hudson commented on HDFS-6102:
------------------------------

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1701 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1701/])
HDFS-6102. Lower the default maximum items per directory to fix PB fsimage loading. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1577426)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java

Lower the default maximum items per directory to fix PB fsimage loading
-----------------------------------------------------------------------

Key: HDFS-6102
URL: https://issues.apache.org/jira/browse/HDFS-6102
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.4.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Blocker
Fix For: 2.4.0
Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch

Found by [~schu] during testing. We were creating a bunch of directories in a single directory to blow up the fsimage size, and we hit this error when trying to load the resulting very large fsimage:

{noformat}
2014-03-13 13:57:03,901 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 INodes.
2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/dfs/nn/current/fsimage_00024532742, cpktTxId=00024532742)
com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
	at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
	at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
	at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
	at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
	at com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.init(FsImageProto.java:9839)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.init(FsImageProto.java:9770)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901)
	at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896)
	...
{noformat}

Some further research reveals there's a 64MB max size per PB message, which seems to be what we're hitting here.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5516) WebHDFS does not require user name when anonymous http requests are disallowed.
[ https://issues.apache.org/jira/browse/HDFS-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miodrag Radulovic updated HDFS-5516: Attachment: HDFS-5516-branch-1.patch Fix for the branch-1. WebHDFS does not require user name when anonymous http requests are disallowed. --- Key: HDFS-5516 URL: https://issues.apache.org/jira/browse/HDFS-5516 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 1.2.1, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5516-branch-1.patch, HDFS-5516.patch, HDFS-5516.patch, HDFS-5516.patch WebHDFS requests do not require user name to be specified in the request URL even when in core-site configuration options HTTP authentication is set to simple, and anonymous authentication is disabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
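The fix amounts to rejecting requests that carry no user identity when anonymous access is disallowed. A minimal sketch of that kind of check, with entirely hypothetical names (the real patch works inside the WebHDFS authentication path, not on raw query strings):

```java
// Hypothetical sketch: does a WebHDFS request query string carry a non-empty
// user.name parameter, and should the request be allowed?
public class UserNameCheck {
    static boolean hasUserName(String query) {
        if (query == null) return false;
        for (String kv : query.split("&")) {
            // Require a non-empty value after "user.name=".
            if (kv.startsWith("user.name=") && kv.length() > "user.name=".length()) {
                return true;
            }
        }
        return false;
    }

    // With simple auth and anonymous access disallowed, a request with no
    // user.name should be rejected.
    static boolean isAllowed(String query, boolean anonymousAllowed) {
        return anonymousAllowed || hasUserName(query);
    }
}
```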
[jira] [Commented] (HDFS-5516) WebHDFS does not require user name when anonymous http requests are disallowed.
[ https://issues.apache.org/jira/browse/HDFS-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935069#comment-13935069 ] Hadoop QA commented on HDFS-5516: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634699/HDFS-5516.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6403//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6403//console This message is automatically generated. WebHDFS does not require user name when anonymous http requests are disallowed. 
--- Key: HDFS-5516 URL: https://issues.apache.org/jira/browse/HDFS-5516 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 1.2.1, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5516-branch-1.patch, HDFS-5516.patch, HDFS-5516.patch, HDFS-5516.patch WebHDFS requests do not require user name to be specified in the request URL even when in core-site configuration options HTTP authentication is set to simple, and anonymous authentication is disabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5516) WebHDFS does not require user name when anonymous http requests are disallowed.
[ https://issues.apache.org/jira/browse/HDFS-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935074#comment-13935074 ] Hadoop QA commented on HDFS-5516: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634720/HDFS-5516-branch-1.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6404//console This message is automatically generated. WebHDFS does not require user name when anonymous http requests are disallowed. --- Key: HDFS-5516 URL: https://issues.apache.org/jira/browse/HDFS-5516 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 3.0.0, 1.2.1, 2.2.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5516-branch-1.patch, HDFS-5516.patch, HDFS-5516.patch, HDFS-5516.patch WebHDFS requests do not require user name to be specified in the request URL even when in core-site configuration options HTTP authentication is set to simple, and anonymous authentication is disabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6094: Status: Patch Available (was: Open) The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6904.01.patch, TestHASafeMode-output.txt {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. Exception details: {code} Time elapsed: 12.874 sec FAILURE! java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.' at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
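The general shape of a fix for this kind of double counting is to make the safe-block increment idempotent per block. A simplified sketch, not the actual {{BlockManager}} code:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified illustration: count each block towards the safe mode threshold
// at most once, even if replicas of it are reported more than once.
public class SafeBlockCounter {
    private final Set<Long> counted = new HashSet<>();
    private int safeBlocks = 0;

    // Increment only the first time a block id is seen; a duplicate report
    // of the same block is a no-op.
    void markSafe(long blockId) {
        if (counted.add(blockId)) {
            safeBlocks++;
        }
    }

    int safeBlocks() {
        return safeBlocks;
    }
}
```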
[jira] [Commented] (HDFS-5244) TestNNStorageRetentionManager#testPurgeMultipleDirs fails
[ https://issues.apache.org/jira/browse/HDFS-5244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935108#comment-13935108 ] Hudson commented on HDFS-5244: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) HDFS-5244. TestNNStorageRetentionManager#testPurgeMultipleDirs fails. Contributed by Jinghui Wang. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577254) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java TestNNStorageRetentionManager#testPurgeMultipleDirs fails - Key: HDFS-5244 URL: https://issues.apache.org/jira/browse/HDFS-5244 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.1.0-beta Environment: Red Hat Enterprise 6 with Sun Java 1.7 and IBM java 1.6 Reporter: Jinghui Wang Assignee: Jinghui Wang Fix For: 3.0.0, 2.1.0-beta, 2.4.0 Attachments: HDFS-5244.patch The test o.a.h.hdfs.server.namenode.TestNNStorageRetentionManager uses a HashMap (dirRoots) to store the root storages to be mocked for the purging test, which does not have any predictable iteration order. The directories that need to be purged are stored in a LinkedHashSet, which has a predictable order. So, when the directories get mocked for the test, they could already be out of the order in which they were added. Thus, the order in which the directories were actually purged and the order in which they were added to the LinkedHashSet could differ and cause the test to fail. -- This message was sent by Atlassian JIRA (v6.2#6252)
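The root cause is easy to demonstrate: HashMap makes no iteration-order guarantee, while LinkedHashSet iterates in insertion order. A minimal illustration of the order-preserving structure the test should rely on:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// LinkedHashSet iterates in insertion order, so test setup whose correctness
// depends on ordering should use it (or sort explicitly) rather than relying
// on a HashMap's unordered key view.
public class OrderDemo {
    static List<String> insertionOrder(String... items) {
        return new ArrayList<>(new LinkedHashSet<>(List.of(items)));
    }

    public static void main(String[] args) {
        // Insertion order survives the round-trip through LinkedHashSet.
        System.out.println(insertionOrder("c", "a", "b")); // [c, a, b]
    }
}
```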
[jira] [Commented] (HDFS-6102) Lower the default maximum items per directory to fix PB fsimage loading
[ https://issues.apache.org/jira/browse/HDFS-6102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935113#comment-13935113 ] Hudson commented on HDFS-6102: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) HDFS-6102. Lower the default maximum items per directory to fix PB fsimage loading. Contributed by Andrew Wang. (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577426) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/fsimage.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/hdfs.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFsLimits.java Lower the default maximum items per directory to fix PB fsimage loading --- Key: HDFS-6102 URL: https://issues.apache.org/jira/browse/HDFS-6102 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Blocker Fix For: 2.4.0 Attachments: hdfs-6102-1.patch, hdfs-6102-2.patch Found by [~schu] during testing. We were creating a bunch of directories in a single directory to blow up the fsimage size, and it ends up we hit this error when trying to load a very large fsimage: {noformat} 2014-03-13 13:57:03,901 INFO org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 24523605 INodes. 
2014-03-13 13:57:59,038 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: Failed to load image from FSImageFile(file=/dfs/nn/current/fsimage_00024532742, cpktTxId=00024532742) com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755) at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769) at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462) at com.google.protobuf.CodedInputStream.readUInt64(CodedInputStream.java:188) at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.init(FsImageProto.java:9839) at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry.init(FsImageProto.java:9770) at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9901) at org.apache.hadoop.hdfs.server.namenode.FsImageProto$INodeDirectorySection$DirEntry$1.parsePartialFrom(FsImageProto.java:9896) at 52) ... {noformat} Some further research reveals there's a 64MB max size per PB message, which seems to be what we're hitting here. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6080) Improve NFS gateway performance by making rtmax and wtmax configurable
[ https://issues.apache.org/jira/browse/HDFS-6080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935105#comment-13935105 ] Hudson commented on HDFS-6080: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) HDFS-6080. Improve NFS gateway performance by making rtmax and wtmax configurable. Contributed by Abin Shahab (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577319) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Constant.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/RpcProgramNfs3.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsNfsGateway.apt.vm Improve NFS gateway performance by making rtmax and wtmax configurable -- Key: HDFS-6080 URL: https://issues.apache.org/jira/browse/HDFS-6080 Project: Hadoop HDFS Issue Type: Improvement Components: nfs, performance Reporter: Abin Shahab Assignee: Abin Shahab Fix For: 2.4.0 Attachments: HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch, HDFS-6080.patch Right now rtmax and wtmax are hardcoded in RpcProgramNFS3. These dictate the maximum read and write capacity of the server. Therefore, these affect the read and write performance. We ran performance tests with 1mb, 100mb, and 1GB files. We noticed significant performance decline with the size increase when compared to fuse. We realized that the issue was with the hardcoded rtmax size (64k). When we increased the rtmax to 1MB, we got a 10x improvement in performance.
NFS reads:
| File          | Size       | Run 1         | Run 2         | Run 3         | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 23.131158137  | 19.24552955   | 19.793332866  | 20.72334018435  | 1.7172094782219731    |
| testFile1Gb   | 1073741824 | 219.108776636 | 201.064032255 | 217.433909843 | 212.5355729113  | 8.14037175506561      |
| testFile1Mb   | 1048576    | 0.330546906   | 0.256391808   | 0.28730168    | 0.291413464667  | 0.030412987573361663  |
Fuse reads:
| File          | Size       | Run 1        | Run 2        | Run 3        | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 2.394459443  | 2.695265191  | 2.50046517   | 2.530063267997  | 0.12457410127142007   |
| testFile1Gb   | 1073741824 | 25.03324924  | 24.155102554 | 24.901525525 | 24.69662577297  | 0.386672412437576     |
| testFile1Mb   | 1048576    | 0.271615094  | 0.270835986  | 0.271796438  | 0.271415839333  | 0.0004166483951065848 |
NFS reads after rtmax = 1MB:
| File          | Size       | Run 1        | Run 2        | Run 3        | Average         | Std. Dev.             |
| testFile100Mb | 104857600  | 3.655261869  | 3.438676067  | 3.557464787  | 3.550467574336  | 0.0885591069882058    |
| testFile1Gb   | 1073741824 | 34.663612417 | 37.32089122  | 37.997718857 | 36.66074083135  | 1.4389615098060426    |
| testFile1Mb   | 1048576    | 0.115602858  | 0.106826253  | 0.125229976  | 0.1158863623334 | 0.007515962395481867  |
-- This message was sent by Atlassian JIRA (v6.2#6252)
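The change boils down to replacing a hardcoded constant with a configuration lookup that falls back to the old default. A simplified sketch with a hypothetical key name (the actual keys and defaults live in Nfs3Constant and hdfs-default.xml; this is an illustration, not the patch's code):

```java
import java.util.Map;

// Illustrative sketch: read the maximum NFS read transfer size from
// configuration instead of hardcoding 64 KB. The key name "nfs.rtmax" is
// hypothetical.
public class NfsTransferSize {
    static final String RTMAX_KEY = "nfs.rtmax";
    static final int RTMAX_DEFAULT = 64 * 1024; // the old hardcoded value

    static int rtmax(Map<String, String> conf) {
        String v = conf.get(RTMAX_KEY);
        return v == null ? RTMAX_DEFAULT : Integer.parseInt(v);
    }
}
```

With this shape, an admin can raise the value to 1 MB, which is where the reported 10x read improvement came from.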
[jira] [Commented] (HDFS-6097) zero-copy reads are incorrectly disabled on file offsets above 2GB
[ https://issues.apache.org/jira/browse/HDFS-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935111#comment-13935111 ] Hudson commented on HDFS-6097: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) HDFS-6097. Zero-copy reads are incorrectly disabled on file offsets above 2GB (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577350) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitReplica.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java zero-copy reads are incorrectly disabled on file offsets above 2GB -- Key: HDFS-6097 URL: https://issues.apache.org/jira/browse/HDFS-6097 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-6097.003.patch, HDFS-6097.004.patch, HDFS-6097.005.patch Zero-copy reads are incorrectly disabled on file offsets above 2GB due to some code that is supposed to disable zero-copy reads on offsets in block files greater than 2GB (because MappedByteBuffer segments are limited to that size). -- This message was sent by Atlassian JIRA (v6.2#6252)
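The 2GB boundary comes from MappedByteBuffer's int-sized limit. A common pitfall around that boundary is narrowing a long file offset to int, which goes negative above 2GB. A sketch of clamping a zero-copy map length with long arithmetic instead (an illustration of the bug class, not the actual DFSInputStream fix):

```java
// Illustrative sketch: compute the length of a zero-copy mmap window starting
// at `offset` within a block of `blockLen` bytes. Clamp with long math;
// never decide by casting the offset itself down to int.
public class OffsetClamp {
    static final long MAX_MAP = Integer.MAX_VALUE; // MappedByteBuffer limit

    static long mapLength(long offset, long blockLen, long requested) {
        long remaining = blockLen - offset; // stays correct above 2 GB
        return Math.min(Math.min(requested, remaining), MAX_MAP);
    }

    public static void main(String[] args) {
        // A 1 MB read at a 3 GB offset in a 4 GB block is perfectly mappable.
        System.out.println(mapLength(3L << 30, 4L << 30, 1 << 20)); // 1048576
        // The narrowing cast that causes this bug class: 3 GB as int is negative.
        System.out.println((int) (3L << 30) < 0); // true
    }
}
```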
[jira] [Commented] (HDFS-6084) Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
[ https://issues.apache.org/jira/browse/HDFS-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935110#comment-13935110 ] Hudson commented on HDFS-6084: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1726 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1726/]) HDFS-6084. Namenode UI - Hadoop logo link shouldn't go to hadoop homepage. Contributed by Travis Thompson. (wheat9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1577401) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/dfshealth.html * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/hdfs/explorer.html Namenode UI - Hadoop logo link shouldn't go to hadoop homepage Key: HDFS-6084 URL: https://issues.apache.org/jira/browse/HDFS-6084 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Fix For: 2.4.0 Attachments: HDFS-6084.1.patch.txt, HDFS-6084.2.patch.txt When clicking the Hadoop title the user is taken to the Hadoop homepage, which feels unintuitive. There's already a link at the bottom where it's always been, which is reasonable. I think that the title should go to the main Namenode page, #tab-overview. Suggestions? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6105) NN web UI for DN list loads the same jmx page three times.
Kihwal Lee created HDFS-6105: Summary: NN web UI for DN list loads the same jmx page three times. Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading Datanodes page of the NN web UI, the same jmx query is made three times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6099) HDFS file system limits not enforced on renames.
[ https://issues.apache.org/jira/browse/HDFS-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935144#comment-13935144 ] Chris Nauroth commented on HDFS-6099: - The failure in {{TestBalancer}} is unrelated. HDFS file system limits not enforced on renames. Key: HDFS-6099 URL: https://issues.apache.org/jira/browse/HDFS-6099 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Fix For: 2.4.0 Attachments: HDFS-6099.1.patch, HDFS-6099.2.patch {{dfs.namenode.fs-limits.max-component-length}} and {{dfs.namenode.fs-limits.max-directory-items}} are not enforced on the destination path during rename operations. This means that it's still possible to create files that violate these limits. -- This message was sent by Atlassian JIRA (v6.2#6252)
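Enforcing the limits on rename means applying the same component-length and directory-item checks to the destination path that create already applies. A simplified, hypothetical sketch (names and the 0-means-unlimited convention are assumptions for illustration, not the patch's code):

```java
// Hypothetical sketch: verify a rename destination against fs-limits
// analogous to dfs.namenode.fs-limits.max-component-length and
// dfs.namenode.fs-limits.max-directory-items.
public class FsLimitsCheck {
    static boolean destinationOk(String component, int destDirItems,
                                 int maxComponentLength, int maxDirItems) {
        // Assumed convention: a limit of 0 means the check is disabled.
        boolean lenOk = maxComponentLength == 0
                || component.length() <= maxComponentLength;
        // Adding the renamed inode must not push the destination directory
        // past its item limit.
        boolean itemsOk = destDirItems + 1 <= maxDirItems;
        return lenOk && itemsOk;
    }
}
```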
[jira] [Updated] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6105: - Summary: NN web UI for DN list loads the same jmx page multiple times. (was: NN web UI for DN list loads the same jmx page three times.) NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading Datanodes page of the NN web UI, the same jmx query is made three times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935154#comment-13935154 ] Kihwal Lee commented on HDFS-6105: -- It was pointed out in HDFS-5748 before. If you try reloading the page, it won't load multiple times. But if you click on the Datanodes tab, you will see multiple redundant GETs being issued. If you alternate between Overview and Datanodes, it gets worse. After going back and forth several times, I see it being loaded 9 times when clicking Datanodes. NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading the Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935155#comment-13935155 ] Kihwal Lee commented on HDFS-6105: -- [~wheat9]: Would you take a look at what's going on? NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading the Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-6105: - Description: When loading Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. (was: When loading Datanodes page of the NN web UI, the same jmx query is made three times. For a big cluster, that's a lot of data and overhead.) NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6084) Namenode UI - Hadoop logo link shouldn't go to hadoop homepage
[ https://issues.apache.org/jira/browse/HDFS-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935169#comment-13935169 ] Travis Thompson commented on HDFS-6084: --- Thanks everyone Namenode UI - Hadoop logo link shouldn't go to hadoop homepage Key: HDFS-6084 URL: https://issues.apache.org/jira/browse/HDFS-6084 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Fix For: 2.4.0 Attachments: HDFS-6084.1.patch.txt, HDFS-6084.2.patch.txt When clicking the Hadoop title the user is taken to the Hadoop homepage, which feels unintuitive. There's already a link at the bottom where it's always been, which is reasonable. I think that the title should go to the main Namenode page, #tab-overview. Suggestions? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6009) Tools based on favored node feature for isolation
[ https://issues.apache.org/jira/browse/HDFS-6009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935194#comment-13935194 ] Thanh Do commented on HDFS-6009: Yu Li, thanks for your detailed comment! Your use case is a great example of isolation. We are currently working on some similar problems but at a lower level of the software stack, so your use case is a great motivation. Tools based on favored node feature for isolation - Key: HDFS-6009 URL: https://issues.apache.org/jira/browse/HDFS-6009 Project: Hadoop HDFS Issue Type: Task Affects Versions: 2.3.0 Reporter: Yu Li Assignee: Yu Li Priority: Minor There are scenarios, like those mentioned in HBASE-6721 and HBASE-4210, where in multi-tenant deployments of HBase we prefer to specify several groups of regionservers to serve different applications, to achieve some kind of isolation or resource allocation. However, although the regionservers are grouped, the datanodes which store the data are not, which leads to the case that one datanode failure affects multiple applications, as we have already observed in our production environment. To relieve the above issue, we could make use of the favored node feature (HDFS-2576) to make regionservers able to locate data within their group, or say make the datanodes also grouped (passively), to form some level of isolation. In this case, or any other case that needs datanodes to be grouped, we would need a bunch of tools to maintain the groups, including: 1. Making the balancer able to balance data among specified servers, rather than the whole set 2. Setting the balance bandwidth for specified servers, rather than the whole set 3. A tool to check whether a block is placed across groups, and move it back if so This JIRA is an umbrella for the above tools. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6104) TestFsLimits#testDefaultMaxComponentLength Fails on branch-2
[ https://issues.apache.org/jira/browse/HDFS-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved HDFS-6104. - Resolution: Invalid Assignee: (was: Mit Desai) Closing this issue because the test was removed by HDFS-6102. TestFsLimits#testDefaultMaxComponentLength Fails on branch-2 Key: HDFS-6104 URL: https://issues.apache.org/jira/browse/HDFS-6104 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Mit Desai Labels: java7 testDefaultMaxComponentLength fails intermittently with the following error {noformat} java.lang.AssertionError: expected:<0> but was:<255> at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.hadoop.hdfs.server.namenode.TestFsLimits.testDefaultMaxComponentLength(TestFsLimits.java:90) {noformat} On doing some research, I found that this is actually a JDK7 issue. The test always fails when it runs after any test that runs the addChildWithName() method -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935293#comment-13935293 ] Haohui Mai commented on HDFS-6105: -- Every time you click on the Datanodes tab, it'll reload {{/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo}} to get up-to-date information about the datanodes. This is expected. NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading the Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935296#comment-13935296 ] Jing Zhao commented on HDFS-6094: - The patch looks good to me. One question is, currently the NN adds info about a new datanode storage only when processing a complete block report. Can we also do this for IBRs? The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6904.01.patch, TestHASafeMode-output.txt {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. Exception details: {code} Time elapsed: 12.874 sec FAILURE! java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.' at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935303#comment-13935303 ] Hadoop QA commented on HDFS-6094: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634642/HDFS-6904.01.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6405//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6405//console This message is automatically generated. 
The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6904.01.patch, TestHASafeMode-output.txt {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. Exception details: {code} Time elapsed: 12.874 sec FAILURE! java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.' at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6007) Update documentation about short-circuit local reads
[ https://issues.apache.org/jira/browse/HDFS-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935312#comment-13935312 ] Colin Patrick McCabe commented on HDFS-6007: Looks good, I think we're getting close. {code} + Legacy short-circuit local reads implementation + on which the clients directly open the HDFS block files is still available + for the platforms other than Linux. {code} Missing the {code} + Because Legacy short-circuit local reads is insecure, + access to this feature is limited to the users listed in + the value of dfs.block.local-path-access.user. {code} I think this section needs to be moved after the section about dfs.datanode.data.dir.perm. Otherwise it's not clear why the legacy SCR is insecure. Update documentation about short-circuit local reads Key: HDFS-6007 URL: https://issues.apache.org/jira/browse/HDFS-6007 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Attachments: HDFS-6007-0.patch, HDFS-6007-1.patch, HDFS-6007-2.patch, HDFS-6007-3.patch, HDFS-6007-4.patch updating the contents of HDFS Short-Circuit Local Reads based on the changes in HDFS-4538 and HDFS-4953. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5840) Follow-up to HDFS-5138 to improve error handling during partial upgrade failures
[ https://issues.apache.org/jira/browse/HDFS-5840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935327#comment-13935327 ] Aaron T. Myers commented on HDFS-5840: -- Sorry, got swamped this week. Will try to get to it early next. Follow-up to HDFS-5138 to improve error handling during partial upgrade failures Key: HDFS-5840 URL: https://issues.apache.org/jira/browse/HDFS-5840 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Aaron T. Myers Assignee: Aaron T. Myers Fix For: 3.0.0 Attachments: HDFS-5840.patch Suresh posted some good comments in HDFS-5138 after that patch had already been committed to trunk. This JIRA is to address those. See the first comment of this JIRA for the full content of the review. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5997) TestHASafeMode#testBlocksAddedWhileStandbyIsDown fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-5997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935342#comment-13935342 ] Arpit Agarwal commented on HDFS-5997: - Thanks for reporting this [~yuzhih...@gmail.com]. I missed it when filing HDFS-6094. Jing or I will post an updated patch for it soon; if either of you has a consistent repro, it would be great if you could also help verify. TestHASafeMode#testBlocksAddedWhileStandbyIsDown fails in trunk --- Key: HDFS-5997 URL: https://issues.apache.org/jira/browse/HDFS-5997 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1681/ : REGRESSION: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown Error Message: {code} Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.' {code} Stack Trace: {code} java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.' at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6100) DataNodeWebHdfsMethods does not failover in HA mode
[ https://issues.apache.org/jira/browse/HDFS-6100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935411#comment-13935411 ] Jing Zhao commented on HDFS-6100: - The patch looks pretty good to me. Some minor comments: # In DatanodeWebHdfsMethods, the current patch has some inconsistent field names for the NamenodeAddressParam parameter (nnId, namenodeId, and namenodeRpcAddress). How about just calling them namenode since it can be either a NameService ID or a NameNode RPC address? # Nit: the following code needs some reformat: {code} tokenServiceName = HAUtil.isHAEnabled(conf, nsId) ? nsId : NetUtils.getHostPortString (rpcServer.getRpcAddress()); {code} # In the new unit test, we can add some extra checks on the content of the newly created file. Also, maybe we can try to transition the second NN to active first so that the first create call can also hit a failover. # Looks like the patch also fixes the token service name in HA setup for webhdfs. Please update the description of the JIRA. # Could you also post your system test results (HA, non-HA, secured and insecure setups, etc.)? DataNodeWebHdfsMethods does not failover in HA mode --- Key: HDFS-6100 URL: https://issues.apache.org/jira/browse/HDFS-6100 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.4.0 Reporter: Arpit Gupta Assignee: Haohui Mai Attachments: HDFS-6100.000.patch In {{DataNodeWebHdfsMethods}}, the code creates a {{DFSClient}} to connect to the NN, so that it can access the files in the cluster. {{DataNodeWebHdfsMethods}} relies on the address passed in the URL to locate the NN. Currently the parameter is set by the NN and it is a host-ip pair, which does not support HA. -- This message was sent by Atlassian JIRA (v6.2#6252)
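The fragment Jing flags in point 2 could be reformatted along these lines. This is only a sketch of the suggested layout: the `isHAEnabled` and `getHostPortString` stubs below are hypothetical stand-ins, not the real `HAUtil`/`NetUtils` Hadoop classes.

```java
// Sketch of the reformatted ternary from the review comment. The helper
// methods are simplified stand-ins for HAUtil.isHAEnabled and
// NetUtils.getHostPortString, just to make the layout concrete.
public class TokenServiceNameSketch {
    public static boolean isHAEnabled(String nsId) {
        return nsId != null;
    }

    public static String getHostPortString(String host, int port) {
        return host + ":" + port;
    }

    public static String tokenServiceName(String nsId, String rpcHost, int rpcPort) {
        // Prefer the nameservice ID when HA is enabled; otherwise fall back
        // to the NameNode's RPC host:port string.
        return isHAEnabled(nsId)
            ? nsId
            : getHostPortString(rpcHost, rpcPort);
    }

    public static void main(String[] args) {
        System.out.println(tokenServiceName("ns1", "nn1", 8020)); // ns1
        System.out.println(tokenServiceName(null, "nn1", 8020));  // nn1:8020
    }
}
```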
[jira] [Updated] (HDFS-6007) Update documentation about short-circuit local reads
[ https://issues.apache.org/jira/browse/HDFS-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-6007: --- Attachment: HDFS-6007-5.patch Attaching the updated patch. Update documentation about short-circuit local reads Key: HDFS-6007 URL: https://issues.apache.org/jira/browse/HDFS-6007 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Attachments: HDFS-6007-0.patch, HDFS-6007-1.patch, HDFS-6007-2.patch, HDFS-6007-3.patch, HDFS-6007-4.patch, HDFS-6007-5.patch updating the contents of HDFS Short-Circuit Local Reads based on the changes in HDFS-4538 and HDFS-4953. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935440#comment-13935440 ] Kihwal Lee commented on HDFS-6105: -- This is expected. Please read what I said earlier. One click causes multiple loads of the same page. NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6090) Use MiniDFSCluster.Builder instead of deprecated constructors
[ https://issues.apache.org/jira/browse/HDFS-6090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA reassigned HDFS-6090: --- Assignee: Akira AJISAKA Use MiniDFSCluster.Builder instead of deprecated constructors - Key: HDFS-6090 URL: https://issues.apache.org/jira/browse/HDFS-6090 Project: Hadoop HDFS Issue Type: Improvement Components: test Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Some test classes are using deprecated constructors such as {{MiniDFSCluster(Configuration, int, boolean, String[], String[])}} for building a MiniDFSCluster. These classes should use {{MiniDFSCluster.Builder}} to reduce javac warnings and improve code readability. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935500#comment-13935500 ] Arpit Agarwal commented on HDFS-6094: - Jing, I think it is a good idea to learn about storages from the IBR. One issue with doing so is that the storage type and state are not known while processing the IBR. We can assume some defaults but this can lead to bugs since the type and state can be used to make replication decisions. I think we need to enhance the incremental report protocol to send the storage type and state along with the storage ID. Then we can safely create a new storage entry. For protocol compatibility we can assume defaults if the type and state are not provided. I am going to code up the patch. Thanks for the ideas! The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6904.01.patch, TestHASafeMode-output.txt {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. Exception details: {code} Time elapsed: 12.874 sec FAILURE! java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.'
at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
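The compatibility defaulting Arpit describes — assume a default storage type and state when an older DataNode omits them from the incremental report — can be sketched as follows. The class and enum names here are hypothetical illustrations, not the actual {{DatanodeProtocol}} types.

```java
// Sketch of protocol-compatibility defaulting for an incremental block
// report: an older DN may not send the storage type/state, so the NN
// substitutes defaults before creating a storage entry. All names are
// hypothetical, not the real Hadoop protocol classes.
public class IbrCompatSketch {
    public enum StorageType { DISK, SSD }
    public enum StorageState { NORMAL, READ_ONLY }

    public static class StorageReport {
        public final String storageId;
        public final StorageType type;   // null if the DN did not send it
        public final StorageState state; // null if the DN did not send it

        public StorageReport(String storageId, StorageType type, StorageState state) {
            this.storageId = storageId;
            this.type = type;
            this.state = state;
        }
    }

    // Apply defaults for fields missing from an old-version report.
    public static StorageReport withDefaults(StorageReport r) {
        StorageType t = (r.type != null) ? r.type : StorageType.DISK;
        StorageState s = (r.state != null) ? r.state : StorageState.NORMAL;
        return new StorageReport(r.storageId, t, s);
    }

    public static void main(String[] args) {
        StorageReport legacy = withDefaults(new StorageReport("DS-1", null, null));
        System.out.println(legacy.type + " " + legacy.state); // DISK NORMAL
    }
}
```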
[jira] [Commented] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935563#comment-13935563 ] Haohui Mai commented on HDFS-6105: -- I can't reproduce the bug. What browser are you using? NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee When loading Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6105) NN web UI for DN list loads the same jmx page multiple times.
[ https://issues.apache.org/jira/browse/HDFS-6105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Thompson updated HDFS-6105: -- Attachment: datanodes-tab.png I can reproduce it in 2.3.0. If I open the NN page and, with the Firefox console open, click on the Datanodes tab, 3 GETs are sent to http://nn/jmx very quickly. I've attached an image of the Firefox console. Using Firefox 27.0.1 on Mac OS X 10.8.5. NN web UI for DN list loads the same jmx page multiple times. - Key: HDFS-6105 URL: https://issues.apache.org/jira/browse/HDFS-6105 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Kihwal Lee Attachments: datanodes-tab.png When loading Datanodes page of the NN web UI, the same jmx query is made multiple times. For a big cluster, that's a lot of data and overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6088) Add configurable maximum block count for datanode
[ https://issues.apache.org/jira/browse/HDFS-6088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935638#comment-13935638 ] Kihwal Lee commented on HDFS-6088: -- bq. Would be nice to avoid having yet another config that users have to set. I agree. I was looking at the heap usage of a DN. It looks like the heap usage has dropped considerably since we moved to use GSet for block map. So much so that the automatically defined GSet capacity doesn't seem to be sufficient. For example, I brought up a DN with about 62K blocks with the max heap set to 1GB. The GSet was created for 524,288 entries. Looking at the heap usage, each block takes up about 315 bytes. Other parts take up less than 50MB. In any case, 315 * 524288 = 157MB. Even if other parts take up more than expected, the node can easily store 4X of this. But storing 2M entries in the small GSet is not ideal. Add configurable maximum block count for datanode - Key: HDFS-6088 URL: https://issues.apache.org/jira/browse/HDFS-6088 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Currently datanode resources are protected by the free space check and the balancer. But datanodes can run out of memory simply storing too many blocks. If the sizes of blocks are small, datanodes will appear to have plenty of space to put more blocks. I propose adding a configurable max block count to datanode. Since datanodes can have different heap configurations, it will make sense to make it datanode-level, rather than something enforced by namenode. -- This message was sent by Atlassian JIRA (v6.2#6252)
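Kihwal's back-of-the-envelope heap arithmetic can be reproduced directly. The 315 bytes/block and the 524,288-entry GSet capacity are the measurements quoted in his comment, not constants from the code:

```java
// Reproduce the heap math from the comment above: ~315 bytes observed per
// block, and a GSet auto-sized to 524,288 entries on a 1 GB max heap.
public class GSetHeapMath {
    public static long blockMapBytes(long bytesPerBlock, long entries) {
        return bytesPerBlock * entries;
    }

    public static void main(String[] args) {
        long bytes = blockMapBytes(315, 524_288);
        // 315 * 524288 = 165,150,720 bytes, i.e. ~157 MB
        System.out.println(bytes / (1024 * 1024) + " MB");
    }
}
```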
[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate
[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935686#comment-13935686 ] Konstantin Shvachko commented on HDFS-6087: --- Based on what you write, I see two main problems with your approach. # A block cannot be read by others while under construction, until it is fully written and committed. That would be a step back. Making UC-blocks readable was one of the append design requirements (see HDFS-265 and preceding work). If a slow client writes to a block 1KB/min, others will have to wait for hours until they can see the progress on the file. # Your proposal (if I understand it correctly) will potentially lead to a lot of small blocks if appends, fsyncs (and truncates) are used intensively. Say, in order to overcome problem (1) I write my application so that it closes the file after each 1KB written and reopens for append one minute later. You get lots of 1KB blocks. And small blocks are bad for the NameNode as we know. Unify HDFS write/append/truncate Key: HDFS-6087 URL: https://issues.apache.org/jira/browse/HDFS-6087 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Guo Ruijing Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf In existing implementation, HDFS file can be appended and HDFS block can be reopened for append. This design will introduce complexity including lease recovery. If we design HDFS block as immutable, it will be very simple for append/truncate. The idea is that HDFS block is immutable if the block is committed to namenode. If the block is not committed to namenode, it is the HDFS client’s responsibility to re-add it with a new block ID. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6007) Update documentation about short-circuit local reads
[ https://issues.apache.org/jira/browse/HDFS-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935668#comment-13935668 ] Hadoop QA commented on HDFS-6007: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634789/HDFS-6007-5.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation patch that doesn't require tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6406//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6406//console This message is automatically generated. Update documentation about short-circuit local reads Key: HDFS-6007 URL: https://issues.apache.org/jira/browse/HDFS-6007 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Reporter: Masatake Iwasaki Priority: Minor Attachments: HDFS-6007-0.patch, HDFS-6007-1.patch, HDFS-6007-2.patch, HDFS-6007-3.patch, HDFS-6007-4.patch, HDFS-6007-5.patch updating the contents of HDFS Short-Circuit Local Reads based on the changes in HDFS-4538 and HDFS-4953. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-6094: Attachment: HDFS-6094.03.patch Updated patch with Jing's suggestion. To do this right required some additions to the {{DatanodeProtocol}} and some corresponding changes within the DataNode. Protocol changes are wire compatible. Jenkins will flag some new warnings for using deprecated APIs, which is expected. The usages in the protobuf translators are required for wire compatibility, and the remaining usages are in a couple of tests and in {{NNThroughputBenchmark}}, which we can update later. The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6094.03.patch, HDFS-6904.01.patch, TestHASafeMode-output.txt {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. Exception details: {code} Time elapsed: 12.874 sec FAILURE! java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.' at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.assertTrue(Assert.java:43) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493) at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate
[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935707#comment-13935707 ] Tsz Wo Nicholas Sze commented on HDFS-6087: --- 1. A block cannot be read by others while under construction, until it is fully written and committed. ... It also does not support hflush. 2. Your proposal (if I understand it correctly) will potentially lead to a lot of small blocks if appends, fsyncs (and truncates) are used intensively. ... I guess it won't lead to a lot of small blocks since it does copy-on-write. However, there is going to be a lot of block copying if there are a lot of appends, hsyncs, etc. In addition, I think it would be a problem for reading the last block: If a reader opens a file and reads the last block slowly, then a writer reopens the file for append and commits a new last block. The old last block may then be deleted and become unavailable to the reader. Unify HDFS write/append/truncate Key: HDFS-6087 URL: https://issues.apache.org/jira/browse/HDFS-6087 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Guo Ruijing Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf In existing implementation, HDFS file can be appended and HDFS block can be reopened for append. This design will introduce complexity including lease recovery. If we design HDFS block as immutable, it will be very simple for append/truncate. The idea is that HDFS block is immutable if the block is committed to namenode. If the block is not committed to namenode, it is the HDFS client’s responsibility to re-add it with a new block ID. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935723#comment-13935723 ] Colin Patrick McCabe commented on HDFS-6093: This looks good overall. I think rather than protect {{CacheReplicationMonitor#numPendingCaching}} with the FSN lock, it would be better to make it an Atomic64 that we swap in at the end of the rescan. That way we're not baking in the assumption that the rescan thread holds the FSN lock for the whole duration of the rescan. It would also minimize the time we spend blocking waiting for the FSN lock in the MBean stuff. Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work. It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
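The pattern Colin suggests — accumulate the count locally during the rescan, then publish it with a single atomic swap so readers never need the FSN lock — might look like this sketch. The class and method names are hypothetical, not the actual CacheReplicationMonitor code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the rescan thread counts pending caching work into a local
// variable and publishes the total with one atomic set at the end, so the
// MBean read path never touches the FSN lock. Hypothetical names.
public class PendingCachingSketch {
    private final AtomicLong numPendingCaching = new AtomicLong(0);

    public void rescan(long[] pendingPerDirective) {
        long pending = 0;
        for (long p : pendingPerDirective) {
            pending += p;               // local accumulation, no shared state
        }
        numPendingCaching.set(pending); // single atomic publish at the end
    }

    // Lock-free read path for the MBean.
    public long getNumPendingCaching() {
        return numPendingCaching.get();
    }

    public static void main(String[] args) {
        PendingCachingSketch s = new PendingCachingSketch();
        s.rescan(new long[] {1, 2, 3});
        System.out.println(s.getNumPendingCaching()); // 6
    }
}
```

Readers see either the previous rescan's total or the new one, never a half-updated count, which is the compromise Colin describes: slightly stale values in exchange for no blocking on the FSN lock.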
[jira] [Commented] (HDFS-6087) Unify HDFS write/append/truncate
[ https://issues.apache.org/jira/browse/HDFS-6087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935727#comment-13935727 ] Konstantin Shvachko commented on HDFS-6087: --- If it does copy-on-write, then the block is not immutable, at least in the sense I understand the term. Unify HDFS write/append/truncate Key: HDFS-6087 URL: https://issues.apache.org/jira/browse/HDFS-6087 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Guo Ruijing Attachments: HDFS Design Proposal.pdf, HDFS Design Proposal_3_14.pdf In existing implementation, HDFS file can be appended and HDFS block can be reopened for append. This design will introduce complexity including lease recovery. If we design HDFS block as immutable, it will be very simple for append/truncate. The idea is that HDFS block is immutable if the block is committed to namenode. If the block is not committed to namenode, it is the HDFS client’s responsibility to re-add it with a new block ID. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6103) FSImage file system image version check throw a (slightly) wrong parameter.
[ https://issues.apache.org/jira/browse/HDFS-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935747#comment-13935747 ] jun aoki commented on HDFS-6103: Hi [~vinayrpet], I got the error message when I executed {code} sudo service hadoop-hdfs-namenode start {code} Then I found that I'd have to execute {code} sudo service hadoop-hdfs-namenode upgrade #(1) {code} Note that this does not have a hyphen, e.g. -upgrade. I have also found that users can execute hadoop-daemon.sh. I've never tried it this way, but something like {code} hadoop-daemon.sh --config /etc/hadoop start namenode -upgrade # (2) {code} Then this will require a hyphen. I thought (1) was the preferred way, hence this ticket, but if I'm wrong and (2) is equally or more preferred, please let me know. FSImage file system image version check throw a (slightly) wrong parameter. --- Key: HDFS-6103 URL: https://issues.apache.org/jira/browse/HDFS-6103 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: jun aoki Priority: Trivial Trivial error message issue: When upgrading HDFS, say from 2.0.5 to 2.2.0, users will need to start the namenode with the upgrade option, e.g. {code} sudo service namenode upgrade {code} That said, the actual error shown without the option says -upgrade (with a hyphen): {code} 2014-03-13 23:38:15,488 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.io.IOException: File system image contains an old layout version -40. An upgrade to version -47 is required. Please restart NameNode with -upgrade option.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:221) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:684) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:669) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320) 2014-03-13 23:38:15,492 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 2014-03-13 23:38:15,493 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.2.202 / ~ {code} I'm referring to 2.0.5 above, https://github.com/apache/hadoop-common/blob/branch-2.0.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L225 I haven't tried the trunk but it seems to return UPGRADE (all upper case), which is again a slightly wrong error description. https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L232 -- This message was sent by Atlassian JIRA (v6.2#6252)
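One way to keep the error message in sync with the actual flag spelling is to derive the hint from the option definition itself. The following is only a sketch with a hypothetical StartupOption enum, not the actual org.apache.hadoop.hdfs FSImage/StartupOption code:

```java
// Sketch: build the "please restart with ..." hint from the option
// definition so the message cannot drift from the real flag spelling
// (the bug here: the log printed "-upgrade"/"UPGRADE" while the service
// script expects plain "upgrade"). Hypothetical enum, not Hadoop's.
public class UpgradeMessageSketch {
    public enum StartupOption {
        UPGRADE("-upgrade");

        private final String flag;
        StartupOption(String flag) { this.flag = flag; }
        public String getFlag() { return flag; }
    }

    public static String upgradeRequiredMessage(int oldLv, int newLv) {
        return "File system image contains an old layout version " + oldLv
            + ". An upgrade to version " + newLv
            + " is required. Please restart NameNode with the "
            + StartupOption.UPGRADE.getFlag() + " option.";
    }

    public static void main(String[] args) {
        System.out.println(upgradeRequiredMessage(-40, -47));
    }
}
```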
[jira] [Comment Edited] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935758#comment-13935758 ] Arpit Agarwal edited comment on HDFS-6093 at 3/14/14 10:45 PM: --- Hi Andrew, I just tried out your patch and I think there is some mismatch between the output of {{dfsAdmin -report}} and {{cacheadmin -listPools}}. This is with a single NN/single DN pseudocluster on CentOS 6.5. I ran the following commands: - bin/hdfs cacheadmin -addPool pool1 -limit 1073741824 - bin/hdfs cacheadmin -addDirective -path /f1 -pool pool1 This says FILES_CACHED is zero. {code} $ bin/hdfs cacheadmin -listPools -stats Found 1 result. NAME OWNER GROUP MODE LIMIT MAXTTL BYTES_NEEDED BYTES_CACHED BYTES_OVERLIMIT FILES_NEEDED FILES_CACHED pool1 aagarwal aagarwal rwxr-xr-x 1073741824 never 1048576 0 0 1 0 {code} However, this says cache used is 1MB. {code} $ bin/hdfs dfsadmin -report Configured Capacity: 49202208768 (45.82 GB) Present Capacity: 39676268544 (36.95 GB) DFS Remaining: 39675179008 (36.95 GB) DFS Used: 1089536 (1.04 MB) DFS Used%: 0.00% Configured Cache Capacity: 268435456 (256 MB) Present Cache Capacity: 268435456 (256 MB) Cache Remaining: 267386880 (255 MB) Cache Used: 1048576 (1 MB) Cache Used%: 0.39% {code} I did not see any error messages related to caching in the DN/NN logs. was (Author: arpitagarwal): Hi Andrew, I just tried out your patch and I think there is some mismatch between the output of {{dfsAdmin -report}} and {{cacheadmin -listPools}}. This is with a single NN/single DN pseudocluster on CentOS 6.5. I ran the following commands: - bin/hdfs cacheadmin -addPool pool1 -limit 1073741824 - bin/hdfs cacheadmin -addDirective -path /f1 -pool pool1 This says FILES_CACHED is zero. {code} $ bin/hdfs cacheadmin -listPools -stats Found 1 result.
NAME OWNER GROUP MODE LIMIT MAXTTL BYTES_NEEDED BYTES_CACHED BYTES_OVERLIMIT FILES_NEEDED FILES_CACHED pool1 aagarwal aagarwal rwxr-xr-x 1073741824 never 1048576 0 0 1 0 {code} However, this says cache used is 1MB. {code} aagarwal@arrow ~/deploy2/hadoop-3.0.0-SNAPSHOT$ bin/hdfs dfsadmin -report Configured Capacity: 49202208768 (45.82 GB) Present Capacity: 39676268544 (36.95 GB) DFS Remaining: 39675179008 (36.95 GB) DFS Used: 1089536 (1.04 MB) DFS Used%: 0.00% Configured Cache Capacity: 268435456 (256 MB) Present Cache Capacity: 268435456 (256 MB) Cache Remaining: 267386880 (255 MB) Cache Used: 1048576 (1 MB) Cache Used%: 0.39% {code} I did not see any error messages related to caching in the DN/NN logs. Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work. It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935758#comment-13935758 ] Arpit Agarwal commented on HDFS-6093: - Hi Andrew, I just tried out your patch and I think there is some mismatch between the output of {{dfsAdmin -report}} and {{cacheadmin -listPools}}. This is with a single NN/single DN pseudocluster on CentOS 6.5. I ran the following commands: - bin/hdfs cacheadmin -addPool pool1 -limit 1073741824 - bin/hdfs cacheadmin -addDirective -path /f1 -pool pool1 This says FILES_CACHED is zero. {code} $ bin/hdfs cacheadmin -listPools -stats Found 1 result. NAME OWNER GROUP MODE LIMIT MAXTTL BYTES_NEEDED BYTES_CACHED BYTES_OVERLIMIT FILES_NEEDED FILES_CACHED pool1 aagarwal aagarwal rwxr-xr-x 1073741824 never 1048576 0 0 1 0 {code} However, this says cache used is 1MB. {code} aagarwal@arrow ~/deploy2/hadoop-3.0.0-SNAPSHOT$ bin/hdfs dfsadmin -report Configured Capacity: 49202208768 (45.82 GB) Present Capacity: 39676268544 (36.95 GB) DFS Remaining: 39675179008 (36.95 GB) DFS Used: 1089536 (1.04 MB) DFS Used%: 0.00% Configured Cache Capacity: 268435456 (256 MB) Present Cache Capacity: 268435456 (256 MB) Cache Remaining: 267386880 (255 MB) Cache Used: 1048576 (1 MB) Cache Used%: 0.39% {code} I did not see any error messages related to caching in the DN/NN logs. Expose more caching information for debugging by users -- Key: HDFS-6093 URL: https://issues.apache.org/jira/browse/HDFS-6093 Project: Hadoop HDFS Issue Type: Improvement Components: caching Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-6093-1.patch When users submit a new cache directive, it's unclear if the NN has recognized it and is actively trying to cache it, or if it's hung for some other reason. It'd be nice to expose a pending caching/uncaching count the same way we expose pending replication work.
It'd also be nice to display the aggregate cache capacity and usage in dfsadmin -report, since we already have it as a metric and expose it per-DN in report output. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935771#comment-13935771 ] Colin Patrick McCabe commented on HDFS-6093: Hi Arpit, It takes time for the values reported by dfsadmin -report and cacheadmin -listPools to converge, since dfsadmin reports information taken from the DN heartbeat, while listPools reports information taken from the CacheReplicationMonitor. Try waiting 5 or 10 minutes. We might want to shorten the default for {{dfs.namenode.path.based.cache.retry.interval.ms}} for this reason.
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935770#comment-13935770 ] Andrew Wang commented on HDFS-6093: --- Hey Arpit, So the confusing thing about these stats is that the pool and directive stats are only updated when the CacheReplicationMonitor runs (default every 5 minutes). The datanode-level stats are updated on the heartbeat, so they refresh much more frequently. I think if you wait for a CRM run, it'll then show up in listPools. I was considering lowering the default CRM interval for this reason, maybe to 1 min or 30s.
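[Editor's note] The CRM-vs-heartbeat lag discussed in the comments above can be pictured with a toy simulation. This is illustrative only, not Hadoop code: the class, method, and tick periods below are hypothetical stand-ins for two caches of the same value refreshed at different intervals.

```java
// Toy model: two cached views of one underlying value, one refreshed every
// "heartbeat" period (fast, like the DN stats) and one refreshed every
// "rescan" period (slow, like the CacheReplicationMonitor stats). The views
// only agree once the slower refresh has fired.
public class StaleViews {
    // Returns the first tick at which both views reflect a value set at tick 0.
    static int firstAgreementTick(int heartbeatPeriod, int rescanPeriod) {
        long actual = 1_048_576L;          // e.g. bytes cached, set at tick 0
        long heartbeatView = 0, rescanView = 0;
        for (int tick = 1; tick <= heartbeatPeriod * rescanPeriod; tick++) {
            if (tick % heartbeatPeriod == 0) heartbeatView = actual;
            if (tick % rescanPeriod == 0)    rescanView = actual;
            if (heartbeatView == actual && rescanView == actual) return tick;
        }
        return -1; // never converged within the simulated window
    }

    public static void main(String[] args) {
        // Heartbeat every 3 ticks, rescan every 300: no agreement before tick 300.
        System.out.println(StaleViews.firstAgreementTick(3, 300));
    }
}
```

The point of the sketch: neither view is wrong, they are just sampled snapshots with different staleness bounds, which is why waiting one rescan interval makes the mismatch disappear.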
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935777#comment-13935777 ] Colin Patrick McCabe commented on HDFS-6093: bq. I was considering lowering the default CRM interval for this reason, maybe to 1 min or 30s, for this reason. Yeah, maybe we should set it to 30 seconds for now to get a better user experience. We can always raise it if a performance issue emerges on a big cluster.
[jira] [Commented] (HDFS-6103) FSImage file system image version check throw a (slightly) wrong parameter.
[ https://issues.apache.org/jira/browse/HDFS-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935784#comment-13935784 ] Akira AJISAKA commented on HDFS-6103: - Hi [~jaoki], what distribution of Hadoop are you using? AFAIK, service scripts are not provided in Apache Hadoop itself, so (2) is preferred if you are using the community version. FSImage file system image version check throw a (slightly) wrong parameter. --- Key: HDFS-6103 URL: https://issues.apache.org/jira/browse/HDFS-6103 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.2.0 Reporter: jun aoki Priority: Trivial Trivial error message issue: When upgrading HDFS, say from 2.0.5 to 2.2.0, users need to start the namenode with the upgrade option, e.g.
{code}
sudo service namenode upgrade
{code}
However, the actual error printed when starting without the option says -upgrade (with a hyphen):
{code}
2014-03-13 23:38:15,488 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.io.IOException: File system image contains an old layout version -40. An upgrade to version -47 is required. Please restart NameNode with -upgrade option.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:221)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:787)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:568)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:443)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:491)
at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:684)
at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:669)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
2014-03-13 23:38:15,492 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-03-13 23:38:15,493 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at nn1/192.168.2.202
{code}
I'm referring to 2.0.5 above: https://github.com/apache/hadoop-common/blob/branch-2.0.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L225 I haven't tried trunk, but it seems to return UPGRADE (all upper case), which is again a slightly wrong error description: https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java#L232
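[Editor's note] One way to avoid the flag text drifting between "upgrade", "-upgrade", and "UPGRADE" is to build the message from the option constant itself. The sketch below is a simplified, self-contained illustration of that direction; the mini enum is hypothetical and stands in for Hadoop's real StartupOption class.

```java
// Simplified sketch: deriving the flag text in the error message from the
// startup-option constant itself, so the hyphen and casing can never drift
// from what the CLI actually accepts. Illustrative only, not the Hadoop code.
public class UpgradeMessage {
    enum Option {
        UPGRADE("-upgrade"), ROLLBACK("-rollback");
        private final String flag;
        Option(String flag) { this.flag = flag; }
        String getName() { return flag; }
    }

    static String upgradeRequired(int foundLayout, int requiredLayout) {
        return "File system image contains an old layout version " + foundLayout
            + ". An upgrade to version " + requiredLayout + " is required."
            + " Please restart NameNode with " + Option.UPGRADE.getName()
            + " option.";
    }

    public static void main(String[] args) {
        System.out.println(upgradeRequired(-40, -47));
    }
}
```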
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935791#comment-13935791 ] Colin Patrick McCabe commented on HDFS-6093: sorry, meant to write dfs.namenode.path.based.cache.refresh.interval.ms
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935797#comment-13935797 ] Colin Patrick McCabe commented on HDFS-6093: I filed HDFS-6106 to reduce the defaults a bit.
[jira] [Updated] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
[ https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6106: --- Attachment: HDFS-6106.001.patch Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms Key: HDFS-6106 URL: https://issues.apache.org/jira/browse/HDFS-6106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-6106.001.patch Reduce the default for {{dfs.namenode.path.based.cache.refresh.interval.ms}} to improve the responsiveness of caching.
[jira] [Assigned] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
[ https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe reassigned HDFS-6106: -- Assignee: Colin Patrick McCabe
[jira] [Updated] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
[ https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-6106: --- Status: Patch Available (was: Open)
[jira] [Created] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
Colin Patrick McCabe created HDFS-6106: -- Summary: Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms Key: HDFS-6106 URL: https://issues.apache.org/jira/browse/HDFS-6106 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Attachments: HDFS-6106.001.patch
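[Editor's note] Independent of whatever default HDFS-6106 settles on, an operator can override the refresh interval per cluster. A hypothetical hdfs-site.xml fragment using the 30-second value floated in the discussion above (30000 ms is a suggested value from the thread, not a committed default):

```xml
<!-- Hypothetical hdfs-site.xml override: refresh cache directive stats every
     30 seconds instead of the (then-current) 5-minute default. -->
<property>
  <name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
  <value>30000</value>
</property>
```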
[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
[ https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935816#comment-13935816 ] Andrew Wang commented on HDFS-6106: --- +1 pending
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935859#comment-13935859 ] Hadoop QA commented on HDFS-6094: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634845/HDFS-6094.03.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:red}-1 javac{color}. The applied patch generated 1540 javac compiler warnings (more than the trunk's current 1531 warnings).
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6407//testReport/
Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6407//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6407//console
This message is automatically generated.
The same block can be counted twice towards safe mode threshold --- Key: HDFS-6094 URL: https://issues.apache.org/jira/browse/HDFS-6094 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.0 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-6094.03.patch, HDFS-6904.01.patch, TestHASafeMode-output.txt {{BlockManager#addStoredBlock}} can cause the same block to be counted twice towards the safe mode threshold. We see this manifest via {{TestHASafeMode#testBlocksAddedWhileStandbyIsDown}} failures on Ubuntu. More details to follow in a comment. Exception details:
{code}
Time elapsed: 12.874 sec FAILURE!
java.lang.AssertionError: Bad safemode status: 'Safe mode is ON. The reported blocks 7 has reached the threshold 0.9990 of total blocks 6. The number of live datanodes 3 has reached the minimum number 0. Safe mode will be turned off automatically in 28 seconds.'
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.assertSafeMode(TestHASafeMode.java:493)
at org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode.testBlocksAddedWhileStandbyIsDown(TestHASafeMode.java:660)
{code}
[jira] [Commented] (HDFS-6093) Expose more caching information for debugging by users
[ https://issues.apache.org/jira/browse/HDFS-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935858#comment-13935858 ] Arpit Agarwal commented on HDFS-6093: - Thanks Andrew/Colin, the values did converge! wrt the patch:
# In addition to reducing the timeout as you suggested, can we add some explanation to the command output, or update CentralizedCacheManagement.html in the docs? Additionally, does it make sense to display the pending caching/uncaching counts in the output of 'dfsadmin -report'? This would make it clear right away that there are some pending cache operations.
# {{CacheReplicationMonitor#rescan}} resets the counters to zero outside the write lock. The reset should be moved inside the lock, else readers might see blips with the counters intermittently going to zero.
# Was {{stillPendingUncached}} introduced to fix a bug?
Minor code style comment: {{getPendingCachingCount}} can be condensed to
{code}
return (monitor != null ? monitor.getPendingCachingCount() : 0);
{code}
Same with {{getPendingUncachingCount}}. Change looks good otherwise.
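[Editor's note] Review point 2 above (resetting counters outside the write lock) is a classic read-write-lock pitfall. A minimal self-contained sketch of the fix; field and method names are illustrative stand-ins, not the actual CacheReplicationMonitor members:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Reset and repopulate the pending counters inside the same write-lock
// critical section, so a reader can never observe the transient zero
// between the reset and the rescan recomputing the values.
public class PendingCounters {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private long pendingCaching;
    private long pendingUncaching;

    void rescan(long recountedCaching, long recountedUncaching) {
        lock.writeLock().lock();
        try {
            pendingCaching = 0;                  // reset...
            pendingUncaching = 0;
            pendingCaching = recountedCaching;   // ...and repopulate under the
            pendingUncaching = recountedUncaching; // same lock acquisition
        } finally {
            lock.writeLock().unlock();
        }
    }

    long getPendingCachingCount() {
        lock.readLock().lock();
        try {
            return pendingCaching;
        } finally {
            lock.readLock().unlock();
        }
    }

    long getPendingUncachingCount() {
        lock.readLock().lock();
        try {
            return pendingUncaching;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

If the reset happened before the write lock was taken, a concurrent reader holding the read lock could return 0 even though cache work was still pending, which is exactly the "blip" the review describes.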
[jira] [Commented] (HDFS-6094) The same block can be counted twice towards safe mode threshold
[ https://issues.apache.org/jira/browse/HDFS-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935862#comment-13935862 ] Arpit Agarwal commented on HDFS-6094: - The warnings are expected due to new deprecations. We can fix the test warnings later.
[jira] [Commented] (HDFS-6103) FSImage file system image version check throw a (slightly) wrong parameter.
[ https://issues.apache.org/jira/browse/HDFS-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935892#comment-13935892 ] jun aoki commented on HDFS-6103: Hi [~ajisakaa], thank you for clarifying. I'm using Bigtop. Let's focus on StartupOption.UPGRADE in this ticket.
[jira] [Commented] (HDFS-6106) Reduce default for dfs.namenode.path.based.cache.refresh.interval.ms
[ https://issues.apache.org/jira/browse/HDFS-6106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935931#comment-13935931 ] Hadoop QA commented on HDFS-6106: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12634861/HDFS-6106.001.patch against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode
{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6408//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6408//console
This message is automatically generated.