[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201762#comment-14201762 ] Hadoop QA commented on HDFS-7279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12679588/HDFS-7279.007.patch against trunk revision 61effcb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.server.namenode.TestDeleteRace {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8687//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8687//console This message is automatically generated. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all related webhdfs functionality using jetty. Because the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which is more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
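For context on the approach, here is a minimal sketch of a netty-based HTTP server of the general shape the patch moves toward, assuming netty 4.x APIs; the class name, port, and empty-response handler are placeholders for illustration, not code from the patch.
{code}
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFutureListener;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.codec.http.HttpVersion;

public class MinimalNettyHttpServer {
  public static void main(String[] args) throws InterruptedException {
    // Separate accept/worker event loops: connection handling is explicit,
    // unlike jetty 6's opaque thread-per-connection model.
    EventLoopGroup boss = new NioEventLoopGroup(1);
    EventLoopGroup workers = new NioEventLoopGroup();
    try {
      ServerBootstrap b = new ServerBootstrap()
          .group(boss, workers)
          .channel(NioServerSocketChannel.class)
          .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
              ch.pipeline().addLast(
                  new HttpServerCodec(),
                  // Bounded aggregation caps per-request buffering.
                  new HttpObjectAggregator(64 * 1024),
                  new SimpleChannelInboundHandler<FullHttpRequest>() {
                    @Override
                    protected void channelRead0(ChannelHandlerContext ctx,
                                                FullHttpRequest req) {
                      // Placeholder handler: answer every request with 200 OK.
                      FullHttpResponse resp = new DefaultFullHttpResponse(
                          HttpVersion.HTTP_1_1, HttpResponseStatus.OK);
                      resp.headers().set("Content-Length", 0);
                      ctx.writeAndFlush(resp)
                         .addListener(ChannelFutureListener.CLOSE);
                    }
                  });
            }
          });
      b.bind(8080).sync().channel().closeFuture().sync();
    } finally {
      boss.shutdownGracefully();
      workers.shutdownGracefully();
    }
  }
}
{code}
The point of the technique is visible in the pipeline: buffering (the aggregator size) and connection lifetime (the close listener, the event loop groups) are all under explicit application control.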
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201763#comment-14201763 ] Hadoop QA commented on HDFS-7314: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680087/HDFS-7314-4.patch against trunk revision ba0a42c. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFsck org.apache.hadoop.hdfs.server.namenode.TestDeleteRace {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8686//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8686//console This message is automatically generated. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, HDFS-7314.patch This happened in a YARN nodemanager scenario, but it could happen to any long-running service that uses a cached instance of DistributedFileSystem. 1. The active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. The YARN nodemanager uses DFSClient for certain write operations, such as the log aggregator or the shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After the DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
  at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
  at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
  at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
  at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
  at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
  at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
{noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given that the call stack is YARN - DistributedFileSystem - DFSClient, this can be addressed at different layers. * YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN, and other HDFS applications need to address this as well. * DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places
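As a rough illustration of the first option above, a caller-side wrapper might look like the following; matching on the "Filesystem closed" message and the class name are assumptions made for illustration, not an API the patch defines.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbortTolerantFs {
  private final Configuration conf;
  private FileSystem fs;

  public AbortTolerantFs(Configuration conf) throws IOException {
    this.conf = conf;
    this.fs = FileSystem.get(conf); // cached instance, shared process-wide
  }

  public synchronized FileStatus getFileStatus(Path p) throws IOException {
    try {
      return fs.getFileStatus(p);
    } catch (IOException e) {
      // DFSClient.checkOpen throws "Filesystem closed" once the client has
      // aborted; matching on the message is a heuristic, not a stable API.
      if (!"Filesystem closed".equals(e.getMessage())) {
        throw e;
      }
      fs.close();                        // evict the dead instance from the FS cache
      fs = FileSystem.newInstance(conf); // build a fresh, uncached client
      return fs.getFileStatus(p);
    }
  }
}
{code}
The sketch makes the trade-off in the description concrete: every call site in YARN (and in other HDFS applications) would need to go through such a wrapper, which is why fixing it inside DistributedFileSystem is also on the table.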
[jira] [Commented] (HDFS-7310) Mover can give first priority to local DN if it has target storage type available in local DN
[ https://issues.apache.org/jira/browse/HDFS-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201801#comment-14201801 ] Hadoop QA commented on HDFS-7310: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680093/HDFS-7310-003.patch against trunk revision 61effcb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8688//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8688//console This message is automatically generated. Mover can give first priority to local DN if it has target storage type available in local DN - Key: HDFS-7310 URL: https://issues.apache.org/jira/browse/HDFS-7310 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Vinayakumar B Attachments: HDFS-7310-001.patch, HDFS-7310-002.patch, HDFS-7310-003.patch Currently the Mover logic may move blocks to any DN that has the target storage type. But if the src DN has the target storage type, then the Mover can give highest priority to the local DN. If the local DN does not contain the target storage type, then it can assign any DN, as the current logic does. This is a thought; I have not gone through the code fully yet. Thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
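A toy sketch of the proposed preference order, using hypothetical stand-in types rather than the Mover's actual classes:
{code}
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class LocalFirstTargetChooser {
  enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

  static final class Node {
    final String name;
    final Set<StorageType> storages;
    Node(String name, Set<StorageType> storages) {
      this.name = name;
      this.storages = storages;
    }
  }

  /** Prefer the source node itself when it already has the wanted type. */
  static Node chooseTarget(Node source, List<Node> others, StorageType wanted) {
    if (source.storages.contains(wanted)) {
      return source; // local move: the block never crosses the network
    }
    for (Node n : others) {
      if (n.storages.contains(wanted)) {
        return n;     // fall back to any node, as the current logic does
      }
    }
    return null;      // nowhere to place the wanted storage type
  }

  public static void main(String[] args) {
    Node src = new Node("dn1", EnumSet.of(StorageType.DISK, StorageType.ARCHIVE));
    Node dn2 = new Node("dn2", EnumSet.of(StorageType.ARCHIVE));
    // Prints "dn1": the source node wins because it has ARCHIVE locally.
    System.out.println(chooseTarget(src, Arrays.asList(dn2), StorageType.ARCHIVE).name);
  }
}
{code}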
[jira] [Created] (HDFS-7376) Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7
Johannes Zillmann created HDFS-7376: --- Summary: Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7 Key: HDFS-7376 URL: https://issues.apache.org/jira/browse/HDFS-7376 Project: Hadoop HDFS Issue Type: Bug Reporter: Johannes Zillmann We had an application sitting on top of Hadoop and got problems using jsch once we switched to java 7. Got this exception: {noformat} com.jcraft.jsch.JSchException: verify: false at com.jcraft.jsch.Session.connect(Session.java:330) at com.jcraft.jsch.Session.connect(Session.java:183) {noformat} Upgrading to jsch-0.1.51 from jsch-0.1.49 fixed the issue for us, but then it conflicted with hadoop's jsch version (we fixed this for us by jarjar'ing our jsch version). I think jsch was introduced by namenode HA (HDFS-1623). So you guys should check whether the ssh part works properly on java7, or preventively upgrade the jsch lib to jsch-0.1.51! Some references to problems reported: - http://sourceforge.net/p/jsch/mailman/jsch-users/thread/loom.20131009t211650-...@post.gmane.org/ -https://issues.apache.org/bugzilla/show_bug.cgi?id=53437 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7376) Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7
[ https://issues.apache.org/jira/browse/HDFS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Johannes Zillmann updated HDFS-7376: Description: We had an application sitting on top of Hadoop and got problems using jsch once we switched to java 7. Got this exception: {noformat} com.jcraft.jsch.JSchException: verify: false at com.jcraft.jsch.Session.connect(Session.java:330) at com.jcraft.jsch.Session.connect(Session.java:183) {noformat} Upgrading to jsch-0.1.51 from jsch-0.1.49 fixed the issue for us, but then it conflicted with hadoop's jsch version (we fixed this for us by jarjar'ing our jsch version). I think jsch was introduced by namenode HA (HDFS-1623). So you guys should check whether the ssh part works properly on java7, or preventively upgrade the jsch lib to jsch-0.1.51! Some references to problems reported: - http://sourceforge.net/p/jsch/mailman/jsch-users/thread/loom.20131009t211650-...@post.gmane.org/ - https://issues.apache.org/bugzilla/show_bug.cgi?id=53437 was: We had an application sitting on top of Hadoop and got problems using jsch once we switched to java 7. Got this exception: {noformat} com.jcraft.jsch.JSchException: verify: false at com.jcraft.jsch.Session.connect(Session.java:330) at com.jcraft.jsch.Session.connect(Session.java:183) {noformat} Upgrading to jsch-0.1.51 from jsch-0.1.49 fixed the issue for us, but then it conflicted with hadoop's jsch version (we fixed this for us by jarjar'ing our jsch version). I think jsch was introduced by namenode HA (HDFS-1623). So you guys should check whether the ssh part works properly on java7, or preventively upgrade the jsch lib to jsch-0.1.51! Some references to problems reported: - http://sourceforge.net/p/jsch/mailman/jsch-users/thread/loom.20131009t211650-...@post.gmane.org/ -https://issues.apache.org/bugzilla/show_bug.cgi?id=53437 Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7 -- Key: HDFS-7376 URL: https://issues.apache.org/jira/browse/HDFS-7376 Project: Hadoop HDFS Issue Type: Bug Reporter: Johannes Zillmann We had an application sitting on top of Hadoop and got problems using jsch once we switched to java 7. Got this exception: {noformat} com.jcraft.jsch.JSchException: verify: false at com.jcraft.jsch.Session.connect(Session.java:330) at com.jcraft.jsch.Session.connect(Session.java:183) {noformat} Upgrading to jsch-0.1.51 from jsch-0.1.49 fixed the issue for us, but then it conflicted with hadoop's jsch version (we fixed this for us by jarjar'ing our jsch version). I think jsch was introduced by namenode HA (HDFS-1623). So you guys should check whether the ssh part works properly on java7, or preventively upgrade the jsch lib to jsch-0.1.51! Some references to problems reported: - http://sourceforge.net/p/jsch/mailman/jsch-users/thread/loom.20131009t211650-...@post.gmane.org/ - https://issues.apache.org/bugzilla/show_bug.cgi?id=53437 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7377) [ DataNode Web UI ] Metric page will not display anything..
Brahma Reddy Battula created HDFS-7377: -- Summary: [ DataNode Web UI ] Metric page will not display anything.. Key: HDFS-7377 URL: https://issues.apache.org/jira/browse/HDFS-7377 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Priority: Critical Scenario: Go to http://<DN_IP>:<http port>/dataNodeHome.jsp and click on the metrics link. We will not be able to see anything. Did not find the reason... do we need to implement the metric page? Checked HDFS-2933, but did not find any clue... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7376) Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7
[ https://issues.apache.org/jira/browse/HDFS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HDFS-7376: - Component/s: build Target Version/s: 2.7.0 Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7 -- Key: HDFS-7376 URL: https://issues.apache.org/jira/browse/HDFS-7376 Project: Hadoop HDFS Issue Type: Bug Components: build Reporter: Johannes Zillmann We had an application sitting on top of Hadoop and got problems using jsch once we switched to java 7. Got this exception: {noformat} com.jcraft.jsch.JSchException: verify: false at com.jcraft.jsch.Session.connect(Session.java:330) at com.jcraft.jsch.Session.connect(Session.java:183) {noformat} Upgrading to jsch-0.1.51 from jsch-0.1.49 fixed the issue for us, but then it conflicted with hadoop's jsch version (we fixed this for us by jarjar'ing our jsch version). I think jsch was introduced by namenode HA (HDFS-1623). So you guys should check whether the ssh part works properly on java7, or preventively upgrade the jsch lib to jsch-0.1.51! Some references to problems reported: - http://sourceforge.net/p/jsch/mailman/jsch-users/thread/loom.20131009t211650-...@post.gmane.org/ - https://issues.apache.org/bugzilla/show_bug.cgi?id=53437 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7376) Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7
[ https://issues.apache.org/jira/browse/HDFS-7376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201820#comment-14201820 ] Steve Loughran commented on HDFS-7376: -- not seen any reports of this -yet- but it still seems worth doing. It's too late to do it for 2.6; targeting 2.7, which is java7+ only Upgrade jsch lib to jsch-0.1.51 to avoid problems running on java7 -- Key: HDFS-7376 URL: https://issues.apache.org/jira/browse/HDFS-7376 Project: Hadoop HDFS Issue Type: Bug Components: build Reporter: Johannes Zillmann We had an application sitting on top of Hadoop and got problems using jsch once we switched to java 7. Got this exception: {noformat} com.jcraft.jsch.JSchException: verify: false at com.jcraft.jsch.Session.connect(Session.java:330) at com.jcraft.jsch.Session.connect(Session.java:183) {noformat} Upgrading to jsch-0.1.51 from jsch-0.1.49 fixed the issue for us, but then it conflicted with hadoop's jsch version (we fixed this for us by jarjar'ing our jsch version). I think jsch was introduced by namenode HA (HDFS-1623). So you guys should check whether the ssh part works properly on java7, or preventively upgrade the jsch lib to jsch-0.1.51! Some references to problems reported: - http://sourceforge.net/p/jsch/mailman/jsch-users/thread/loom.20131009t211650-...@post.gmane.org/ - https://issues.apache.org/bugzilla/show_bug.cgi?id=53437 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201958#comment-14201958 ] Hudson commented on HDFS-7221: -- FAILURE: Integrated in Hadoop-Yarn-trunk #736 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/736/]) HDFS-7221. Update CHANGES.txt to indicate fix in 2.6.0. (cnauroth: rev e7f1c0482e5dff8a1549ace1fc2b366941170c58) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7364) Balancer always shows zero Bytes Already Moved
[ https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201956#comment-14201956 ] Hudson commented on HDFS-7364: -- FAILURE: Integrated in Hadoop-Yarn-trunk #736 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/736/]) HDFS-7364. Balancer always shows zero Bytes Already Moved. Contributed by Tsz Wo Nicholas Sze. (jing9: rev ae71a671a3b4b454aa393c2974b6f1f16dd61405) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java Balancer always shows zero Bytes Already Moved -- Key: HDFS-7364 URL: https://issues.apache.org/jira/browse/HDFS-7364 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h7364_20141105.patch, h7364_20141106.patch Here is an example:
{noformat}
Time Stamp              Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Nov 5, 2014 5:23:38 PM  0           0 B                  116.82 MB           181.07 MB
Nov 5, 2014 5:24:30 PM  1           0 B                  88.05 MB            181.07 MB
Nov 5, 2014 5:25:10 PM  2           0 B                  73.08 MB            181.07 MB
Nov 5, 2014 5:25:49 PM  3           0 B                  13.37 MB            90.53 MB
Nov 5, 2014 5:26:30 PM  4           0 B                  13.59 MB            90.53 MB
Nov 5, 2014 5:27:12 PM  5           0 B                  9.25 MB             90.53 MB
The cluster is balanced. Exiting...
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7365) Remove hdfs.server.blockmanagement.MutableBlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201955#comment-14201955 ] Hudson commented on HDFS-7365: -- FAILURE: Integrated in Hadoop-Yarn-trunk #736 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/736/]) HDFS-7365. Remove hdfs.server.blockmanagement.MutableBlockCollection. Contributed by Li Lu. (wheat9: rev 75b820cca9d4e709b9e8d40635ff0406528ad4ba) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/MutableBlockCollection.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove hdfs.server.blockmanagement.MutableBlockCollection - Key: HDFS-7365 URL: https://issues.apache.org/jira/browse/HDFS-7365 Project: Hadoop HDFS Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7365-110514.patch Seems like this component is no longer referenced. Is it OK to fully remove it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201949#comment-14201949 ] Hudson commented on HDFS-7226: -- FAILURE: Integrated in Hadoop-Yarn-trunk #736 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/736/]) HDFS-7226. Update CHANGES.txt to indicate fix in 2.6.0. (cnauroth: rev d026f3676278e24d7032dced5f14b52dec70b987) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using tool from HADOOP-11045, got the following report:
{code}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1
Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build
THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below:
..
Among 9 runs examined, all failed tests #failedRuns: testName:
7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
..
{code}
TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom:
{code}
Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
Failing for the past 1 build (Since Failed#8390 )
Took 2.9 sec.
Error Message
expected:<18> but was:<12>
Stacktrace
java.lang.AssertionError: expected:<18> but was:<12>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7378) There should be a method to quickly test if a file is a SequenceFile or if a stream contains SequenceFile data
Jens Rabe created HDFS-7378: --- Summary: There should be a method to quickly test if a file is a SequenceFile or if a stream contains SequenceFile data Key: HDFS-7378 URL: https://issues.apache.org/jira/browse/HDFS-7378 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jens Rabe Priority: Trivial Currently, to check whether a file is a SequenceFile or a stream contains data in SequenceFile format, one either has to check the message of the exception thrown when opening the file with SequenceFile.Reader, or has to check the first four bytes manually. A utility method like SequenceFile.isSequenceFile would be very handy here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
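A minimal sketch of what such a helper could look like, assuming the standard SequenceFile header of three magic bytes 'S','E','Q' followed by a version byte; looksLikeSequenceFile is a hypothetical name, not an existing API:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class SequenceFileSniffer {
  private SequenceFileSniffer() {}

  /** True if the file starts with the SequenceFile magic bytes. */
  public static boolean looksLikeSequenceFile(FileSystem fs, Path path) {
    byte[] header = new byte[4];
    try (FSDataInputStream in = fs.open(path)) {
      in.readFully(0, header); // positioned read of the 4-byte header
    } catch (IOException e) {
      return false; // unreadable, or shorter than a header
    }
    // First three bytes are 'S','E','Q'; header[3] is the format version,
    // which a stricter check could also validate.
    return header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
  }
}
{code}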
[jira] [Assigned] (HDFS-7377) [ DataNode Web UI ] Metric page will not display anything..
[ https://issues.apache.org/jira/browse/HDFS-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HDFS-7377: -- Assignee: Brahma Reddy Battula [ DataNode Web UI ] Metric page will not display anything.. --- Key: HDFS-7377 URL: https://issues.apache.org/jira/browse/HDFS-7377 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Scenario: Go to http://<DN_IP>:<http port>/dataNodeHome.jsp and click on the metrics link. We will not be able to see anything. Did not find the reason... do we need to implement the metric page? Checked HDFS-2933, but did not find any clue... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7377) [ DataNode Web UI ] Metric page will not display anything..
[ https://issues.apache.org/jira/browse/HDFS-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202034#comment-14202034 ] Brahma Reddy Battula commented on HDFS-7377: As format will come back as null when we access from http://<DN_IP>:<http port>/dataNodeHome.jsp, the response will be empty... Please check the following code for the same:
{code}
public void doGet(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
  if (!HttpServer2.isInstrumentationAccessAllowed(getServletContext(),
      request, response)) {
    return;
  }
  String format = request.getParameter("format");
  Collection<MetricsContext> allContexts =
      ContextFactory.getFactory().getAllContexts();
  if ("json".equals(format)) {
    response.setContentType("application/json; charset=utf-8");
    PrintWriter out = response.getWriter();
    try {
      // Uses Jetty's built-in JSON support to convert the map into JSON.
      out.print(new JSON().toJSON(makeMap(allContexts)));
    } finally {
      out.close();
    }
  } else {
    PrintWriter out = response.getWriter();
    try {
      printMap(out, makeMap(allContexts));
    } finally {
      out.close();
    }
  }
}
{code}
Can we add something like below, as we are handling the json format?
{code}
if (null == format) {
  format = FORMAT_JSON;
}
{code}
[ DataNode Web UI ] Metric page will not display anything.. --- Key: HDFS-7377 URL: https://issues.apache.org/jira/browse/HDFS-7377 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Priority: Critical Scenario: Go to http://<DN_IP>:<http port>/dataNodeHome.jsp and click on the metrics link. We will not be able to see anything. Did not find the reason... do we need to implement the metric page? Checked HDFS-2933, but did not find any clue... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7377) [ DataNode Web UI ] Metric page will not display anything..
[ https://issues.apache.org/jira/browse/HDFS-7377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202049#comment-14202049 ] Brahma Reddy Battula commented on HDFS-7377: The problem is that contextMap has a null value... :)
{code}
Collection<MetricsContext> allContexts =
    ContextFactory.getFactory().getAllContexts();
{code}
[ DataNode Web UI ] Metric page will not display anything.. --- Key: HDFS-7377 URL: https://issues.apache.org/jira/browse/HDFS-7377 Project: Hadoop HDFS Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Brahma Reddy Battula Priority: Critical Scenario: Go to http://<DN_IP>:<http port>/dataNodeHome.jsp and click on the metrics link. We will not be able to see anything. Did not find the reason... do we need to implement the metric page? Checked HDFS-2933, but did not find any clue... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7331) Add Datanode network counts to datanode jmx page
[ https://issues.apache.org/jira/browse/HDFS-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7331: --- Attachment: HDFS-7331.003.patch [~wheat9], [~atm], Thanks for the comments and suggestions. Attached is the .003 patch which removes the servlet, defaults the size of the bounded cache to Int.MAX_VALUE, and uses only the hostname/address (without the port) as the key to the cache. Add Datanode network counts to datanode jmx page Key: HDFS-7331 URL: https://issues.apache.org/jira/browse/HDFS-7331 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7331.001.patch, HDFS-7331.002.patch, HDFS-7331.003.patch Add per-datanode counts to the datanode jmx page. For example, networkErrors could be exposed like this:
{noformat}
}, {
  ...
  "DatanodeNetworkCounts" : {"dn1":{"networkErrors":1}},
  ...
  "NamenodeAddresses" : {"localhost":"BP-1103235125-127.0.0.1-1415057084497"},
  "VolumeInfo" : {"/tmp/hadoop-cwl/dfs/data/current":{"freeSpace":3092725760,"usedSpace":28672,"reservedSpace":0}},
  "ClusterId" : "CID-4b38f2ae-5e58-4e15-b3cf-3ba3f46e724e"
}, {
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
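As background on how a nested value like DatanodeNetworkCounts can be produced, a thread-safe counter map with a snapshot getter is enough; the sketch below uses hypothetical names and is not the patch's code (the actual patch keeps the cache bounded, which this sketch omits):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class DatanodeNetworkCounter {
  // host -> (counter name -> value): the nested-map shape shown above.
  private final ConcurrentHashMap<String, ConcurrentHashMap<String, AtomicLong>>
      counts = new ConcurrentHashMap<>();

  public void incr(String host, String counter) {
    counts.computeIfAbsent(host, h -> new ConcurrentHashMap<>())
          .computeIfAbsent(counter, c -> new AtomicLong())
          .incrementAndGet();
  }

  /** Snapshot in a JMX-friendly form (plain maps of plain values). */
  public Map<String, Map<String, Long>> getDatanodeNetworkCounts() {
    Map<String, Map<String, Long>> out = new HashMap<>();
    for (Map.Entry<String, ConcurrentHashMap<String, AtomicLong>> e
        : counts.entrySet()) {
      Map<String, Long> perHost = new HashMap<>();
      for (Map.Entry<String, AtomicLong> c : e.getValue().entrySet()) {
        perHost.put(c.getKey(), c.getValue().get());
      }
      out.put(e.getKey(), perHost);
    }
    return out;
  }
}
{code}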
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202072#comment-14202072 ] Hudson commented on HDFS-7226: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1926 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1926/]) HDFS-7226. Update CHANGES.txt to indicate fix in 2.6.0. (cnauroth: rev d026f3676278e24d7032dced5f14b52dec70b987) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using tool from HADOOP-11045, got the following report:
{code}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1
Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build
THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below:
..
Among 9 runs examined, all failed tests #failedRuns: testName:
7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
..
{code}
TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom:
{code}
Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
Failing for the past 1 build (Since Failed#8390 )
Took 2.9 sec.
Error Message
expected:<18> but was:<12>
Stacktrace
java.lang.AssertionError: expected:<18> but was:<12>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202081#comment-14202081 ] Hudson commented on HDFS-7221: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1926 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1926/]) HDFS-7221. Update CHANGES.txt to indicate fix in 2.6.0. (cnauroth: rev e7f1c0482e5dff8a1549ace1fc2b366941170c58) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7365) Remove hdfs.server.blockmanagement.MutableBlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202078#comment-14202078 ] Hudson commented on HDFS-7365: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1926 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1926/]) HDFS-7365. Remove hdfs.server.blockmanagement.MutableBlockCollection. Contributed by Li Lu. (wheat9: rev 75b820cca9d4e709b9e8d40635ff0406528ad4ba) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/MutableBlockCollection.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove hdfs.server.blockmanagement.MutableBlockCollection - Key: HDFS-7365 URL: https://issues.apache.org/jira/browse/HDFS-7365 Project: Hadoop HDFS Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7365-110514.patch Seems like this component is no longer referenced. Is it OK to fully remove it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7364) Balancer always shows zero Bytes Already Moved
[ https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202079#comment-14202079 ] Hudson commented on HDFS-7364: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1926 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1926/]) HDFS-7364. Balancer always shows zero Bytes Already Moved. Contributed by Tsz Wo Nicholas Sze. (jing9: rev ae71a671a3b4b454aa393c2974b6f1f16dd61405) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java Balancer always shows zero Bytes Already Moved -- Key: HDFS-7364 URL: https://issues.apache.org/jira/browse/HDFS-7364 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h7364_20141105.patch, h7364_20141106.patch Here is an example:
{noformat}
Time Stamp              Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Nov 5, 2014 5:23:38 PM  0           0 B                  116.82 MB           181.07 MB
Nov 5, 2014 5:24:30 PM  1           0 B                  88.05 MB            181.07 MB
Nov 5, 2014 5:25:10 PM  2           0 B                  73.08 MB            181.07 MB
Nov 5, 2014 5:25:49 PM  3           0 B                  13.37 MB            90.53 MB
Nov 5, 2014 5:26:30 PM  4           0 B                  13.59 MB            90.53 MB
Nov 5, 2014 5:27:12 PM  5           0 B                  9.25 MB             90.53 MB
The cluster is balanced. Exiting...
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7365) Remove hdfs.server.blockmanagement.MutableBlockCollection
[ https://issues.apache.org/jira/browse/HDFS-7365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202152#comment-14202152 ] Hudson commented on HDFS-7365: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1950 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1950/]) HDFS-7365. Remove hdfs.server.blockmanagement.MutableBlockCollection. Contributed by Li Lu. (wheat9: rev 75b820cca9d4e709b9e8d40635ff0406528ad4ba) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/MutableBlockCollection.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Remove hdfs.server.blockmanagement.MutableBlockCollection - Key: HDFS-7365 URL: https://issues.apache.org/jira/browse/HDFS-7365 Project: Hadoop HDFS Issue Type: Improvement Reporter: Li Lu Assignee: Li Lu Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7365-110514.patch Seems like this component is no longer referenced. Is it OK to fully remove it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202146#comment-14202146 ] Hudson commented on HDFS-7226: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1950 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1950/]) HDFS-7226. Update CHANGES.txt to indicate fix in 2.6.0. (cnauroth: rev d026f3676278e24d7032dced5f14b52dec70b987) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using tool from HADOOP-11045, got the following report:
{code}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1
Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build
THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below:
..
Among 9 runs examined, all failed tests #failedRuns: testName:
7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
..
{code}
TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom:
{code}
Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
Failing for the past 1 build (Since Failed#8390 )
Took 2.9 sec.
Error Message
expected:<18> but was:<12>
Stacktrace
java.lang.AssertionError: expected:<18> but was:<12>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202155#comment-14202155 ] Hudson commented on HDFS-7221: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1950 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1950/]) HDFS-7221. Update CHANGES.txt to indicate fix in 2.6.0. (cnauroth: rev e7f1c0482e5dff8a1549ace1fc2b366941170c58) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch, HDFS-7221.003.patch, HDFS-7221.004.patch, HDFS-7221.005.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7364) Balancer always shows zero Bytes Already Moved
[ https://issues.apache.org/jira/browse/HDFS-7364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202153#comment-14202153 ] Hudson commented on HDFS-7364: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1950 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1950/]) HDFS-7364. Balancer always shows zero Bytes Already Moved. Contributed by Tsz Wo Nicholas Sze. (jing9: rev ae71a671a3b4b454aa393c2974b6f1f16dd61405) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Dispatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java Balancer always shows zero Bytes Already Moved -- Key: HDFS-7364 URL: https://issues.apache.org/jira/browse/HDFS-7364 Project: Hadoop HDFS Issue Type: Bug Components: balancer & mover Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h7364_20141105.patch, h7364_20141106.patch Here is an example:
{noformat}
Time Stamp              Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Nov 5, 2014 5:23:38 PM  0           0 B                  116.82 MB           181.07 MB
Nov 5, 2014 5:24:30 PM  1           0 B                  88.05 MB            181.07 MB
Nov 5, 2014 5:25:10 PM  2           0 B                  73.08 MB            181.07 MB
Nov 5, 2014 5:25:49 PM  3           0 B                  13.37 MB            90.53 MB
Nov 5, 2014 5:26:30 PM  4           0 B                  13.59 MB            90.53 MB
Nov 5, 2014 5:27:12 PM  5           0 B                  9.25 MB             90.53 MB
The cluster is balanced. Exiting...
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202189#comment-14202189 ] Yongjun Zhang commented on HDFS-7226: - Hi [~cnauroth], thanks a lot for taking care of the merge! TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Fix For: 2.6.0 Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch, HDFS-7226.003.patch Using tool from HADOOP-11045, got the following report:
{code}
[yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1
Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build
THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below:
..
Among 9 runs examined, all failed tests #failedRuns: testName:
7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress
3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen
1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching
..
{code}
TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom:
{code}
Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend
Failing for the past 1 build (Since Failed#8390 )
Took 2.9 sec.
Error Message
expected:<18> but was:<12>
Stacktrace
java.lang.AssertionError: expected:<18> but was:<12>
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:555)
  at org.junit.Assert.assertEquals(Assert.java:542)
  at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202203#comment-14202203 ] stack commented on HDFS-6803: - Am I in the right ballpark? Thanks (Need license to hack on dfsinputstream to make it more 'live' -- thanks). Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: 9117.md.txt, DocumentingDFSClientDFSInputStream (1).pdf, DocumentingDFSClientDFSInputStream.v2.pdf, HDFS-6803v2.txt Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit, documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
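For readers outside the thread, the read/pread distinction being documented is roughly the following; this is an illustration of the general contract, not text from the attached document:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

public class ReadVsPread {
  // Stateful read: seek moves the stream's single shared position, so
  // contending threads must serialize around it or corrupt each other.
  static int statefulRead(FSDataInputStream in, long pos, byte[] buf)
      throws IOException {
    synchronized (in) {
      in.seek(pos);
      return in.read(buf, 0, buf.length);
    }
  }

  // Positioned read (pread): the offset is an argument and the stream
  // position is left untouched, so concurrent calls are safe.
  static int positionedRead(FSDataInputStream in, long pos, byte[] buf)
      throws IOException {
    return in.read(pos, buf, 0, buf.length);
  }
}
{code}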
[jira] [Commented] (HDFS-7331) Add Datanode network counts to datanode jmx page
[ https://issues.apache.org/jira/browse/HDFS-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202299#comment-14202299 ] Hadoop QA commented on HDFS-7331: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680170/HDFS-7331.003.patch against trunk revision 42bbe37. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1219 javac compiler warnings (more than the trunk's current 1218 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8689//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8689//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8689//console This message is automatically generated. Add Datanode network counts to datanode jmx page Key: HDFS-7331 URL: https://issues.apache.org/jira/browse/HDFS-7331 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7331.001.patch, HDFS-7331.002.patch, HDFS-7331.003.patch Add per-datanode counts to the datanode jmx page. For example, networkErrors could be exposed like this:
{noformat}
}, {
  ...
  "DatanodeNetworkCounts" : {"dn1":{"networkErrors":1}},
  ...
  "NamenodeAddresses" : {"localhost":"BP-1103235125-127.0.0.1-1415057084497"},
  "VolumeInfo" : {"/tmp/hadoop-cwl/dfs/data/current":{"freeSpace":3092725760,"usedSpace":28672,"reservedSpace":0}},
  "ClusterId" : "CID-4b38f2ae-5e58-4e15-b3cf-3ba3f46e724e"
}, {
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202348#comment-14202348 ] Zhe Zhang commented on HDFS-7374: - [~mingma] Thanks much for clarifying the state machine. I agree my option #2 is cleaner and makes the decommissioning of dead nodes much faster. I'll go ahead with that approach now. bq. If the node stays in Dead, DECOMMISSION_INPROGRESS for too long, have the higher layer application remove the node from exclude file and thus abort the decommission process. This will transition the node to Dead, NORMAL. The specific higher layer application in my case is Cloudera Manager and I think it's possible to add this logic. However I don't know how easy it is to change all similar management applications. bq. HDFS-6791 mentioned another way to address the original issue. When nodes become dead, mark them DECOMMISSIONED and fix the replication to handle this case. In other words, get rid of Dead, DECOMMISSION_INPROGRESS state. Do you mean allowing a {{DECOMMISSIONED}} node to be the source of a replica transfer? It seems a little fragile to me; intuitively, it could surprise upper layer applications that a {{DECOMMISSIONED}} node is still actively transferring data. But I would like to hear the opinions from other people. Allow decommissioning of dead DataNodes --- Key: HDFS-7374 URL: https://issues.apache.org/jira/browse/HDFS-7374 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang We have seen the use case of decommissioning DataNodes that are already dead or unresponsive, and not expected to rejoin the cluster. The logic introduced by HDFS-6791 will mark those nodes as {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish the decommission work. If an upper layer application is monitoring the decommissioning progress, it will hang forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
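For illustration only, one reading of the option discussed above can be reduced to a tiny transition function; the types and names below are hypothetical, not the NameNode's actual admin-state code:
{code}
public class DecommissionSketch {
  enum AdminState { NORMAL, DECOMMISSION_INPROGRESS, DECOMMISSIONED }

  /**
   * One reading of "option #2": a node that is already dead skips
   * DECOMMISSION_INPROGRESS entirely, so tools monitoring the
   * decommission never hang waiting on replicas it cannot stream.
   */
  static AdminState startDecommission(boolean nodeIsAlive) {
    return nodeIsAlive ? AdminState.DECOMMISSION_INPROGRESS
                       : AdminState.DECOMMISSIONED;
  }
}
{code}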
[jira] [Updated] (HDFS-7331) Add Datanode network counts to datanode jmx page
[ https://issues.apache.org/jira/browse/HDFS-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-7331: --- Attachment: HDFS-7331.004.patch .004 gets rid of the compiler warning. Add Datanode network counts to datanode jmx page Key: HDFS-7331 URL: https://issues.apache.org/jira/browse/HDFS-7331 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7331.001.patch, HDFS-7331.002.patch, HDFS-7331.003.patch, HDFS-7331.004.patch Add per-datanode counts to the datanode jmx page. For example, networkErrors could be exposed like this: {noformat} }, { ... DatanodeNetworkCounts : {\dn1\:{\networkErrors\:1}}, ... NamenodeAddresses : {\localhost\:\BP-1103235125-127.0.0.1-1415057084497\}, VolumeInfo : {\/tmp/hadoop-cwl/dfs/data/current\:{\freeSpace\:3092725760,\usedSpace\:28672,\reservedSpace\:0}}, ClusterId : CID-4b38f2ae-5e58-4e15-b3cf-3ba3f46e724e }, { {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202371#comment-14202371 ] Ming Ma commented on HDFS-7374: --- Yeah, the idea was to use a {{DECOMMISSIONED}} node as the source node only when there is no {{NORMAL}} node available. Agree it breaks the state definition. Allow decommissioning of dead DataNodes --- Key: HDFS-7374 URL: https://issues.apache.org/jira/browse/HDFS-7374 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang We have seen the use case of decommissioning DataNodes that are already dead or unresponsive, and not expected to rejoin the cluster. The logic introduced by HDFS-6791 will mark those nodes as {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish the decommission work. If an upper layer application is monitoring the decommissioning progress, it will hang forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202385#comment-14202385 ] Haohui Mai commented on HDFS-7279: -- The test failures are unrelated. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all related webhdfs functionality using jetty. Because the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which is more efficient and allows finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows
Xiaoyu Yao created HDFS-7379: Summary: Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused verification failures on certain Windows test machines. The fix is to create the files with different names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
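The bug pattern being described reduces to the following sketch; the paths, lengths, and helper wrapper are illustrative, not the literal test code:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DFSTestUtil;

public class CreateTwoTestFiles {
  static void createBoth(FileSystem fs, long fileLen, long seed)
      throws IOException {
    Path path1 = new Path("/tmp/file1");
    Path path2 = new Path("/tmp/file2");
    // The reported bug: the second call reused path1, so its create
    // overwrote (deleted) the first file and path2 never existed.
    DFSTestUtil.createFile(fs, path1, fileLen, (short) 1, seed);
    DFSTestUtil.createFile(fs, path2, fileLen, (short) 1, seed); // fixed name
  }
}
{code}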
[jira] [Updated] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7379: - Attachment: HDFS-7379.01.patch Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows - Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused verification failures on certain Windows test machines. The fix is to create the files with different names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7379: - Description: There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. (was: There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused verification failures on certain Windows test machines. The fix is to create the files with different names. ) Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows - Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7379: - Summary: Fix unit test TestBalancer#testBalancerWithRamDisk failure (was: Fix unit test TestBalancer#testBalancerWithRamDisk failure on Windows) Fix unit test TestBalancer#testBalancerWithRamDisk failure -- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7379: - Affects Version/s: 2.6.0 Status: Patch Available (was: Open) Fix unit test TestBalancer#testBalancerWithRamDisk failure -- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202549#comment-14202549 ] Haohui Mai commented on HDFS-7379: -- +1 pending jenkins Fix unit test TestBalancer#testBalancerWithRamDisk failure -- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7358: -- Attachment: (was: h7358_20141107.patch) Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7358: -- Attachment: h7358_20141107.patch Is 'State' the right name for this inner class that carries state-of-stream-close and stuff to run on close? ... Do you need this class? It can't just be a method to call on close? DFSOutputStream is big and lacks organization. It has 30+ fields in DFSOutputStream alone, not counting inner classes such as DataStreamer. I think it is better to group the fields describing the state of the stream together. Since I am not going to move the other fields for the moment, let's keep closed as a field. Here is a new patch: h7358_20141107.patch Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch, h7358_20141107.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7358: -- Attachment: h7358_20141107.patch Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7358: -- Attachment: (was: h7358_20141107.patch) Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7358: -- Attachment: h7358_20141107.patch Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch, h7358_20141107.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202635#comment-14202635 ] Tsz Wo Nicholas Sze commented on HDFS-7358: --- [~stack], have you changed dfs.bytes-per-checksum in your test? What version of Hadoop is it? {code} 2014-11-04 16:55:57,202 DEBUG [sync.0] util.ByteArrayManager: allocate(65565): count=60367, aboveThreshold, [131072: 9998/1, free=1], recycled? true {code} I wonder why it allocates a 65565 (> 64kB) array. See also HDFS-7308. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch, h7358_20141107.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
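For readers puzzling over the log line above, here is a self-contained sketch of power-of-two bucketing that would map a 65565-byte request (just over 64kB) into the 131072 bucket shown in the DEBUG output. The rounding rule is an assumption inferred from the log, not a copy of ByteArrayManager's code:
{code}
// Hedged sketch: assumes ByteArrayManager buckets arrays by rounding the
// requested length up to the next power of two. Under that assumption a
// 65565-byte request lands in the 131072 bucket, matching the
// "[131072: ...]" fragment in the DEBUG log above.
class RoundingSketch {
  static int bucketFor(int requestedLength) {
    int highest = Integer.highestOneBit(requestedLength);
    return highest == requestedLength ? requestedLength : highest << 1;
  }

  public static void main(String[] args) {
    System.out.println(bucketFor(65565));  // 131072
    System.out.println(bucketFor(65536));  // 65536 (exactly 64kB)
  }
}
{code}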
[jira] [Commented] (HDFS-7331) Add Datanode network counts to datanode jmx page
[ https://issues.apache.org/jira/browse/HDFS-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202686#comment-14202686 ] Hadoop QA commented on HDFS-7331: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680207/HDFS-7331.004.patch against trunk revision 1e97f2f. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8690//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8690//console This message is automatically generated. Add Datanode network counts to datanode jmx page Key: HDFS-7331 URL: https://issues.apache.org/jira/browse/HDFS-7331 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7331.001.patch, HDFS-7331.002.patch, HDFS-7331.003.patch, HDFS-7331.004.patch Add per-datanode counts to the datanode jmx page. For example, networkErrors could be exposed like this: {noformat} }, { ... "DatanodeNetworkCounts" : {"dn1":{"networkErrors":1}}, ... "NamenodeAddresses" : {"localhost":"BP-1103235125-127.0.0.1-1415057084497"}, "VolumeInfo" : {"/tmp/hadoop-cwl/dfs/data/current":{"freeSpace":3092725760,"usedSpace":28672,"reservedSpace":0}}, "ClusterId" : "CID-4b38f2ae-5e58-4e15-b3cf-3ba3f46e724e" }, { {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
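A minimal way to eyeball the proposed output once a patch like this lands would be to dump the DataNode's /jmx servlet, roughly as below; the localhost address and the default DataNode web port 50075 are assumptions for illustration:
{code}
// Hedged sketch: fetches the DataNode's /jmx servlet over HTTP and prints the
// raw JSON, which would include DatanodeNetworkCounts once exposed. Host and
// port are illustrative assumptions.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

class JmxPeek {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://localhost:50075/jmx");
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
{code}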
[jira] [Commented] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202685#comment-14202685 ] Hadoop QA commented on HDFS-7379: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680234/HDFS-7379.01.patch against trunk revision 2ac1be7. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8691//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8691//console This message is automatically generated. Fix unit test TestBalancer#testBalancerWithRamDisk failure -- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202691#comment-14202691 ] Haohui Mai commented on HDFS-7379: -- The test failures are unrelated. I'll commit this shortly. Fix unit test TestBalancer#testBalancerWithRamDisk failure -- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7379) Fix unit test TestBalancer#testBalancerWithRamDisk failure
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7379: -- Component/s: (was: datanode) test Priority: Minor (was: Major) Fix unit test TestBalancer#testBalancerWithRamDisk failure -- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7379) TestBalancer#testBalancerWithRamDisk creates test files incorrectly
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7379: - Summary: TestBalancer#testBalancerWithRamDisk creates test files incorrectly (was: Fix unit test TestBalancer#testBalancerWithRamDisk failure) TestBalancer#testBalancerWithRamDisk creates test files incorrectly --- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7379) TestBalancer#testBalancerWithRamDisk creates test files incorrectly
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7379: - Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed the patch to trunk, branch-2 and branch-2.6. Thanks [~xyao] for the contribution. TestBalancer#testBalancerWithRamDisk creates test files incorrectly --- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7379) TestBalancer#testBalancerWithRamDisk creates test files incorrectly
[ https://issues.apache.org/jira/browse/HDFS-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202744#comment-14202744 ] Hudson commented on HDFS-7379: -- FAILURE: Integrated in Hadoop-trunk-Commit #6484 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6484/]) HDFS-7379. TestBalancer#testBalancerWithRamDisk creates test files incorrectly. Contributed by Xiaoyu Yao. (wheat9: rev 57760c0663288a7611c3609891ef92f1abf4bb53) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt TestBalancer#testBalancerWithRamDisk creates test files incorrectly --- Key: HDFS-7379 URL: https://issues.apache.org/jira/browse/HDFS-7379 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor Fix For: 2.6.0 Attachments: HDFS-7379.01.patch There is a copy-paste error in the test file creation. The test is supposed to create two test files named path1 and path2 on RAM_DISK, but the error caused path1 to be created twice, with the second creation overwriting (deleting) the first one on RAM_DISK. This caused a verification failure for path2 as it never gets created. The fix is to create the test files with the correct names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7380) unsteady and slow performance when writing to file with block size 2GB
Adam Fuchs created HDFS-7380: Summary: unsteady and slow performance when writing to file with block size 2GB Key: HDFS-7380 URL: https://issues.apache.org/jira/browse/HDFS-7380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Adam Fuchs Attachments: BenchmarkWrites.java Appending to a large file with block size 2GB can lead to periods of really poor performance (4X slower than optimal). I found this issue when looking at Accumulo write performance in ACCUMULO-3303. I wrote a small test application to isolate this performance issue down to some basic API calls (to be attached). A description of the execution can be found here: https://issues.apache.org/jira/browse/ACCUMULO-3303?focusedCommentId=14202830page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14202830 The specific hadoop version was as follows: {code} [root@n1 ~]# hadoop version Hadoop 2.4.0.2.1.2.0-402 Subversion g...@github.com:hortonworks/hadoop.git -r 9e5db004df1a751e93aa89b42956c5325f3a4482 Compiled by jenkins on 2014-04-27T22:28Z Compiled with protoc 2.5.0 From source with checksum 9e788148daa5dd7934eb468e57e037b5 This command was run using /usr/lib/hadoop/hadoop-common-2.4.0.2.1.2.0-402.jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
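The attached BenchmarkWrites.java is not reproduced here; the following is only a rough sketch of a write loop of the same general shape, with an assumed output path, replication factor, buffer size, and reporting interval:
{code}
// Hedged sketch of a write benchmark of the same general shape as the issue
// describes -- NOT the attached BenchmarkWrites.java. Sizes are illustrative.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class WriteBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    long blockSize = 2L * 1024 * 1024 * 1024;  // 2GB blocks, as in the report
    byte[] buf = new byte[64 * 1024];
    try (FSDataOutputStream out = fs.create(new Path("/tmp/bench"), true,
        64 * 1024, (short) 3, blockSize)) {
      long written = 0, start = System.nanoTime();
      while (written < 2L * blockSize) {           // write a few blocks
        out.write(buf);
        written += buf.length;
        if (written % (256L * 1024 * 1024) == 0) { // report every 256MB
          double secs = (System.nanoTime() - start) / 1e9;
          System.out.printf("%d MB in %.1fs (%.1f MB/s)%n",
              written >> 20, secs, (written >> 20) / secs);
        }
      }
    }
  }
}
{code}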
[jira] [Updated] (HDFS-7380) unsteady and slow performance when writing to file with block size 2GB
[ https://issues.apache.org/jira/browse/HDFS-7380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Fuchs updated HDFS-7380: - Attachment: BenchmarkWrites.java unsteady and slow performance when writing to file with block size 2GB --- Key: HDFS-7380 URL: https://issues.apache.org/jira/browse/HDFS-7380 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Adam Fuchs Attachments: BenchmarkWrites.java Appending to a large file with block size 2GB can lead to periods of really poor performance (4X slower than optimal). I found this issue when looking at Accumulo write performance in ACCUMULO-3303. I wrote a small test application to isolate this performance issue down to some basic API calls (to be attached). A description of the execution can be found here: https://issues.apache.org/jira/browse/ACCUMULO-3303?focusedCommentId=14202830page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14202830 The specific hadoop version was as follows: {code} [root@n1 ~]# hadoop version Hadoop 2.4.0.2.1.2.0-402 Subversion g...@github.com:hortonworks/hadoop.git -r 9e5db004df1a751e93aa89b42956c5325f3a4482 Compiled by jenkins on 2014-04-27T22:28Z Compiled with protoc 2.5.0 From source with checksum 9e788148daa5dd7934eb468e57e037b5 This command was run using /usr/lib/hadoop/hadoop-common-2.4.0.2.1.2.0-402.jar {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7040) HDFS dangerously uses @Beta methods from very old versions of Guava
[ https://issues.apache.org/jira/browse/HDFS-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202856#comment-14202856 ] Christopher Tubbs commented on HDFS-7040: - I added a patch to fix this under MAPREDUCE-6083 for versions 2.6.0 and later, which doesn't change the Guava version dependency. I suppose it could be back-ported to earlier versions (2.4/2.5), but it's probably not worth it since those versions are really only affected by {{MiniDFSCluster}}, and that's very limited. HDFS dangerously uses @Beta methods from very old versions of Guava --- Key: HDFS-7040 URL: https://issues.apache.org/jira/browse/HDFS-7040 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0, 2.5.0, 2.4.1 Reporter: Christopher Tubbs Labels: beta, deprecated, guava Attachments: 0001-HDFS-7040-Avoid-beta-LimitInputStream-in-Guava.patch HDFS uses LimitInputStream from Guava. This was introduced as @Beta and is risky for any application to use. The problem is further exacerbated by Hadoop's dependency on Guava version 11.0.2, which is quite old for an active project (Feb. 2012). Because Guava is very stable, projects which depend on Hadoop and use Guava themselves can use up through Guava version 14.x. However, in version 14, Guava deprecated LimitInputStream and provided a replacement. Because they make no compatibility guarantees about @Beta classes, they removed it in version 15. What should be done: Hadoop should update its dependency on Guava to at least version 14 (currently Guava is on version 19). This should have little impact on users, because Guava is so stable. HDFS should then be patched to use the provided alternative to LimitInputStream, so that downstream packagers, users, and application developers requiring more recent versions of Guava (to fix bugs, to use new features, etc.) will be able to swap out the Guava dependency without breaking Hadoop. Alternative: While Hadoop cannot predict the marking and removal of deprecated code, it can, and should, avoid the use of @Beta classes and methods that do not offer guarantees. If the dependency cannot be bumped, then it should be relatively trivial to provide an internal class with the same functionality that does not rely on the older version of Guava. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
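For context on the suggested fix: Guava 14 and later provide {{ByteStreams.limit(InputStream, long)}} as the replacement for the removed {{LimitInputStream}}. A sketch of the swap (the wrapper method here is hypothetical):
{code}
// Hedged sketch of the swap the description suggests. Guava 14+ provides
// ByteStreams.limit(InputStream, long) as the replacement for the @Beta
// LimitInputStream that was removed in Guava 15.
import java.io.InputStream;
import com.google.common.io.ByteStreams;

class LimitSwap {
  static InputStream firstNBytes(InputStream in, long n) {
    // Before (removed in Guava 15):
    //   return new com.google.common.io.LimitInputStream(in, n);
    // After:
    return ByteStreams.limit(in, n);
  }
}
{code}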
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202890#comment-14202890 ] Hadoop QA commented on HDFS-7358: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680257/h7358_20141107.patch against trunk revision 06b7979. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancer The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestParallelShortCircuitReadUnCached {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8693//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8693//console This message is automatically generated. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch, h7358_20141107.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
Haohui Mai created HDFS-7381: Summary: Decouple the management of block id and gen stamps from FSNamesystem Key: HDFS-7381 URL: https://issues.apache.org/jira/browse/HDFS-7381 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai The block layer should be responsible for managing block ids and generation stamps. Currently this functionality is misplaced in {{FSNamesystem}}. This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7381: - Attachment: HDFS-7381.000.patch Decouple the management of block id and gen stamps from FSNamesystem Key: HDFS-7381 URL: https://issues.apache.org/jira/browse/HDFS-7381 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7381.000.patch The block layer should be responsible for managing block ids and generation stamps. Currently this functionality is misplaced in {{FSNamesystem}}. This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202922#comment-14202922 ] Haohui Mai commented on HDFS-7381: -- The v1 patch creates a new class ({{BlockIdManager}} in the {{blockmanagement}} package) to manage the block ids and generation stamps. {{FSNamesystem}} is still responsible for persisting the latest generation stamp and block id in the edit logs. Decouple the management of block id and gen stamps from FSNamesystem Key: HDFS-7381 URL: https://issues.apache.org/jira/browse/HDFS-7381 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7381.000.patch The block layer should be responsible for managing block ids and generation stamps. Currently this functionality is misplaced in {{FSNamesystem}}. This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
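A rough sketch of what such a decoupled class might look like, based only on the comment above; the field names and AtomicLong counters are assumptions, not the contents of HDFS-7381.000.patch:
{code}
// Hedged sketch based only on the comment above -- not the actual patch.
import java.util.concurrent.atomic.AtomicLong;

class BlockIdManagerSketch {
  private final AtomicLong generationStamp = new AtomicLong(0);
  private final AtomicLong lastBlockId = new AtomicLong(0);

  long nextGenerationStamp() {
    return generationStamp.incrementAndGet();
  }

  long nextBlockId() {
    return lastBlockId.incrementAndGet();
  }

  // Per the comment above, FSNamesystem would still persist the latest values
  // in the edit log and restore them on startup via setters like these.
  void setGenerationStamp(long gs) { generationStamp.set(gs); }
  void setLastBlockId(long id) { lastBlockId.set(id); }
}
{code}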
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202894#comment-14202894 ] Tsz Wo Nicholas Sze commented on HDFS-7279: --- Reviewing the patch. Some questions/comments from the first half: - In JspHelper.checkUsername(..), why remove the tryUgiParameter if-statement? - In URLDispatcher.channelRead0(..), how about checking the webhdfs uri first and then using SimpleHttpProxyHandler for everything else? I.e. {code} if (uri.startsWith("/webhdfs/v1")) { WebHdfsHandler h = new WebHdfsHandler(conf, confForCreate); p.replace(this, proxy, h); h.channelRead0(ctx, req); } else { SimpleHttpProxyHandler h = new SimpleHttpProxyHandler(proxyHost); p.replace(this, proxy, h); h.channelRead0(ctx, req); } {code} - DatanodeHttpServer.close() should throw IOException. Then, we don't need to convert IOException to RuntimeException. Also, do we want to destroy the ssl factory before closing the channel? Or put it in a finally? - In SimpleHttpProxyHandler, -* Forwarder.channelRead(..): should the two LOG.warn be LOG.debug? -* Forwarder.exceptionCaught(..): should the LOG.info be LOG.warn/error? -* channelRead0(..): should the LOG.info be LOG.warn/error? Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch Currently the DN implements all related webhdfs functionality using jetty. As the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allow finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202938#comment-14202938 ] Haohui Mai commented on HDFS-7279: -- Updated the patch to address Nicholas's comments. I cleaned up the usages of LOG in {{SimpleHttpProxyHandler}} in the v8 patch. I kept the LOG at INFO level when an exception occurs. My intuition is that it is usually not a serious issue when this type of error happens, so making them WARN might generate unnecessary noise. I have no strong opinion on that; I'm okay with changing it to WARN if you think it is more appropriate. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch, HDFS-7279.008.patch Currently the DN implements all related webhdfs functionality using jetty. As the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allow finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7381: - Status: Patch Available (was: Open) Decouple the management of block id and gen stamps from FSNamesystem Key: HDFS-7381 URL: https://issues.apache.org/jira/browse/HDFS-7381 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7381.000.patch The block layer should be responsible for managing block ids and generation stamps. Currently this functionality is misplaced in {{FSNamesystem}}. This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-7279: - Attachment: HDFS-7279.008.patch Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch, HDFS-7279.008.patch Currently the DN implements all related webhdfs functionality using jetty. As the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allow finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7358) Clients may get stuck waiting when using ByteArrayManager
[ https://issues.apache.org/jira/browse/HDFS-7358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202953#comment-14202953 ] Hadoop QA commented on HDFS-7358: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680257/h7358_20141107.patch against trunk revision 06b7979. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestHFlush {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8692//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8692//console This message is automatically generated. Clients may get stuck waiting when using ByteArrayManager - Key: HDFS-7358 URL: https://issues.apache.org/jira/browse/HDFS-7358 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7358_20141104.patch, h7358_20141104_wait_timeout.patch, h7358_20141105.patch, h7358_20141106.patch, h7358_20141107.patch [~stack] reported that clients might get stuck waiting when using ByteArrayManager; see [his comments|https://issues.apache.org/jira/browse/HDFS-7276?focusedCommentId=14197036page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14197036]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202971#comment-14202971 ] Colin Patrick McCabe commented on HDFS-7314: bq. It turns out a new bug not related to this was discovered by this change. If the DataStreamer thread exits and closes the stream before the application closes the stream, DFSClient will keep renewing the lease. That is because DataStreamer's closeInternal marks the stream closed but doesn't call DFSClient's endFileLease. Later, when the application closes the stream, it will skip DFSClient's endFileLease given the stream has been closed. You're right that there is a bug here. There is a lot of discussion about what to do about this issue in HDFS-4504. It's not as simple as just calling {{endFileLease}}... if we missed calling {{completeFile}}, the NN will continue to think that we have a lease open on this file. I think we should avoid modifying {{DFSOutputStream#close}} here. We should try to keep this JIRA focused on just the description. Plus HDFS-4504 is a complex issue, not easy to solve. {{TestDFSClientRetries.java}}: let's get rid of the unnecessary whitespace change in the current patch. I like the idea of getting rid of the {{DFSClient#abort}} function. The patch looks good once these things are removed; it should be ready to go soon! Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, HDFS-7314.patch It happened in a YARN nodemanager scenario, but it could happen to any long-running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager uses DFSClient for certain write operations such as the log aggregator or shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given the call stack is YARN -> DistributedFileSystem -> DFSClient, this can be addressed at different layers. * YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well. * DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient. We will need to fix all the places DistributedFileSystem calls DFSClient. * After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If the NN is available again it can transition back to a healthy state. Comments? -- This message was sent by Atlassian JIRA
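To make the close/lease ordering in the quoted report concrete, a simplified sketch follows; it is not DFSOutputStream's actual code, and the class and method bodies are illustrative:
{code}
// Hedged illustration of the lease-leak ordering quoted in the comment above
// -- a simplified pseudostructure, not DFSOutputStream's real code.
class LeaseLeakSketch {
  private boolean closed;

  // What the report says happens when the DataStreamer dies: the stream is
  // marked closed, but the client never stops renewing the lease.
  void closeInternalBuggy() {
    closed = true;
    // missing: dfsClient.endFileLease(fileId);
  }

  // A later close() from the application then short-circuits on the flag,
  // so endFileLease is never reached from either path.
  void close() {
    if (closed) {
      return; // lease keeps being renewed forever
    }
    closed = true;
    // dfsClient.endFileLease(fileId);  // only runs on the normal path
  }
}
{code}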
[jira] [Commented] (HDFS-6803) Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context
[ https://issues.apache.org/jira/browse/HDFS-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202999#comment-14202999 ] Colin Patrick McCabe commented on HDFS-6803: I'm having trouble reconciling the idea that input streams are not thread-safe with the idea that multiple positional reads can be going on in parallel. It seems like if clients are going to have multiple {{pread}} calls in flight, they are counting on thread-safety. Maybe we can say that streams which implement {{PositionedReadable}} are thread-safe? Are there still Hadoop FileSystem implementations out there that have input streams that are not thread-safe? That seems like a recipe for broken code that runs on HDFS but not on other FSes. It seems like if we do have any such FS implementations, they could be fixed pretty easily by putting {{synchronized}} on the methods. Documenting DFSClient#DFSInputStream expectations reading and preading in concurrent context Key: HDFS-6803 URL: https://issues.apache.org/jira/browse/HDFS-6803 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.1 Reporter: stack Attachments: 9117.md.txt, DocumentingDFSClientDFSInputStream (1).pdf, DocumentingDFSClientDFSInputStream.v2.pdf, HDFS-6803v2.txt Reviews of the patch posted on the parent task suggest that we be more explicit about how DFSIS is expected to behave when being read by contending threads. It is also suggested that presumptions made internally be made explicit, documenting expectations. Before we put up a patch we've made a document of assertions we'd like to make into tenets of DFSInputStream. If there is agreement, we'll attach to this issue a patch that weaves the assumptions into DFSIS as javadoc and class comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
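A sketch of the "putting {{synchronized}} on the methods" fix for a hypothetical non-thread-safe stream; the class, field, and method names here are illustrative, not any actual FileSystem implementation:
{code}
// Hedged sketch: a hypothetical FS input stream made trivially thread-safe by
// synchronizing both the stateful and the positional read paths.
import java.io.IOException;

abstract class SynchronizedSeekableStream {
  private long pos;

  // Stateful read: advances the shared position, so it must be serialized.
  synchronized int read(byte[] buf, int off, int len) throws IOException {
    int n = readAt(pos, buf, off, len);
    if (n > 0) {
      pos += n;
    }
    return n;
  }

  // Positional read (pread): does not touch the shared position, but
  // synchronizing it too keeps any underlying single-cursor resource safe
  // when multiple preads are in flight.
  synchronized int read(long position, byte[] buf, int off, int len)
      throws IOException {
    return readAt(position, buf, off, len);
  }

  protected abstract int readAt(long position, byte[] buf, int off, int len)
      throws IOException;
}
{code}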
[jira] [Commented] (HDFS-7279) Use netty to implement DatanodeWebHdfsMethods
[ https://issues.apache.org/jira/browse/HDFS-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203170#comment-14203170 ] Hadoop QA commented on HDFS-7279: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680308/HDFS-7279.008.patch against trunk revision c3d4750. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestByteArrayManager org.apache.hadoop.hdfs.TestParallelUnixDomainRead {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8695//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8695//console This message is automatically generated. Use netty to implement DatanodeWebHdfsMethods - Key: HDFS-7279 URL: https://issues.apache.org/jira/browse/HDFS-7279 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, webhdfs Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7279.000.patch, HDFS-7279.001.patch, HDFS-7279.002.patch, HDFS-7279.003.patch, HDFS-7279.004.patch, HDFS-7279.005.patch, HDFS-7279.006.patch, HDFS-7279.007.patch, HDFS-7279.008.patch Currently the DN implements all related webhdfs functionality using jetty. As the jetty version the DN currently uses (jetty 6) lacks fine-grained buffer and connection management, the DN often suffers from long latency and OOM when its webhdfs component is under sustained heavy load. This jira proposes to implement the webhdfs component in the DN using netty, which can be more efficient and allow finer-grained control over webhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203171#comment-14203171 ] Hadoop QA commented on HDFS-7381: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680304/HDFS-7381.000.patch against trunk revision c3d4750. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1218 javac compiler warnings (more than the trunk's current 1217 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.util.TestByteArrayManager org.apache.hadoop.hdfs.TestParallelUnixDomainRead {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8694//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8694//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8694//console This message is automatically generated. Decouple the management of block id and gen stamps from FSNamesystem Key: HDFS-7381 URL: https://issues.apache.org/jira/browse/HDFS-7381 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-7381.000.patch The block layer should be responsible for managing block ids and generation stamps. Currently this functionality is misplaced in {{FSNamesystem}}. This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ming Ma updated HDFS-7314: -- Attachment: HDFS-7314-5.patch Thanks, Colin. Didn't know the lease leak is a known issue. Here is the updated patch. Given the lease leak issue, LeaseRenewal can't rely on {{closeAllFilesBeingWritten}} to close all leases, so it has to call {{CloseClient}}. {{testLeaseRenewSocketTimeout}} added to {{TestDFSClientRetries}} doesn't seem to have unnecessary whitespace. Do you mean newlines? The updated patch has removed unnecessary newlines. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, HDFS-7314-5.patch, HDFS-7314.patch It happened in a YARN nodemanager scenario, but it could happen to any long-running service that uses a cached instance of DistributedFileSystem. 1. Active NN is under heavy load, so it became unavailable for 10 minutes; any DFSClient request will get ConnectTimeoutException. 2. YARN nodemanager uses DFSClient for certain write operations such as the log aggregator or shared cache in YARN-1492. The DFSClient used by YARN NM's renewLease RPC got ConnectTimeoutException. {noformat} 2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ... {noformat} 3. After DFSClient is in the Aborted state, YARN NM can't use that cached instance of DistributedFileSystem. {noformat} 2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc... java.io.IOException: Filesystem closed at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727) at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120) at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340) at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {noformat} We can make YARN or DFSClient more tolerant to temporary NN unavailability. Given the call stack is YARN -> DistributedFileSystem -> DFSClient, this can be addressed at different layers. * YARN closes the DistributedFileSystem object when it receives some well-defined exception. Then the next HDFS call will create a new instance of DistributedFileSystem. We have to fix all the places in YARN. Plus other HDFS applications need to address this as well. * DistributedFileSystem detects an aborted DFSClient and creates a new instance of DFSClient.
We will need to fix all the places DistributedFileSystem calls DFSClient. * After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can retry. If the NN is available again it can transition back to a healthy state. Comments? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7374: Status: Patch Available (was: Open) Allow decommissioning of dead DataNodes --- Key: HDFS-7374 URL: https://issues.apache.org/jira/browse/HDFS-7374 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang We have seen the use case of decommissioning DataNodes that are already dead or unresponsive, and not expected to rejoin the cluster. The logic introduced by HDFS-6791 will mark those nodes as {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish the decommission work. If an upper layer application is monitoring the decommissioning progress, it will hang forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7374: Attachment: HDFS-7374-001.patch Thanks [~mingma] again for the comment. This patch implements option #2. It also moved a utility function to {{DFSTestUtil}} so it's accessible in the new unit test. Allow decommissioning of dead DataNodes --- Key: HDFS-7374 URL: https://issues.apache.org/jira/browse/HDFS-7374 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7374-001.patch We have seen the use case of decommissioning DataNodes that are already dead or unresponsive, and not expected to rejoin the cluster. The logic introduced by HDFS-6791 will mark those nodes as {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish the decommission work. If an upper layer application is monitoring the decommissioning progress, it will hang forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7374) Allow decommissioning of dead DataNodes
[ https://issues.apache.org/jira/browse/HDFS-7374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203257#comment-14203257 ] Hadoop QA commented on HDFS-7374: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680362/HDFS-7374-001.patch against trunk revision 9a4e0d3. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8697//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8697//console This message is automatically generated. Allow decommissioning of dead DataNodes --- Key: HDFS-7374 URL: https://issues.apache.org/jira/browse/HDFS-7374 Project: Hadoop HDFS Issue Type: Bug Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7374-001.patch We have seen the use case of decommissioning DataNodes that are already dead or unresponsive, and not expected to rejoin the cluster. The logic introduced by HDFS-6791 will mark those nodes as {{DECOMMISSION_INPROGRESS}}, with a hope that they can come back and finish the decommission work. If an upper layer application is monitoring the decommissioning progress, it will hang forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7314) Aborted DFSClient's impact on long running service like YARN
[ https://issues.apache.org/jira/browse/HDFS-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203256#comment-14203256 ] Hadoop QA commented on HDFS-7314: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12680355/HDFS-7314-5.patch against trunk revision 4a114dd. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8696//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8696//console This message is automatically generated. Aborted DFSClient's impact on long running service like YARN Key: HDFS-7314 URL: https://issues.apache.org/jira/browse/HDFS-7314 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Ming Ma Attachments: HDFS-7314-2.patch, HDFS-7314-3.patch, HDFS-7314-4.patch, HDFS-7314-5.patch, HDFS-7314.patch This happened in a YARN NodeManager scenario, but it could happen to any long-running service that uses a cached instance of DistributedFileSystem.
1. The active NN was under heavy load, so it became unavailable for 10 minutes; any DFSClient request got a ConnectTimeoutException.
2. The YARN NodeManager uses DFSClient for certain write operations such as log aggregation or the shared cache in YARN-1492. The renewLease RPC of the DFSClient used by the YARN NM got a ConnectTimeoutException.
{noformat}
2014-10-29 01:36:19,559 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_-550838118_1] for 372 seconds. Aborting ...
{noformat}
3. After the DFSClient is in the Aborted state, the YARN NM can't use that cached instance of DistributedFileSystem.
{noformat}
2014-10-29 20:26:23,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Failed to download rsrc...
java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:727)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1780)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
	at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:237)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:340)
	at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:57)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
We can make YARN or DFSClient more tolerant of temporary NN unavailability. Given the call stack is YARN -> DistributedFileSystem -> DFSClient, this can be addressed at different layers.
* YARN closes the DistributedFileSystem object when it receives some well-defined exception; the next HDFS call will then create a new instance of DistributedFileSystem. We would have to fix all such places in YARN, and other HDFS applications would need to address this as well. (A sketch of this option follows below.)
* DistributedFileSystem detects the aborted DFSClient and creates a new instance of DFSClient. We would need to fix all the places where DistributedFileSystem calls DFSClient.
* After DFSClient gets into the Aborted state, it doesn't have to reject all requests; instead it can
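As a rough illustration of the first option, here is a hedged Java sketch. The wrapper class is hypothetical, and matching on the "Filesystem closed" message merely stands in for the well-defined exception that option calls for: on a failure caused by an aborted DFSClient, close the cached DistributedFileSystem so the next {{FileSystem.get}} call builds a fresh instance.
{noformat}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical wrapper, not YARN code: retry a call once after evicting an
// aborted cached filesystem instance.
public class RecoveringFsAccess {
  public static FileStatus statWithRecovery(Configuration conf, Path p)
      throws IOException {
    FileSystem fs = FileSystem.get(p.toUri(), conf); // cached instance
    try {
      return fs.getFileStatus(p);
    } catch (IOException e) {
      // An aborted DFSClient surfaces as "Filesystem closed" (see the stack
      // trace above). String matching is brittle; a real fix would use a
      // well-defined exception type, as the first option suggests.
      if ("Filesystem closed".equals(e.getMessage())) {
        fs.close();                           // removes it from the FS cache
        fs = FileSystem.get(p.toUri(), conf); // cache miss -> new instance
        return fs.getFileStatus(p);
      }
      throw e;
    }
  }
}
{noformat}
The drawback already noted for this option applies to the sketch: every call site in YARN (and in any other HDFS application holding a cached filesystem) would need this kind of wrapper, which is why the other two options push the recovery down into DistributedFileSystem or DFSClient itself.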