[jira] [Updated] (HDFS-6261) Document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.010.patch

Thanks for the review Junping! I changed "nodes" to "node groups"; that should not conflict with "No duplicated replicas are on the same node or node group", and is still simple enough to avoid misunderstanding:

{noformat}
The remaining replicas are placed randomly across other node groups
{noformat}

Document for enabling node group layer in HDFS
--
Key: HDFS-6261
URL: https://issues.apache.org/jira/browse/HDFS-6261
Project: Hadoop HDFS
Issue Type: Task
Components: documentation
Reporter: Wenwu Peng
Assignee: Binglin Chang
Labels: documentation
Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.004.patch, HDFS-6261.005.patch, HDFS-6261.006.patch, HDFS-6261.007.patch, HDFS-6261.008.patch, HDFS-6261.009.patch, HDFS-6261.010.patch, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch

Most of the patches from umbrella JIRA HADOOP-8468 have been committed. However, there is no site that introduces NodeGroup awareness (Hadoop Virtualization Extensions) and how to configure it, so we need to document it.
1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current
2. Document NodeGroup-aware properties in core-default.xml.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
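The node group (4-layer) topology discussed above is switched on through topology-related properties. As a sketch of what the documentation could show, the settings below are the HVE properties introduced under HADOOP-8468; exact names and defaults should be verified against core-default.xml and hdfs-default.xml:

```xml
<!-- Sketch: enable the node group layer (verify names against core-default.xml) -->
<property>
  <name>net.topology.impl</name>
  <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
</property>
<property>
  <name>net.topology.nodegroup.aware</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
</property>
```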
[jira] [Updated] (HDFS-6261) Document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.008.patch

Thanks for the detailed review, nice comments. I made some modifications according to your comments.
[jira] [Updated] (HDFS-6261) Document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.009.patch

Removed trailing whitespace.
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.007.patch

Thanks Allen. Updated the patch.
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.006.patch
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.004.patch
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.005.patch

Removed the binary files from the patch; the two PNG images (2-layer-topology.png, 3-layer-topology.png) should be put into hadoop-common-project/hadoop-common/src/site/resources/images/.
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555460#comment-14555460 ]

Binglin Chang commented on HDFS-6261:

Sorry... will see if I can get this done this weekend.
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516940#comment-14516940 ]

Binglin Chang commented on HDFS-5574:

Strange, the test error is caused by a NoSuchMethodError, which should not happen if the code compiled successfully. Is there a bug in the test-patch process?

{code}
java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSInputChecker.readAndDiscard(I)I
	at org.apache.hadoop.hdfs.RemoteBlockReader.read(RemoteBlockReader.java:128)
	at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:740)
	at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:796)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:856)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:899)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
	at org.apache.hadoop.hdfs.TestDFSInputStream.testSkipInner(TestDFSInputStream.java:61)
	at org.apache.hadoop.hdfs.TestDFSInputStream.testSkipWithRemoteBlockReader(TestDFSInputStream.java:76)
{code}

Remove buffer copy in BlockReader.skip
--
Key: HDFS-5574
URL: https://issues.apache.org/jira/browse/HDFS-5574
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
Attachments: HDFS-5574.006.patch, HDFS-5574.007.patch, HDFS-5574.008.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch

BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer to read data into; this is not necessary.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-5574:
    Attachment: HDFS-5574.008.patch

Oops, sorry I forgot this; attaching a new patch.
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-5574:
    Attachment: HDFS-5574.007.patch

Thanks for the review Akira. Updated the patch to fix compile and style-check warnings.
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377332#comment-14377332 ]

Binglin Chang commented on HDFS-6261:

Sorry for the delay, will update the patch soon.
[jira] [Commented] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364770#comment-14364770 ]

Binglin Chang commented on HDFS-7471:

Hi [~szetszwo], I think the main concern with the patch is that it may hide a race condition problem. I see in the code that the state is periodically refreshed by countSoftwareVersions, so a count mismatch caused by a temporary race condition may not be a problem, or may even be expected. So I don't see much real damage here. Could you help review it? Thanks.

TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
--
Key: HDFS-7471
URL: https://issues.apache.org/jira/browse/HDFS-7471
Project: Hadoop HDFS
Issue Type: Test
Components: test
Affects Versions: 3.0.0
Reporter: Ted Yu
Assignee: Binglin Chang
Attachments: HDFS-7471.001.patch

From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ :

{code}
FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect

Error Message:
The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1>

Stack Trace:
java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
[ https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335972#comment-14335972 ]

Binglin Chang commented on HDFS-7538:

Hi [~tedyu], the patch is out of date, and I think the bug no longer exists. Should this be resolved?

removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
--
Key: HDFS-7538
URL: https://issues.apache.org/jira/browse/HDFS-7538
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
Attachments: hdfs-7538-001.patch

{code}
if (removedDst != null) {
  undoRemoveDst = false;
  ...
if (undoRemoveDst) {
  // Rename failed - restore dst
  if (dstParent.isDirectory() && dstParent.asDirectory().isWithSnapshot()) {
    dstParent.asDirectory().undoRename4DstParent(removedDst,
{code}

If the first if check doesn't pass, removedDst would be null and undoRemoveDst may be true. This combination would lead to a NullPointerException in the finally block.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
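The NPE scenario described above (undoRemoveDst left true while removedDst is null) can be avoided by guarding on both conditions in the finally block. A minimal standalone sketch of that pattern; class, method, and field names here are illustrative, not the actual FSDirRenameOp code:

```java
// Sketch of the null-guard discussed in HDFS-7538 (illustrative names,
// not the real FSDirRenameOp#unprotectedRenameTo implementation).
public class RenameGuard {
    static String restored = null;

    static void rename(String removedDst, boolean undoRemoveDst) {
        try {
            // ... rename work that may leave undoRemoveDst == true ...
        } finally {
            // Guard on BOTH the flag and the reference: if the first
            // check never assigned removedDst, it is still null here.
            if (undoRemoveDst && removedDst != null) {
                restored = removedDst; // safe: removedDst is non-null
            }
        }
    }

    public static void main(String[] args) {
        rename(null, true);  // no NPE: the null guard short-circuits
        rename("dst", true); // restores the removed destination
        System.out.println(restored);
    }
}
```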
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306506#comment-14306506 ]

Binglin Chang commented on HDFS-6261:

bq. I'd prefer to see this get merged into the RackAwareness documentation rather than building a completely new doc

OK. Will update the patch once HADOOP-11495 is resolved.
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265585#comment-14265585 ]

Binglin Chang commented on HDFS-6994:

If the current native_mini_dfs implementation is sufficient, we should just use it. The main concern is that we may need to test HA, RPC retry, and input/output stream fault tolerance; this requires exposing (and maybe adding) more methods on MiniDFSCluster, and the same effort is also required for MiniYarnCluster. From this point of view, providing a general way to call Java methods from C++ will avoid a lot of redundant code.

Agreed that an external process is annoying. To run the tests in one process, it seems the only option is JNI (because for tests that change the state of the MiniDFSCluster, we cannot just start a MiniDFSCluster and leave it).

libhdfs3 - A native C/C++ HDFS client
--
Key: HDFS-6994
URL: https://issues.apache.org/jira/browse/HDFS-6994
Project: Hadoop HDFS
Issue Type: New Feature
Components: hdfs-client
Reporter: Zhanwei Wang
Assignee: Zhanwei Wang
Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch

Hi All

I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol.

libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal.

I'd like to integrate libhdfs3 into the HDFS source code to benefit others.

You can find the libhdfs3 code on github:
https://github.com/PivotalRD/libhdfs3
http://pivotalrd.github.io/libhdfs3/

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261813#comment-14261813 ]

Binglin Chang commented on HDFS-6994:

The trick is using reflection and JSON/Java type auto-mapping to create a generic method, so when I write in the CLI:

{noformat}
startDataNodes {conf} 3 true null [rack0, rack1] [1,1]
{noformat}

or waitActive 1, or stopDatanode 1, it will find the proper MiniDFSCluster method, automatically do type conversion of the arguments, and call the method. By doing this, we can also start a minicluster and control its behavior manually, so it can also be used in manual debugging and testing.
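The reflection-plus-type-conversion trick described above can be sketched in a few lines of standalone Java. TinyCluster and dispatch below are illustrative stand-ins for MiniDFSCluster and the proposed CLI; a real implementation would also need JSON parsing for list and map arguments:

```java
import java.lang.reflect.Method;

// Sketch of the proposed CLI dispatch: find a method by name and arity
// via reflection, convert string arguments to the parameter types, and
// invoke it. All names here are illustrative, not MiniDFSCluster's API.
public class ReflectiveDispatch {
    public static class TinyCluster {
        private int nodes;
        public void setNodes(int n) { nodes = n; }
        public int getNodes() { return nodes; }
    }

    // Convert a CLI token to the expected parameter type.
    static Object convert(String arg, Class<?> type) {
        if (type == int.class) return Integer.parseInt(arg);
        if (type == boolean.class) return Boolean.parseBoolean(arg);
        return arg; // Strings pass through unchanged
    }

    // Dispatch a line like "setNodes 3" to a matching public method.
    public static Object dispatch(Object target, String line) throws Exception {
        String[] parts = line.trim().split("\\s+");
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(parts[0])
                    && m.getParameterCount() == parts.length - 1) {
                Class<?>[] types = m.getParameterTypes();
                Object[] args = new Object[parts.length - 1];
                for (int i = 0; i < args.length; i++) {
                    args[i] = convert(parts[i + 1], types[i]);
                }
                return m.invoke(target, args);
            }
        }
        throw new NoSuchMethodException(parts[0]);
    }

    public static void main(String[] args) throws Exception {
        TinyCluster c = new TinyCluster();
        dispatch(c, "setNodes 3");
        System.out.println(dispatch(c, "getNodes")); // prints 3
    }
}
```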
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261819#comment-14261819 ]

Binglin Chang commented on HDFS-6994:

This is more like a CLI (or REPL) than RPC. On the native side, we can wrap the REPL in an RPC interface, but that only requires serializing C++ arguments to JSON strings (using sprintf should be enough). I see that the arguments and return values of the most commonly used methods are just simple primitive types; methods with complex types are not likely to be used.
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260874#comment-14260874 ]

Binglin Chang commented on HDFS-6994:

About adding more tests: we should add MiniDFSCluster support. We can reuse native_mini_dfs.h in libhdfs, but it has some limitations:
1. It lacks some functionality needed to do all the tests, e.g. start/stop datanode, corrupt file.
2. It adds a dependency on JNI.
3. Adding method support to the native minidfscluster involves a lot of work (get method ID, type conversion, etc.).

I have another idea for doing this:
1. Add a CLI-like interface to MiniDFSCluster on the Java side; supporting the most commonly used MiniDFSCluster methods as CLI commands should be easy using reflection and JSON.
2. On the libhdfs3 side, tests can start the MiniDFSCluster CLI process and call those methods via a CLI+JSON protocol.

If you guys think it's OK, I can create a task and work on this.
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-7547:
    Resolution: Duplicate
    Status: Resolved (was: Patch Available)

Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
--
Key: HDFS-7547
URL: https://issues.apache.org/jira/browse/HDFS-7547
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Attachments: HDFS-7547.001.patch

HDFS-7531 changes the implementation of FsVolumeList but doesn't change its toString method to keep the old description string format. The test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7562) Fix Atoi.cc link error
Binglin Chang created HDFS-7562:
---
Summary: Fix Atoi.cc link error
Key: HDFS-7562
URL: https://issues.apache.org/jira/browse/HDFS-7562
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial

When compiling, the following error occurs:

{noformat}
Undefined symbols for architecture x86_64:
  hdfs::internal::StrToInt32(char const*, int*), referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  hdfs::internal::StrToInt64(char const*, long long*), referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  hdfs::internal::StrToDouble(char const*, double*), referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  hdfs::internal::StrToBool(char const*, bool*), referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
ld: symbol(s) not found for architecture x86_64
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
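Undefined symbols like these at link time usually mean the translation unit that defines them (Atoi.cc here) was never added to the library's source list. A hedged sketch of the kind of CMakeLists.txt fix involved; the target name and paths are illustrative, not the actual libhdfs3 build files:

```cmake
# Sketch: compile Atoi.cc into the library so that StrToInt32/StrToInt64/
# StrToDouble/StrToBool are present when Config.cc.o and
# XmlConfigParser.cc.o are linked.
add_library(libhdfs3 STATIC
    common/Atoi.cc            # illustrative path; was missing from the list
    common/XmlConfigParser.cc
    client/Config.cc
)
```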
[jira] [Updated] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-7562:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-7562:
    Attachment: HDFS-7562-pnatve.001.patch
[jira] [Updated] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7562: Resolution: Duplicate Assignee: (was: Binglin Chang) Status: Resolved (was: Patch Available) Fix Atoi.cc link error -- Key: HDFS-7562 URL: https://issues.apache.org/jira/browse/HDFS-7562 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Binglin Chang Priority: Trivial Attachments: HDFS-7562-pnatve.001.patch When compiling, the following error occurs:
{noformat}
Undefined symbols for architecture x86_64:
  "hdfs::internal::StrToInt32(char const*, int*)", referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  "hdfs::internal::StrToInt64(char const*, long long*)", referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  "hdfs::internal::StrToDouble(char const*, double*)", referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  "hdfs::internal::StrToBool(char const*, bool*)", referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
ld: symbol(s) not found for architecture x86_64
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255651#comment-14255651 ] Binglin Chang commented on HDFS-7562: - HDFS-7018 already includes this fix; closing as a duplicate. Fix Atoi.cc link error -- Key: HDFS-7562 URL: https://issues.apache.org/jira/browse/HDFS-7562 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Binglin Chang Priority: Trivial Attachments: HDFS-7562-pnatve.001.patch When compiling, the following error occurs:
{noformat}
Undefined symbols for architecture x86_64:
  "hdfs::internal::StrToInt32(char const*, int*)", referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  "hdfs::internal::StrToInt64(char const*, long long*)", referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  "hdfs::internal::StrToDouble(char const*, double*)", referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  "hdfs::internal::StrToBool(char const*, bool*)", referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
ld: symbol(s) not found for architecture x86_64
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255683#comment-14255683 ] Binglin Chang commented on HDFS-6994: - Hi [~cmccabe] and [~wangzw], I got some time to work on the current libhdfs3 code. It looks like all the code is under the hdfs namespace (including the code in common/network/rpc). That code would be useful in a native YARN client too (which is in the scope of HADOOP-10388), so it would be better to extract a common module that both hdfs and yarn can depend on, right? libhdfs3 - A native C/C++ HDFS client - Key: HDFS-6994 URL: https://issues.apache.org/jira/browse/HDFS-6994 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch Hi All I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both HADOOP RPC versions 8 and 9, Namenode HA, and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on github: https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
Binglin Chang created HDFS-7547: --- Summary: Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Description: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Description: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. (was: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test ) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Status: Patch Available (was: Open) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Attachment: HDFS-7547.001.patch Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-7547.001.patch HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
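The failure mode described above can be sketched in miniature: a test that parses a class's toString-style description breaks as soon as the format changes, even though behavior is unchanged. This is only a toy model; the class and method names below are illustrative, not the real FsVolumeList code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the brittleness described above: a test asserting on the
// exact description string breaks when a refactor changes only the format.
final class VolumeList {
    private final List<String> volumes;

    VolumeList(List<String> volumes) {
        this.volumes = volumes;
    }

    // Old format: "v1, v2" — the string a test might match against.
    String oldDescription() {
        return String.join(", ", volumes);
    }

    // After a refactor the description might become "[v1, v2]", silently
    // invalidating any test that depends on the old format.
    String newDescription() {
        return volumes.stream().collect(Collectors.joining(", ", "[", "]"));
    }
}
```

A test that compares against `oldDescription()` output fails once the implementation switches to the new format, which is why HDFS-7531's change to FsVolumeList broke the dependent test.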
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251244#comment-14251244 ] Binglin Chang commented on HDFS-7527: - Makes sense; it looks like the behavior was changed at some point. Updated the patch to partially support dfs.datanode.hostname (if it is an IP address, or the hostname resolves to a proper IP address), and changed the test to properly wait for the excluded datanode to come back (using DataNode.isDatanodeFullyStarted rather than checking the ALIVE node count). Note that fully restoring the old behavior requires a lot more changes; for now I only made minimal changes. TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. 
java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport
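The fix described in the comment above (properly waiting for the excluded datanode to come back, rather than checking immediately) boils down to a poll-until-condition-or-deadline loop. Below is a self-contained sketch in the spirit of Hadoop's GenericTestUtils.waitFor; the names here are illustrative, not the actual HDFS test utility API.

```java
import java.util.function.BooleanSupplier;

// Generic wait-for helper: poll a condition until it holds or a deadline
// passes, instead of asserting immediately after a restart.
final class WaitUtil {
    // Returns true if the condition became true before the timeout.
    static boolean waitFor(BooleanSupplier condition, long intervalMs,
                           long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        // One final check at the deadline to avoid a race with the clock.
        return condition.getAsBoolean();
    }
}
```

In the test, the condition would be something like "the restarted datanode reports fully started", which avoids the race where the old registration is still counted as ALIVE.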
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Attachment: HDFS-7527.002.patch TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test:
[jira] [Commented] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247915#comment-14247915 ] Binglin Chang commented on HDFS-7471: - The failure is because the datanode has expired; see the log: 2014-12-15 12:41:03,938 INFO blockmanagement.TestDatanodeManager (TestDatanodeManager.java:testNumVersionsReportedCorrect(121)) - Registering node storageID: someStorageID3896, version: version1, IP address: someIPsomeStorageID3896:9000 ... 2014-12-15 12:52:29,914 INFO blockmanagement.TestDatanodeManager (TestDatanodeManager.java:testNumVersionsReportedCorrect(121)) - Registering node storageID: someStorageID3896, version: version4, IP address: someIPsomeStorageID3896:9000 The default expire interval is 10m30s. The datanode someIPsomeStorageID3896 registered at 2014-12-15 12:41:03 and never sent heartbeats; after 11 minutes (2014-12-15 12:52:29), when this datanode re-registered, it didn't call decrementVersionCount, so the version counts will not match: {code} if (shouldCountVersion(nodeS)) { decrementVersionCount(nodeS.getSoftwareVersion()); } {code} Assuming the current code logic is right (don't call decrementVersionCount when the datanode is expired), I think the simple fix is to just increase the expire interval. Or, because the datanode is re-registering, maybe it should not be marked as expired? 
TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
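The bookkeeping drift described in the comment above can be modeled in a few lines. This is a toy model, not the real DatanodeManager code (class and method names are illustrative); it mirrors the shouldCountVersion guard, so a node that expires before re-registering never decrements its old version's count. The 10m30s expiry comes from the HDFS defaults: 2 × the 5-minute recheck interval plus 10 × the 3-second heartbeat interval.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-version datanode counting: when a node is already
// expired at re-registration time, the old version is not decremented,
// so the counts drift from reality.
class VersionCounts {
    // 2 * recheck interval (300000 ms) + 10 * heartbeat interval (3000 ms)
    // = 630000 ms = 10 minutes 30 seconds, the default expiry window.
    static final long EXPIRY_MS = 2 * 300_000L + 10 * 3_000L;

    private final Map<String, String> nodeVersion = new HashMap<>();
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Integer> versionCount = new HashMap<>();

    void register(String nodeId, String version, long nowMs) {
        String old = nodeVersion.get(nodeId);
        if (old != null) {
            boolean expired = nowMs - lastHeartbeat.get(nodeId) > EXPIRY_MS;
            // Mirrors the shouldCountVersion() guard quoted above:
            // expired nodes skip the decrement of their old version.
            if (!expired) {
                versionCount.merge(old, -1, Integer::sum);
            }
        }
        nodeVersion.put(nodeId, version);
        lastHeartbeat.put(nodeId, nowMs);
        versionCount.merge(version, 1, Integer::sum);
    }

    int count(String version) {
        return versionCount.getOrDefault(version, 0);
    }
}
```

Re-registering after 11 minutes (past the 10:30 window) leaves the old version's count stuck at 1, which is exactly the `expected:0 but was:1` assertion failure in the report.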
[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7471: Target Version/s: 2.7.0 Status: Patch Available (was: Open) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7471: Attachment: HDFS-7471.001.patch Simple workaround: increase the expire interval. After investigating the code, I suspect the current countVersion logic may have race conditions; maybe someone more familiar with the code can provide a better fix. TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang Attachments: HDFS-7471.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-7527: --- Assignee: Binglin Chang TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test:
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248411#comment-14248411 ] Binglin Chang commented on HDFS-7527: - I read some related code. The test is intended to verify that the dfs.host list supports dfs.datanode.hostname (e.g. if you set a datanode's name to host1, and the dfs.host file contains host1, this datanode should be able to connect to the namenode). But after reading the code, it turns out DatanodeManager checks the dfs.host list only by IP address, not hostname (the namenode resolves all hostnames in the dfs.host file to IP addresses), so this test should fail; that is the expected behavior. The reason the test passes most of the time is that the code is missing proper waiting to make sure the old datanode has expired. {code} refreshNodes(cluster.getNamesystem(0), hdfsConf); cluster.restartDataNode(0); // there should be some wait time before the original datanode becomes dead, // or the following checking code will always succeed, because the old datanode is still alive // Wait for the DN to come back. while (true) { DatanodeInfo info[] = client.datanodeReport(DatanodeReportType.LIVE); if (info.length == 1) { Assert.assertFalse(info[0].isDecommissioned()); Assert.assertFalse(info[0].isDecommissionInProgress()); assertEquals(registrationName, info[0].getHostName()); break; } LOG.info("Waiting for datanode to come back"); Thread.sleep(HEARTBEAT_INTERVAL * 1000); } {code} I added some sleep time at the place marked in the comment above, and the test always fails, which verifies my theory. Since the test is not valid, I think we should just remove it. 
TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote}
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Target Version/s: 2.7.0 Status: Patch Available (was: Open) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote}
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Attachment: HDFS-7527.001.patch TestDecommission.testIncludeByRegistrationName fails occasionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
[jira] [Assigned] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-7471: --- Assignee: Binglin Chang TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1> Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-7525. - Resolution: Duplicate TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk --- Key: HDFS-7525 URL: https://issues.apache.org/jira/browse/HDFS-7525 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:<0> but was:<1> Stacktrace java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6308: Target Version/s: 2.7.0 TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6308.v1.patch Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5574: Attachment: HDFS-5574.006.patch Rebase patch to trunk. Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer just to read the skipped data into, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
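The improvement described above can be illustrated with a minimal standalone sketch (made-up class and method names, not the actual BlockReader code): skipping can simply advance a read position instead of copying the skipped bytes into a scratch buffer.

```java
// Hypothetical sketch contrasting skip-by-copy with copy-free skip over an
// in-memory byte source; the real readers skip within block data similarly.
public class SkipSketch {
    private final byte[] data;
    private int pos;

    SkipSketch(byte[] data) { this.data = data; this.pos = 0; }

    // Wasteful variant: reads the skipped bytes into a temp buffer it discards.
    long skipWithCopy(long n) {
        byte[] tmp = new byte[(int) Math.min(n, data.length - pos)];
        System.arraycopy(data, pos, tmp, 0, tmp.length); // copy that serves no purpose
        pos += tmp.length;
        return tmp.length;
    }

    // Copy-free variant: just move the read position forward.
    long skipNoCopy(long n) {
        long skipped = Math.min(n, data.length - pos);
        pos += skipped;
        return skipped;
    }

    int read() { return pos < data.length ? data[pos++] : -1; }

    public static void main(String[] args) {
        SkipSketch r = new SkipSketch(new byte[] {1, 2, 3, 4, 5});
        System.out.println(r.skipNoCopy(3)); // 3 bytes skipped
        System.out.println(r.read());        // next byte is 4
    }
}
```

Both variants return the same number of skipped bytes and leave the reader at the same position; the copy-free one avoids the allocation and memory traffic.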
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5574: Target Version/s: 2.7.0 Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer just to read the skipped data into, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4165) Faulty sanity check in FsDirectory.unprotectedSetQuota
[ https://issues.apache.org/jira/browse/HDFS-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148805#comment-14148805 ] Binglin Chang commented on HDFS-4165: - I think the change is simple and it is OK to merge to branch-2. Faulty sanity check in FsDirectory.unprotectedSetQuota -- Key: HDFS-4165 URL: https://issues.apache.org/jira/browse/HDFS-4165 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Fix For: 3.0.0 Attachments: HDFS-4165.patch According to the documentation: The quota can have three types of values: (1) 0 or more will set the quota to that value, (2) {@link HdfsConstants#QUOTA_DONT_SET} implies the quota will not be changed, and (3) {@link HdfsConstants#QUOTA_RESET} implies the quota will be reset. Any other value is a runtime error. The sanity check in FsDirectory.unprotectedSetQuota should use {code} nsQuota != HdfsConstants.QUOTA_RESET {code} rather than {code} nsQuota < HdfsConstants.QUOTA_RESET {code} Since HdfsConstants.QUOTA_RESET is defined to be -1, this code causes no actual problem, but it is better to do it right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
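The quota contract above can be captured in a small standalone sketch (a hypothetical helper with constants mirroring HdfsConstants, not the actual FsDirectory code): values of 0 or more set the quota, QUOTA_DONT_SET and QUOTA_RESET are the only legal negative/sentinel values, and anything else is rejected.

```java
// Hypothetical sketch of the sanity check described above. The constants
// mirror HdfsConstants (QUOTA_RESET == -1, QUOTA_DONT_SET == Long.MAX_VALUE)
// but this is a standalone illustration, not the HDFS implementation.
public class QuotaCheckSketch {
    static final long QUOTA_DONT_SET = Long.MAX_VALUE;
    static final long QUOTA_RESET = -1L;

    static void validateQuota(long nsQuota) {
        // Reject negative values unless they are exactly QUOTA_RESET.
        // Writing "nsQuota < QUOTA_RESET" here would behave the same only
        // because QUOTA_RESET happens to be -1; "!=" states the intent.
        if (nsQuota < 0 && nsQuota != QUOTA_RESET) {
            throw new IllegalArgumentException("Illegal value for nsQuota: " + nsQuota);
        }
    }

    public static void main(String[] args) {
        validateQuota(0);              // legal: sets quota to 0
        validateQuota(QUOTA_RESET);    // legal: resets quota
        validateQuota(QUOTA_DONT_SET); // legal: leaves quota unchanged
        boolean rejected = false;
        try {
            validateQuota(-2);         // any other negative value is an error
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println(rejected); // true
    }
}
```

With QUOTA_RESET fixed at -1 the two comparisons accept exactly the same inputs, which is why the original code worked despite the faulty check.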
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v3.patch Thanks for the notice Allen. Attaching a new version of the patch. Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch, HDFS-6534.v2.patch, HDFS-6534.v3.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127971#comment-14127971 ] Binglin Chang commented on HDFS-6506: - Thanks for the review Chris and Junping. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Components: balancer, test Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.6.0 Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently: https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ From the error log, the reason seems to be that newly moved block replicas were invalidated and deleted, so some of the balancer's work was reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 
127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there must be a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
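The expected behavior described above can be sketched in isolation. This is a hypothetical standalone helper (made-up names, not the actual BlockManager chooseExcessReplicates code): when a block has just been moved from src to dest, the excess-replica chooser should pick the stale copy on src for invalidation, never the fresh copy on dest.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the excess-replica choice discussed above: after a
// balancer move, prefer invalidating the move source, not the new destination.
public class ExcessReplicaSketch {
    // Pick the replica to delete, preferring the move source over all others.
    public static String chooseExcessReplica(List<String> replicaNodes, String moveSource) {
        for (String node : replicaNodes) {
            if (node.equals(moveSource)) {
                return node; // the stale source copy is the excess one
            }
        }
        // fall back to any replica if the source copy is already gone
        return replicaNodes.get(0);
    }

    public static void main(String[] args) {
        // dest and src addresses taken from the log excerpt above
        List<String> replicas = Arrays.asList("127.0.0.1:55468", "127.0.0.1:49159");
        System.out.println(chooseExcessReplica(replicas, "127.0.0.1:49159"));
    }
}
```

In the failing runs the opposite happened: the newly written replica on 127.0.0.1:55468 was chosen for invalidation, undoing the balancer's move.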
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v3.patch Rebase patch to latest trunk. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch -- This message was sent by Atlassian JIRA
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061607#comment-14061607 ] Binglin Chang commented on HDFS-5574: - Hi [~cmccabe], it looks like there have been no more comments for a long time; could you help get this committed? Thanks:) Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer just to read the skipped data into, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Summary: Newly moved block replica been invalidated and deleted in TestBalancer (was: Newly moved block replica been invalidated and deleted) Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 
18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061612#comment-14061612 ] Binglin Chang commented on HDFS-6506: - Hi [~djp], this bug is related to TestBalancerWithNodeGroup, could you help review this? Thanks:) Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 
127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040606#comment-14040606 ] Binglin Chang commented on HDFS-6586: - TestBalancerWithNodeGroup also failed before for the same reason (HDFS-6250). We fixed TestBalancerWithNodeGroup, but it looks like TestBalancer has the same bug, and potentially also bug HDFS-6250. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040607#comment-14040607 ] Binglin Chang commented on HDFS-6586: - bq. and potentially also have bug HDFS-6250 sorry, it was HDFS-6506 TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
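The waitForBalancer timeout in the stack trace above boils down to a convergence check: the test repeatedly polls datanode utilization until every node is within a small delta of the cluster average, and fails with a TimeoutException otherwise. A minimal illustrative sketch of that pattern (class and method names here are hypothetical, not the actual TestBalancer code):

```java
// Hedged sketch of a balancer convergence wait; names are illustrative,
// not the real org.apache.hadoop.hdfs.server.balancer.TestBalancer API.
import java.util.Map;
import java.util.concurrent.TimeoutException;

public class BalanceWait {
    // A cluster counts as balanced when every datanode's used-space ratio
    // is within `delta` of the expected average utilization.
    public static boolean balanced(Map<String, Double> used, double avg, double delta) {
        for (double u : used.values()) {
            if (Math.abs(u - avg) > delta) return false;
        }
        return true;
    }

    // Poll until balanced or the deadline passes; in the real test the
    // utilization map would be re-fetched from live datanode reports.
    public static void waitForBalancer(Map<String, Double> used, double avg,
                                       double delta, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!balanced(used, avg, delta)) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("utilization did not converge to " + avg);
            }
            Thread.sleep(100);
        }
    }

    public static void main(String[] args) throws Exception {
        // Both nodes are within 0.02 of the 0.2 average, so this returns at once.
        Map<String, Double> used = Map.of("dn0", 0.21, "dn1", 0.19);
        waitForBalancer(used, 0.2, 0.02, 1000);
        System.out.println("balanced");
    }
}
```

In the failing run above the loop never sees convergence (0.08 vs. the expected 0.2 average), which is what surfaces as the TimeoutException.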
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v2.patch Update patch to add fix of bug in HDFS-6586, TestBalancer is affected by balancer.id file. Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 
127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by
[jira] [Resolved] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-6586. - Resolution: Duplicate TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040625#comment-14040625 ] Binglin Chang commented on HDFS-6586: - I updated the patch in HDFS-6506 to fix the bug and am closing this jira as a duplicate. Thanks for reporting this, Ted. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4667) Capture renamed files/directories in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039667#comment-14039667 ] Binglin Chang commented on HDFS-4667: - bq. Binglin Chang, do you have any other comments? No, thanks for the patch, lgtm Capture renamed files/directories in snapshot diff report - Key: HDFS-4667 URL: https://issues.apache.org/jira/browse/HDFS-4667 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, HDFS-4667.003.patch, HDFS-4667.004.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, getfullname-snapshot-support.patch Currently in the diff report we only show file/dir creation, deletion and modification. After rename with snapshots is supported, renamed file/dir should also be captured in the diff report. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v2.patch Fix a minor typo that caused the Linux build to fail Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch, HDFS-6534.v2.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Status: Patch Available (was: Open) Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v1.patch Changes: 1. fix a bug in memset(hdfsFileInfo...) 2. use PRId64 instead of %ld to prevent a compile warning 3. emulate clock_gettime/sem_init/sem_destroy on macosx 4. remove -lrt on macosx in CMakeLists.txt Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032152#comment-14032152 ] Binglin Chang commented on HDFS-6539: - The failed test is not related; created HDFS-6541 to track it. test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6541) TestWebHdfsWithMultipleNameNodes.testRedirect failed with read timeout
Binglin Chang created HDFS-6541: --- Summary: TestWebHdfsWithMultipleNameNodes.testRedirect failed with read timeout Key: HDFS-6541 URL: https://issues.apache.org/jira/browse/HDFS-6541 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang https://builds.apache.org/job/PreCommit-HDFS-Build/7124/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsWithMultipleNameNodes/testRedirect/ Error Message Read timed out Stacktrace java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:695) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:472) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:539) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:410) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:438) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:434) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.create(WebHdfsFileSystem.java:1049) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at 
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.testRedirect(TestWebHdfsWithMultipleNameNodes.java:130) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4667) Capture renamed files/directories in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032272#comment-14032272 ] Binglin Chang commented on HDFS-4667: - Thanks for the updates [~jingzhao]. I will have a look; it may take 1 or 2. Capture renamed files/directories in snapshot diff report - Key: HDFS-4667 URL: https://issues.apache.org/jira/browse/HDFS-4667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Binglin Chang Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, HDFS-4667.003.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, getfullname-snapshot-support.patch Currently in the diff report we only show file/dir creation, deletion and modification. After rename with snapshots is supported, renamed file/dir should also be captured in the diff report. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
Binglin Chang created HDFS-6539: --- Summary: test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-6539: --- Assignee: Binglin Chang test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6539: Status: Patch Available (was: Open) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6539: Attachment: HDFS-6539.v1.patch Hi [~cmccabe], it looks like the patch in HADOOP-8480 has a small error: test_native_mini_dfs was changed to test_libhdfs_threaded, so test_libhdfs_threaded is run twice while test_native_mini_dfs is skipped. test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang moved HADOOP-10700 to HDFS-6534: -- Key: HDFS-6534 (was: HADOOP-10700) Project: Hadoop HDFS (was: Hadoop Common) Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-6506: --- Assignee: Binglin Chang Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 
from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026188#comment-14026188 ] Binglin Chang commented on HDFS-6506: - Looked at the log and code more thoroughly. The reason some block replicas are invalidated is: 1. balancer round 1: blk0 is moved from dn0 to dn1; at this time the block map has not been updated yet (so dn0 still appears to have blk0) 2. balancer round 2 starts and tries to move blk0 from dn0 to dn2 3. dn2 copies the data from dn0 4. dn0 heartbeats and gets a command to delete blk0 5. when completing the move of blk0 from dn0 to dn2, the NameNode cannot find the replica on dn0, but it still has to delete a replica, so it deletes the one on dn1. To prevent this, the balancer needs to wait some time to make sure the block movements of the last round are fully committed; otherwise those movements may be invalidated. Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
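The five-step race above can be modeled with a toy sketch. This is plain Java, not the actual Balancer/BlockManager code; the datanode names and the replication factor of 2 are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model of the race in steps 1-5: the balancer schedules moves from a
// stale view of the block map, so the NameNode later removes the wrong
// (freshly moved) replica as excess.
public class StaleViewRace {
    // Returns the replicas of blk0 that survive the sequence of steps 1-5.
    static Set<String> survivingReplicas() {
        Set<String> replicas = new HashSet<>(Collections.singletonList("dn0"));
        replicas.add("dn1");     // round 1: blk0 copied dn0 -> dn1; the balancer's
                                 // snapshot of the block map still shows only dn0
        replicas.add("dn2");     // round 2: dn2 copies blk0 from dn0 (stale source)
        replicas.remove("dn0");  // dn0 heartbeats and executes its pending delete
        replicas.remove("dn1");  // one replica is still excess; nothing marks dn1's
                                 // copy as freshly moved, so it can be the victim
        return replicas;
    }

    public static void main(String[] args) {
        // Only dn2's copy survives: round 1's move was undone.
        System.out.println(StaleViewRace.survivingReplicas());
    }
}
```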
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026215#comment-14026215 ] Binglin Chang commented on HDFS-6506: - The Balancer already sleeps 2*DFS_HEARTBEAT_INTERVAL seconds between rounds, but in TestBalancer.java: {code} conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L); {code} The replica state update speed is related to DFS_NAMENODE_REPLICATION_INTERVAL too, which is 3 by default. TestBalancer only changes the heartbeat interval (which changes both the heartbeat interval and the balancer's iteration sleep time) but doesn't change the ReplicationMonitor check interval, so the sleep time is too small for the movements to get committed. The other thing is that 2*DFS_HEARTBEAT_INTERVAL still seems a little dangerous; maybe change it to 2*DFS_HEARTBEAT_INTERVAL + DFS_NAMENODE_REPLICATION_INTERVAL Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
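The proposed wait can be sketched as follows. This is plain Java, not the actual Balancer code; the parameter names mirror the two configuration keys discussed above, and the 1s/3s values reflect TestBalancer's heartbeat override and the assumed 3s default replication check interval:

```java
// Sketch of the proposed inter-round wait: cover two heartbeats plus one
// ReplicationMonitor pass, so excess-replica decisions from the previous
// round are settled before new moves are scheduled.
public class BalancerWaitSketch {
    // Current behavior described in the comment above.
    static long currentWaitSec(long heartbeatIntervalSec) {
        return 2 * heartbeatIntervalSec;
    }

    // Proposed: 2 * DFS_HEARTBEAT_INTERVAL + DFS_NAMENODE_REPLICATION_INTERVAL.
    static long proposedWaitSec(long heartbeatIntervalSec, long replicationIntervalSec) {
        return 2 * heartbeatIntervalSec + replicationIntervalSec;
    }

    public static void main(String[] args) {
        // With TestBalancer's 1s heartbeat and a 3s replication interval, the
        // current 2s wait is shorter than one ReplicationMonitor pass, while
        // the proposed wait covers it.
        System.out.println(currentWaitSec(1) + " vs " + proposedWaitSec(1, 3));
    }
}
```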
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v1.patch Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Status: Patch Available (was: Open) Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026512#comment-14026512 ] Binglin Chang commented on HDFS-6506: - The failed test is not related and is tracked in HDFS-3930; actually a recent build also failed because of this. https://builds.apache.org/job/Hadoop-Hdfs-trunk/1770/consoleText Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
[jira] [Created] (HDFS-6506) Newly moved block replica been invalidated and deleted
Binglin Chang created HDFS-6506: --- Summary: Newly moved block replica been invalidated and deleted Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. Normally this should not happen, when moving a block from src to dest, replica on src should be invalided not the dest, there should be bug inside related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026093#comment-14026093 ] Binglin Chang commented on HDFS-6159: - Thanks [~djp] and [~arpitagarwal] for the comments, I created HDFS-6506 to track this. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success -- Key: HDFS-6159 URL: https://issues.apache.org/jira/browse/HDFS-6159 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false-negative failure if data block(s) are lost after the balancer successfully finishes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021553#comment-14021553 ] Binglin Chang commented on HDFS-6159: - The test error log: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:55468 it remains at 0.02 after more than 4 msec. But from the balancer log: {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 We can see that there are 8 blocks (800 
bytes) moved to 127.0.0.1:55468, the utilization of this datanode should be 0.16(800/5000). {noformat} But at the same time, those blocks are deleted by block manager: {noformat} 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, 
blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} At last, only block blk_1073741828_1004 is left on 127.0.0.1:55468, so the final utilization is (100/5000 = 0.02). Those blocks were newly moved by the balancer and should not be invalidated by the block manager. Most likely some block-invalidation logic in BlockManager.java is broken? Looking at the svn log, there are some recent changes to block invalidation in BlockManager (HDFS-6424, HDFS-6362). Perhaps [~jingzhao] or [~arpitagarwal] can help look at this? TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
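The utilization arithmetic above can be checked directly. A minimal sketch in plain Java; the 5000-byte capacity and 100-byte block size are taken from the figures quoted in the comment:

```java
// Verifies the figures quoted above: 8 surviving 100-byte blocks would give
// 800/5000 = 0.16, but with only blk_1073741828_1004 left the datanode sits
// at 100/5000 = 0.02, matching the test failure message.
public class UtilizationCheck {
    static double utilization(long usedBytes, long capacityBytes) {
        return (double) usedBytes / capacityBytes;
    }

    public static void main(String[] args) {
        System.out.println(utilization(8 * 100, 5000)); // expected after balancing: 0.16
        System.out.println(utilization(1 * 100, 5000)); // observed after deletions: 0.02
    }
}
```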
[jira] [Created] (HDFS-6417) TestTransferFsImage.testClientSideException is flaky
Binglin Chang created HDFS-6417: --- Summary: TestTransferFsImage.testClientSideException is flaky Key: HDFS-6417 URL: https://issues.apache.org/jira/browse/HDFS-6417 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Priority: Minor Attachments: HDFS-6417.log Looks like the HTTP connection to the NN timed out. https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/testReport/org.apache.hadoop.hdfs.server.namenode/TestTransferFsImage/testClientSideException/ Error Message Wanted but not invoked: nNStorage.reportErrorOnFile( /x-does-not-exist/blah ); - at org.apache.hadoop.hdfs.server.namenode.TestTransferFsImage.testClientSideException(TestTransferFsImage.java:80) Actually, there were zero interactions with this mock. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6417) TestTransferFsImage.testClientSideException is flaky
[ https://issues.apache.org/jira/browse/HDFS-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6417: Attachment: HDFS-6417.log TestTransferFsImage.testClientSideException is flaky Key: HDFS-6417 URL: https://issues.apache.org/jira/browse/HDFS-6417 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Priority: Minor Attachments: HDFS-6417.log Looks like http connection to NN timeout. https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/testReport/org.apache.hadoop.hdfs.server.namenode/TestTransferFsImage/testClientSideException/ Error Message Wanted but not invoked: nNStorage.reportErrorOnFile( /x-does-not-exist/blah ); - at org.apache.hadoop.hdfs.server.namenode.TestTransferFsImage.testClientSideException(TestTransferFsImage.java:80) Actually, there were zero interactions with this mock. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6381) Fix a typo in INodeReference.java
Binglin Chang created HDFS-6381: --- Summary: Fix a typo in INodeReference.java Key: HDFS-6381 URL: https://issues.apache.org/jira/browse/HDFS-6381 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java {code} * For example, - * (1) Support we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) + * (1) Suppose we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998419#comment-13998419 ] Binglin Chang commented on HDFS-6250: - The problem with adding a datanode on both rack0 and rack1 is that you can't verify there was no cross-rack block movement by checking datanode dfs usage: some blocks may move from rack0 to rack1 and some from rack1 to rack0, after which the total rack usage may be unchanged. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
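A toy sketch of the blind spot described in the comment above (hypothetical numbers): two equal-sized blocks moving in opposite directions across racks leave both rack totals unchanged, so a usage check alone cannot prove that no cross-rack movement happened.

```java
// Illustrative only: symmetric cross-rack moves cancel out in the
// per-rack usage totals, which is why checking dfs usage per rack
// cannot verify "no cross-rack block movement".
public class RackUsageBlindSpot {
    static long[] afterSymmetricMoves(long rack0, long rack1, long blockSize) {
        rack0 -= blockSize; rack1 += blockSize; // cross-rack move #1
        rack1 -= blockSize; rack0 += blockSize; // cross-rack move #2
        return new long[] { rack0, rack1 };
    }

    public static void main(String[] args) {
        // Two cross-rack moves happened, yet both totals look untouched.
        long[] after = afterSymmetricMoves(1800, 1800, 10);
        System.out.println(after[0] + " " + after[1]); // 1800 1800
    }
}
```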
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998417#comment-13998417 ] Binglin Chang commented on HDFS-6250: - Hi [~airbots], what Junping points out is correct: the blocks in the test file should never move to rack1, whether balancer.id is large or small. This is for reliability: if a replica in rack0 moved to rack1, all the replicas of that block would be in rack1, and if rack1 went down we would get a missing block. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
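The reliability invariant described in the comment above can be sketched as a simple check (illustrative only, not HDFS code; the method name is made up):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Sketch of the invariant: a block whose replicas all sit on one rack
// becomes a missing block if that rack goes down, so replicas must span
// at least two racks.
public class RackSpreadCheck {
    static boolean survivesSingleRackFailure(List<String> replicaRacks) {
        return new HashSet<>(replicaRacks).size() >= 2;
    }

    public static void main(String[] args) {
        // All replicas on rack1: a rack1 outage loses the block.
        System.out.println(survivesSingleRackFailure(Arrays.asList("rack1", "rack1"))); // false
        // Replicas spread over rack0 and rack1: one rack can fail safely.
        System.out.println(survivesSingleRackFailure(Arrays.asList("rack0", "rack1"))); // true
    }
}
```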
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997229#comment-13997229 ] Binglin Chang commented on HDFS-6250: - Hi Junping, I ran TestBalancerWithNodeGroup.testBalancerWithRackLocality 50 times: no failures or timeouts, average running time 5.2s (previously 20s). I ran TestBalancerWithNodeGroup.testBalancerWithNodeGroup 50 times: no failures or timeouts, average running time 10.2s (previously 17s). TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6381) Fix a typo in INodeReference.java
[ https://issues.apache.org/jira/browse/HDFS-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6381: Status: Patch Available (was: Open) Fix a typo in INodeReference.java - Key: HDFS-6381 URL: https://issues.apache.org/jira/browse/HDFS-6381 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-6381.v1.patch hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java {code} * For example, - * (1) Support we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) + * (1) Suppose we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6381) Fix a typo in INodeReference.java
[ https://issues.apache.org/jira/browse/HDFS-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6381: Attachment: HDFS-6381.v1.patch Fix a typo in INodeReference.java - Key: HDFS-6381 URL: https://issues.apache.org/jira/browse/HDFS-6381 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-6381.v1.patch hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java {code} * For example, - * (1) Support we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) + * (1) Suppose we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990433#comment-13990433 ] Binglin Chang commented on HDFS-6159: - The fix in the patch has an issue: bq. I propose to increase datanode capacity up to 6000B and data block size to 100B. {code} static final int DEFAULT_BLOCK_SIZE = 100; {code} This variable is not used anywhere, so changing it does not change the block size. The capacity is changed to 6000 while the block size actually remains 10 bytes, which means more blocks need to be moved; this increases the total balancer running time and makes a timeout more likely. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success -- Key: HDFS-6159 URL: https://issues.apache.org/jira/browse/HDFS-6159 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false failure if data block(s) are lost after the balancer successfully finishes. -- This message was sent by Atlassian JIRA (v6.2#6252)
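To see why leaving the real block size at 10 bytes increases balancer work, compare the number of individual block moves needed to shift the same number of bytes (a rough sketch with hypothetical byte counts; note also that in MiniDFSCluster tests a block-size constant only takes effect if it is also set on the Configuration, e.g. via dfs.blocksize, which is the bug described above):

```java
// Hypothetical arithmetic: the balancer moves whole blocks, so a
// smaller block size means more moves (and more round trips) to
// transfer the same number of bytes.
public class BalancerWorkSketch {
    static long movesNeeded(long bytesToMove, long blockSize) {
        return bytesToMove / blockSize;
    }

    public static void main(String[] args) {
        long bytesToMove = 1000; // made-up figure for illustration
        // Intended 100-byte blocks: 10 moves.
        System.out.println(movesNeeded(bytesToMove, 100));
        // Actual 10-byte blocks (DEFAULT_BLOCK_SIZE never wired in): 100 moves.
        System.out.println(movesNeeded(bytesToMove, 10));
    }
}
```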
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990440#comment-13990440 ] Binglin Chang commented on HDFS-6250: - Hi [~airbots], please see my comments about HDFS-6159 https://issues.apache.org/jira/browse/HDFS-6159?focusedCommentId=13990433page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13990433 The fix in HDFS-6159 and this jira seems to be suboptimal, we may need to reconsider the approach. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990491#comment-13990491 ] Binglin Chang commented on HDFS-6250: - I made a patch to address this jira and HDFS-6159, along with a minor fix in the balancer.id related doc. Changes: 1. Set CAPACITY to 5000 rather than 6000, so it keeps the same ratio to the block size as before and makes DEFAULT_BLOCK_SIZE meaningful. 2. Change the validate method in testBalancerWithRackLocality so it doesn't depend on the balancer.id file. 3. Fix a doc error about balancer.id. With these changes the test now runs in only about 7 seconds, rather than 20+ seconds. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6250: Attachment: HDFS-6250-v3.patch TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990668#comment-13990668 ] Binglin Chang commented on HDFS-6342: - Hi [~airbots], I agree with not changing the balancer.id file; my new patch doesn't change it. Please see my new comments in HDFS-6250. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is added (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989495#comment-13989495 ] Binglin Chang commented on HDFS-6250: - Thanks for the analysis and the patch, [~airbots]. The fix makes sense; here are some additional concerns: bq. HDFS creates a /system/balancer.id file (30B) to track the balancer The file contains the hostname, whose size is not fixed. I see you increased the block size and capacity to minimize the impact of the file, but the risk still seems to be there. testBalancerWithRackLocality tests that the balancer does not perform cross-rack block movements in the test scenario; here are the related balancer logs: {code} 2014-04-15 18:29:48,649 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.0], Source[127.0.0.1:46174, utilization=30.0]] 2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=0.0]] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.168], Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=1.8333]] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, 
utilization=28.5], Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=5.0]] 2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:46174, utilization=30.332], Source[127.0.0.1:54333, utilization=25.332]] 2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=7.667]] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=22.668], Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=10.5]] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 above-average: [Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 below-average: [BalancerDatanode[127.0.0.1:54333, utilization=19.832], BalancerDatanode[127.0.0.1:48293, utilization=12.0]] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 underutilized: [] {code} I guess the test intended to let /rack0/NODEGROUP0/dn 
be above-average (=30%) but not over-utilized (>30%, given the average utilization of 20%), so blocks on rack0 never move to rack1; but a differently sized balancer.id file may break that assumption. So there are problems inherent in the test, not just race conditions or timeouts. We may need to change the test (e.g. file size, utilization rate, validation method) to prevent those corner cases. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments:
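The "over-utilized" / "above-average" buckets in the logs above come from the balancer's node classification. A simplified sketch (the real logic lives in org.apache.hadoop.hdfs.server.balancer.Balancer, and 10% is the balancer's default threshold) shows why a 30% node in a 20%-average cluster is only "above-average", not "over-utilized":

```java
// Simplified sketch of how the balancer buckets datanodes relative to
// the cluster-average utilization and the threshold; boundary handling
// in the real Balancer may differ slightly.
public class NodeClassifierSketch {
    static String classify(double utilization, double avg, double threshold) {
        if (utilization > avg + threshold) return "over-utilized";
        if (utilization > avg) return "above-average";
        if (utilization >= avg - threshold) return "below-average";
        return "underutilized";
    }

    public static void main(String[] args) {
        double avg = 20.0, threshold = 10.0; // default threshold is 10%
        // 30% is exactly avg + threshold, so not over-utilized:
        System.out.println(classify(30.0, avg, threshold)); // above-average
        System.out.println(classify(31.0, avg, threshold)); // over-utilized
        System.out.println(classify(0.0, avg, threshold));  // underutilized
    }
}
```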
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990208#comment-13990208 ] Binglin Chang commented on HDFS-6342: - Testing that the rack capacities are equal doesn't mean there was no cross-rack block movement, so I don't think simply adding a new datanode works, right? Maybe we can make more changes and in the meantime reduce the timeout if possible; 80 seconds for a test is a bit long. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is added (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990211#comment-13990211 ] Binglin Chang commented on HDFS-6342: - As for the fix, I see the need to write a balancer.id file, but filling it with the hostname doesn't seem to be necessary (it is never read anywhere). If we modify the balancer to write the file without any content, it should have no side effects on the balancer or the test check code, and we may be able to skip the timeout (need to confirm). TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is added (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
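A sketch of that idea with plain java.nio (the real balancer writes /system/balancer.id through the HDFS FileSystem API, so this is only an analogy): an exclusive create of an empty file still works as a mutual-exclusion marker while contributing zero bytes that could skew usage accounting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyMarkerFileSketch {
    // Exclusive create of an empty marker file: creation fails if another
    // instance already holds it, and zero content means its size can never
    // skew any byte-usage accounting.
    static boolean tryAcquire(Path marker) {
        try {
            Files.createFile(marker); // atomic create, fails if file exists
            return true;
        } catch (IOException alreadyExistsOrIoError) {
            return false;
        }
    }

    // Self-contained demo: acquire, fail to re-acquire, verify empty file.
    static boolean demo() {
        try {
            Path marker = Files.createTempDirectory("balancer").resolve("balancer.id");
            return tryAcquire(marker)        // first acquire succeeds
                && !tryAcquire(marker)       // second acquire is rejected
                && Files.size(marker) == 0;  // file carries no content
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```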
[jira] [Created] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
Binglin Chang created HDFS-6308: --- Summary: TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985214#comment-13985214 ] Binglin Chang commented on HDFS-6261: - Sorry, I looked more carefully; the failure is different. Filed HDFS-6308 for this. Add document for enabling node group layer in HDFS -- Key: HDFS-6261 URL: https://issues.apache.org/jira/browse/HDFS-6261 Project: Hadoop HDFS Issue Type: Task Components: documentation Reporter: Wenwu Peng Assignee: Binglin Chang Labels: documentation Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch Most of the patches from umbrella JIRA HADOOP-8468 have been committed; however, there is no documentation introducing NodeGroup awareness (Hadoop Virtualization Extensions) or how to configure it, so we need to document it. 1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current 2. Document NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985319#comment-13985319 ] Binglin Chang commented on HDFS-6308: - Related error log: {code} 2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1418: Call - /127.0.0.1:58789: getHdfsBlockLocations {tokens { identifier: password: kind: service: } tokens { identifier: password: kind: service: } blockPoolId: BP-1664789652-67.195.138.24-1398662297553 blockIds: 1073741825 blockIds: 1073741826} 2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1419: Call - /127.0.0.1:45933: getHdfsBlockLocations {tokens { identifier: password: kind: service: } tokens { identifier: password: kind: service: } blockPoolId: BP-1664789652-67.195.138.24-1398662297553 blockIds: 1073741825 blockIds: 1073741826} 2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1418: Exception - localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused} 2014-04-28 05:18:19,701 INFO ipc.Server (Server.java:doRead(762)) - Socket Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: Connection reset by peer] java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198) at sun.nio.ch.IOUtil.read(IOUtil.java:171) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243) at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644) at org.apache.hadoop.ipc.Server.access$2800(Server.java:133) at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598) 2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1419: Exception - /127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout} 2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1415: Exception - localhost/127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: {code} The socket read/write timeout is set to 1500ms, and the timeout error is global (per connection), so when a timeout occurs all calls on that connection are marked as timed out. The expected behavior is that the first call times out while the second call completes normally. There is a simple fix: just invoke the second call after the connection is known to be closed. We can consider improving ipc.Client to prevent this kind of corner case later. 
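A minimal model of the failure mode described above (hypothetical code, not the real ipc.Client): because the timeout is connection-scoped, one socket timeout fails every call multiplexed on that connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: in-flight RPC calls keyed by id, all multiplexed on one
// connection. A single read timeout tears the connection down and fails
// every pending call, including calls that would have succeeded alone.
public class ConnectionTimeoutModel {
    final Map<Integer, String> pending = new LinkedHashMap<>();

    void send(int callId) { pending.put(callId, "WAITING"); }

    // ipc.Client-style cleanup on a connection-level socket timeout.
    void onReadTimeout() {
        pending.replaceAll((id, state) -> "SocketTimeoutException");
    }

    public static void main(String[] args) {
        ConnectionTimeoutModel conn = new ConnectionTimeoutModel();
        conn.send(1418);
        conn.send(1419); // expected to succeed on its own
        conn.onReadTimeout();
        // Both calls report timeout, which is what the flaky test tripped on.
        // The fix in the test: issue the second call only after the first
        // connection is known to be closed, i.e. on a fresh connection.
        System.out.println(conn.pending);
    }
}
```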
TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6308: Assignee: Binglin Chang Status: Patch Available (was: Open) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6308: Attachment: HDFS-6308.v1.patch TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6308.v1.patch Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)