[jira] [Updated] (HDFS-6261) Document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.010.patch

Thanks for the review Junping! I changed "nodes" to "node groups"; that should not conflict with "No duplicated replicas are on the same node or node group", and is still simple enough to avoid misunderstanding:

{noformat}
The remaining replicas are placed randomly across other node groups
{noformat}

Document for enabling node group layer in HDFS
--
Key: HDFS-6261
URL: https://issues.apache.org/jira/browse/HDFS-6261
Project: Hadoop HDFS
Issue Type: Task
Components: documentation
Reporter: Wenwu Peng
Assignee: Binglin Chang
Labels: documentation
Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.004.patch, HDFS-6261.005.patch, HDFS-6261.006.patch, HDFS-6261.007.patch, HDFS-6261.008.patch, HDFS-6261.009.patch, HDFS-6261.010.patch, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch, HDFS-6261.v3.patch

Most of the patches from umbrella JIRA HADOOP-8468 have been committed. However, there is no site that introduces NodeGroup awareness (Hadoop Virtualization Extensions) and how to configure it, so we need to document it.
1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current
2. Document NodeGroup-aware properties in core-default.xml.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
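The node group (4-layer) topology discussed above is switched on through topology-related properties. As a sketch of what the documentation could show, the settings below are the HVE properties introduced under HADOOP-8468; exact names and defaults should be verified against core-default.xml and hdfs-default.xml:

```xml
<!-- Sketch: enable the node group layer (verify names against core-default.xml) -->
<property>
  <name>net.topology.impl</name>
  <value>org.apache.hadoop.net.NetworkTopologyWithNodeGroup</value>
</property>
<property>
  <name>net.topology.nodegroup.aware</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup</value>
</property>
```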
[jira] [Updated] (HDFS-6261) Document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.008.patch

Thanks for the detailed review, nice comments. I made some modifications according to your comments.
[jira] [Updated] (HDFS-6261) Document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.009.patch

Removed trailing whitespace.
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.007.patch

Thanks Allen. Updated the patch.
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.006.patch
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.004.patch
[jira] [Updated] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-6261:
    Attachment: HDFS-6261.005.patch

Removed the binary files from the patch; the two PNG images (2-layer-topology.png, 3-layer-topology.png) should be put into hadoop-common-project/hadoop-common/src/site/resources/images/.
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555460#comment-14555460 ]

Binglin Chang commented on HDFS-6261:

Sorry... will see if I can get this done this weekend.
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516940#comment-14516940 ]

Binglin Chang commented on HDFS-5574:

Strange, the test error is caused by a NoSuchMethodError, which should not happen if the code compiled successfully. Is there a bug in the test-patch process?

{code}
java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSInputChecker.readAndDiscard(I)I
	at org.apache.hadoop.hdfs.RemoteBlockReader.read(RemoteBlockReader.java:128)
	at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:740)
	at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:796)
	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:856)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:899)
	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:700)
	at org.apache.hadoop.hdfs.TestDFSInputStream.testSkipInner(TestDFSInputStream.java:61)
	at org.apache.hadoop.hdfs.TestDFSInputStream.testSkipWithRemoteBlockReader(TestDFSInputStream.java:76)
{code}

Remove buffer copy in BlockReader.skip
--
Key: HDFS-5574
URL: https://issues.apache.org/jira/browse/HDFS-5574
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial
Attachments: HDFS-5574.006.patch, HDFS-5574.007.patch, HDFS-5574.008.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch

BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer to read data into; this is not necessary.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-5574:
    Attachment: HDFS-5574.008.patch

Oops, sorry I forgot this; attaching a new patch.
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-5574:
    Attachment: HDFS-5574.007.patch

Thanks for the review Akira. Updated the patch to fix compile and style-check warnings.
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377332#comment-14377332 ]

Binglin Chang commented on HDFS-6261:

Sorry for the delay, will update the patch soon.
[jira] [Commented] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364770#comment-14364770 ]

Binglin Chang commented on HDFS-7471:

Hi [~szetszwo], I think the main concern with the patch is that it may hide a race condition problem. I see in the code that the state is periodically refreshed by countSoftwareVersions, so a count mismatch caused by a temporary race condition may not be a problem, or may even be expected. So I don't see much real damage here. Could you help review it? Thanks.

TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
--
Key: HDFS-7471
URL: https://issues.apache.org/jira/browse/HDFS-7471
Project: Hadoop HDFS
Issue Type: Test
Components: test
Affects Versions: 3.0.0
Reporter: Ted Yu
Assignee: Binglin Chang
Attachments: HDFS-7471.001.patch

From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ :

{code}
FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect

Error Message:
The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1>

Stack Trace:
java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150)
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7538) removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
[ https://issues.apache.org/jira/browse/HDFS-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335972#comment-14335972 ]

Binglin Chang commented on HDFS-7538:

Hi [~tedyu], the patch is out of date, and I think the bug no longer exists. Should this be resolved?

removedDst should be checked against null in the finally block of FSDirRenameOp#unprotectedRenameTo()
--
Key: HDFS-7538
URL: https://issues.apache.org/jira/browse/HDFS-7538
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
Attachments: hdfs-7538-001.patch

{code}
if (removedDst != null) {
  undoRemoveDst = false;
  ...
if (undoRemoveDst) {
  // Rename failed - restore dst
  if (dstParent.isDirectory() && dstParent.asDirectory().isWithSnapshot()) {
    dstParent.asDirectory().undoRename4DstParent(removedDst,
{code}

If the first if check doesn't pass, removedDst would be null and undoRemoveDst may be true. This combination would lead to a NullPointerException in the finally block.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
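The NPE scenario described above (undoRemoveDst left true while removedDst is null) can be avoided by guarding on both conditions in the finally block. A minimal standalone sketch of that pattern; class, method, and field names here are illustrative, not the actual FSDirRenameOp code:

```java
// Sketch of the null-guard discussed in HDFS-7538 (illustrative names,
// not the real FSDirRenameOp#unprotectedRenameTo implementation).
public class RenameGuard {
    static String restored = null;

    static void rename(String removedDst, boolean undoRemoveDst) {
        try {
            // ... rename work that may leave undoRemoveDst == true ...
        } finally {
            // Guard on BOTH the flag and the reference: if the first
            // check never assigned removedDst, it is still null here.
            if (undoRemoveDst && removedDst != null) {
                restored = removedDst; // safe: removedDst is non-null
            }
        }
    }

    public static void main(String[] args) {
        rename(null, true);  // no NPE: the null guard short-circuits
        rename("dst", true); // restores the removed destination
        System.out.println(restored);
    }
}
```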
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306506#comment-14306506 ]

Binglin Chang commented on HDFS-6261:

bq. I'd prefer to see this get merged into the RackAwareness documentation rather than building a completely new doc

OK. Will update the patch once HADOOP-11495 is resolved.
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265585#comment-14265585 ]

Binglin Chang commented on HDFS-6994:

If the current native_mini_dfs implementation is sufficient, we should just use it. The main concern is that we may need to test HA, RPC retry, and input/output stream fault tolerance; this requires exposing (and maybe adding) more methods on MiniDFSCluster, and the same effort is also required for MiniYarnCluster. From this point of view, providing a general way to call Java methods from C++ will avoid a lot of redundant code.

Agreed that an external process is annoying. To run the tests in one process, it seems the only option is JNI (because for tests that change the state of the MiniDFSCluster, we cannot just start a MiniDFSCluster and leave it).

libhdfs3 - A native C/C++ HDFS client
--
Key: HDFS-6994
URL: https://issues.apache.org/jira/browse/HDFS-6994
Project: Hadoop HDFS
Issue Type: New Feature
Components: hdfs-client
Reporter: Zhanwei Wang
Assignee: Zhanwei Wang
Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch

Hi All

I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol.

libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal.

I'd like to integrate libhdfs3 into the HDFS source code to benefit others.

You can find the libhdfs3 code on github:
https://github.com/PivotalRD/libhdfs3
http://pivotalrd.github.io/libhdfs3/

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261813#comment-14261813 ]

Binglin Chang commented on HDFS-6994:

The trick is using reflection and JSON/Java type auto-mapping to create a generic method, so when I write in the CLI:

{noformat}
startDataNodes {conf} 3 true null [rack0, rack1] [1,1]
{noformat}

or waitActive 1, or stopDatanode 1, it will find the proper MiniDFSCluster method, automatically do type conversion of the arguments, and call the method. By doing this, we can also start a minicluster and control its behavior manually, so it can also be used in manual debugging and testing.
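The reflection-plus-type-conversion trick described above can be sketched in a few lines of standalone Java. TinyCluster and dispatch below are illustrative stand-ins for MiniDFSCluster and the proposed CLI; a real implementation would also need JSON parsing for list and map arguments:

```java
import java.lang.reflect.Method;

// Sketch of the proposed CLI dispatch: find a method by name and arity
// via reflection, convert string arguments to the parameter types, and
// invoke it. All names here are illustrative, not MiniDFSCluster's API.
public class ReflectiveDispatch {
    public static class TinyCluster {
        private int nodes;
        public void setNodes(int n) { nodes = n; }
        public int getNodes() { return nodes; }
    }

    // Convert a CLI token to the expected parameter type.
    static Object convert(String arg, Class<?> type) {
        if (type == int.class) return Integer.parseInt(arg);
        if (type == boolean.class) return Boolean.parseBoolean(arg);
        return arg; // Strings pass through unchanged
    }

    // Dispatch a line like "setNodes 3" to a matching public method.
    public static Object dispatch(Object target, String line) throws Exception {
        String[] parts = line.trim().split("\\s+");
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(parts[0])
                    && m.getParameterCount() == parts.length - 1) {
                Class<?>[] types = m.getParameterTypes();
                Object[] args = new Object[parts.length - 1];
                for (int i = 0; i < args.length; i++) {
                    args[i] = convert(parts[i + 1], types[i]);
                }
                return m.invoke(target, args);
            }
        }
        throw new NoSuchMethodException(parts[0]);
    }

    public static void main(String[] args) throws Exception {
        TinyCluster c = new TinyCluster();
        dispatch(c, "setNodes 3");
        System.out.println(dispatch(c, "getNodes")); // prints 3
    }
}
```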
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261819#comment-14261819 ]

Binglin Chang commented on HDFS-6994:

This is more like a CLI (or REPL) than RPC. On the native side, we can wrap the REPL in an RPC interface, but that only requires serializing C++ arguments to JSON strings (using sprintf should be enough). I see that the arguments and return values of the most commonly used methods are just simple primitive types; methods with complex types are not likely to be used.
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260874#comment-14260874 ]

Binglin Chang commented on HDFS-6994:

About adding more tests: we should add MiniDFSCluster support. We can reuse native_mini_dfs.h in libhdfs, but it has some limitations:
1. It lacks some functionality needed to do all the tests, e.g. start/stop datanode, corrupt file.
2. It adds a dependency on JNI.
3. Adding method support to the native minidfscluster involves a lot of work (get method ID, type conversion, etc.).

I have another idea for doing this:
1. Add a CLI-like interface to MiniDFSCluster on the Java side; supporting the most commonly used MiniDFSCluster methods as CLI commands should be easy using reflection and JSON.
2. On the libhdfs3 side, tests can start the MiniDFSCluster CLI process and call those methods via a CLI+JSON protocol.

If you guys think it's OK, I can create a task and work on this.
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-7547:
    Resolution: Duplicate
    Status: Resolved (was: Patch Available)

Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
--
Key: HDFS-7547
URL: https://issues.apache.org/jira/browse/HDFS-7547
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Attachments: HDFS-7547.001.patch

HDFS-7531 changes the implementation of FsVolumeList but doesn't change its toString method to keep the old description string format. The test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7562) Fix Atoi.cc link error
Binglin Chang created HDFS-7562:
---
Summary: Fix Atoi.cc link error
Key: HDFS-7562
URL: https://issues.apache.org/jira/browse/HDFS-7562
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Trivial

When compiling, the following error occurs:

{noformat}
Undefined symbols for architecture x86_64:
  hdfs::internal::StrToInt32(char const*, int*), referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  hdfs::internal::StrToInt64(char const*, long long*), referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  hdfs::internal::StrToDouble(char const*, double*), referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  hdfs::internal::StrToBool(char const*, bool*), referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
ld: symbol(s) not found for architecture x86_64
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
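Undefined symbols like these at link time usually mean the translation unit that defines them (Atoi.cc here) was never added to the library's source list. A hedged sketch of the kind of CMakeLists.txt fix involved; the target name and paths are illustrative, not the actual libhdfs3 build files:

```cmake
# Sketch: compile Atoi.cc into the library so that StrToInt32/StrToInt64/
# StrToDouble/StrToBool are present when Config.cc.o and
# XmlConfigParser.cc.o are linked.
add_library(libhdfs3 STATIC
    common/Atoi.cc            # illustrative path; was missing from the list
    common/XmlConfigParser.cc
    client/Config.cc
)
```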
[jira] [Updated] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-7562:
    Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated HDFS-7562:
    Attachment: HDFS-7562-pnatve.001.patch
[jira] [Updated] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7562: Resolution: Duplicate Assignee: (was: Binglin Chang) Status: Resolved (was: Patch Available) Fix Atoi.cc link error -- Key: HDFS-7562 URL: https://issues.apache.org/jira/browse/HDFS-7562 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Binglin Chang Priority: Trivial Attachments: HDFS-7562-pnatve.001.patch When compiling, the following error occurs:
{noformat}
Undefined symbols for architecture x86_64:
  "hdfs::internal::StrToInt32(char const*, int*)", referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  "hdfs::internal::StrToInt64(char const*, long long*)", referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  "hdfs::internal::StrToDouble(char const*, double*)", referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  "hdfs::internal::StrToBool(char const*, bool*)", referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
ld: symbol(s) not found for architecture x86_64
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7562) Fix Atoi.cc link error
[ https://issues.apache.org/jira/browse/HDFS-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255651#comment-14255651 ] Binglin Chang commented on HDFS-7562: - HDFS-7018 already includes this fix; closing as a duplicate. Fix Atoi.cc link error -- Key: HDFS-7562 URL: https://issues.apache.org/jira/browse/HDFS-7562 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Binglin Chang Priority: Trivial Attachments: HDFS-7562-pnatve.001.patch When compiling, the following error occurs:
{noformat}
Undefined symbols for architecture x86_64:
  "hdfs::internal::StrToInt32(char const*, int*)", referenced from:
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int*) const in Config.cc.o
      hdfs::Config::getInt32(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, int*) const in Config.cc.o
  "hdfs::internal::StrToInt64(char const*, long long*)", referenced from:
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long*) const in Config.cc.o
      hdfs::Config::getInt64(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, long long*) const in Config.cc.o
  "hdfs::internal::StrToDouble(char const*, double*)", referenced from:
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double*) const in Config.cc.o
      hdfs::Config::getDouble(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, double, double*) const in Config.cc.o
  "hdfs::internal::StrToBool(char const*, bool*)", referenced from:
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) const in Config.cc.o
      hdfs::Config::getBool(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool, bool*) const in Config.cc.o
      hdfs::internal::XmlData::handleData(void*, char const*, int) in XmlConfigParser.cc.o
ld: symbol(s) not found for architecture x86_64
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6994) libhdfs3 - A native C/C++ HDFS client
[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255683#comment-14255683 ] Binglin Chang commented on HDFS-6994: - Hi [~cmccabe] and [~wangzw], I got some time to work on the current libhdfs3 code. It looks like all the code is under the hdfs namespace (including the code in common/network/rpc). That code would be useful in a native YARN client too (which is in the scope of HADOOP-10388), so it would be better to extract a common module that both hdfs and yarn can depend on, right? libhdfs3 - A native C/C++ HDFS client - Key: HDFS-6994 URL: https://issues.apache.org/jira/browse/HDFS-6994 Project: Hadoop HDFS Issue Type: New Feature Components: hdfs-client Reporter: Zhanwei Wang Assignee: Zhanwei Wang Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch Hi All I just got the permission to open source libhdfs3, which is a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol. libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both HADOOP RPC versions 8 and 9, Namenode HA, and Kerberos authentication. libhdfs3 is currently used by HAWQ of Pivotal. I'd like to integrate libhdfs3 into the HDFS source code to benefit others. You can find the libhdfs3 code on github: https://github.com/PivotalRD/libhdfs3 http://pivotalrd.github.io/libhdfs3/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
Binglin Chang created HDFS-7547: --- Summary: Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Description: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Description: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. (was: HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format, test ) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Status: Patch Available (was: Open) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7547) Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup
[ https://issues.apache.org/jira/browse/HDFS-7547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7547: Attachment: HDFS-7547.001.patch Fix TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup - Key: HDFS-7547 URL: https://issues.apache.org/jira/browse/HDFS-7547 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-7547.001.patch HDFS-7531 changes the implementation of FsVolumeList, but doesn't change its toString method to keep the old desc string format; test TestDataNodeVolumeFailureToleration#testValidVolumesAtStartup depends on it, so this test always fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
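The failure mode described above can be sketched in miniature: a test that parses a class's toString-style description breaks as soon as the format changes, even though behavior is unchanged. This is only a toy model; the class and method names below are illustrative, not the real FsVolumeList code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model of the brittleness described above: a test asserting on the
// exact description string breaks when a refactor changes only the format.
final class VolumeList {
    private final List<String> volumes;

    VolumeList(List<String> volumes) {
        this.volumes = volumes;
    }

    // Old format: "v1, v2" — the string a test might match against.
    String oldDescription() {
        return String.join(", ", volumes);
    }

    // After a refactor the description might become "[v1, v2]", silently
    // invalidating any test that depends on the old format.
    String newDescription() {
        return volumes.stream().collect(Collectors.joining(", ", "[", "]"));
    }
}
```

A test that compares against `oldDescription()` output fails once the implementation switches to the new format, which is why HDFS-7531's change to FsVolumeList broke the dependent test.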
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14251244#comment-14251244 ] Binglin Chang commented on HDFS-7527: - Makes sense; it looks like the behavior was changed at some point. Updated the patch to partially support dfs.datanode.hostname (if it is an IP address, or the hostname resolves to a proper IP address), and changed the test to properly wait for the excluded datanode to come back (using DataNode.isDatanodeFullyStarted rather than checking the ALIVE node count). Note that fully restoring the old behavior requires a lot more changes; for now I only made minimal changes. TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. 
java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport
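The fix described in the comment above (properly waiting for the excluded datanode to come back, rather than checking immediately) boils down to a poll-until-condition-or-deadline loop. Below is a self-contained sketch in the spirit of Hadoop's GenericTestUtils.waitFor; the names here are illustrative, not the actual HDFS test utility API.

```java
import java.util.function.BooleanSupplier;

// Generic wait-for helper: poll a condition until it holds or a deadline
// passes, instead of asserting immediately after a restart.
final class WaitUtil {
    // Returns true if the condition became true before the timeout.
    static boolean waitFor(BooleanSupplier condition, long intervalMs,
                           long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        // One final check at the deadline to avoid a race with the clock.
        return condition.getAsBoolean();
    }
}
```

In the test, the condition would be something like "the restarted datanode reports fully started", which avoids the race where the old registration is still counted as ALIVE.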
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Attachment: HDFS-7527.002.patch TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch, HDFS-7527.002.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: 
org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test:
[jira] [Commented] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14247915#comment-14247915 ] Binglin Chang commented on HDFS-7471: - The failure is because the datanode has expired; see the log: 2014-12-15 12:41:03,938 INFO blockmanagement.TestDatanodeManager (TestDatanodeManager.java:testNumVersionsReportedCorrect(121)) - Registering node storageID: someStorageID3896, version: version1, IP address: someIPsomeStorageID3896:9000 ... 2014-12-15 12:52:29,914 INFO blockmanagement.TestDatanodeManager (TestDatanodeManager.java:testNumVersionsReportedCorrect(121)) - Registering node storageID: someStorageID3896, version: version4, IP address: someIPsomeStorageID3896:9000 The default expire interval is 10m30s. The datanode someIPsomeStorageID3896 registered at 2014-12-15 12:41:03 and never sent heartbeats; after 11 minutes (2014-12-15 12:52:29), when this datanode re-registered, it didn't call decrementVersionCount, so the version counts will not match: {code} if (shouldCountVersion(nodeS)) { decrementVersionCount(nodeS.getSoftwareVersion()); } {code} Assuming the current code logic is right (don't call decrementVersionCount when the datanode is expired), I think the simple fix is to just increase the expire interval. Or, because the datanode is re-registering, maybe it should not be marked as expired? 
TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
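The bookkeeping drift described in the comment above can be modeled in a few lines. This is a toy model, not the real DatanodeManager code (class and method names are illustrative); it mirrors the shouldCountVersion guard, so a node that expires before re-registering never decrements its old version's count. The 10m30s expiry comes from the HDFS defaults: 2 × the 5-minute recheck interval plus 10 × the 3-second heartbeat interval.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of per-version datanode counting: when a node is already
// expired at re-registration time, the old version is not decremented,
// so the counts drift from reality.
class VersionCounts {
    // 2 * recheck interval (300000 ms) + 10 * heartbeat interval (3000 ms)
    // = 630000 ms = 10 minutes 30 seconds, the default expiry window.
    static final long EXPIRY_MS = 2 * 300_000L + 10 * 3_000L;

    private final Map<String, String> nodeVersion = new HashMap<>();
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Integer> versionCount = new HashMap<>();

    void register(String nodeId, String version, long nowMs) {
        String old = nodeVersion.get(nodeId);
        if (old != null) {
            boolean expired = nowMs - lastHeartbeat.get(nodeId) > EXPIRY_MS;
            // Mirrors the shouldCountVersion() guard quoted above:
            // expired nodes skip the decrement of their old version.
            if (!expired) {
                versionCount.merge(old, -1, Integer::sum);
            }
        }
        nodeVersion.put(nodeId, version);
        lastHeartbeat.put(nodeId, nowMs);
        versionCount.merge(version, 1, Integer::sum);
    }

    int count(String version) {
        return versionCount.getOrDefault(version, 0);
    }
}
```

Re-registering after 11 minutes (past the 10:30 window) leaves the old version's count stuck at 1, which is exactly the `expected:0 but was:1` assertion failure in the report.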
[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7471: Target Version/s: 2.7.0 Status: Patch Available (was: Open) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7471: Attachment: HDFS-7471.001.patch Simple workaround: increase the expire interval. After investigating the code, I suspect the current countVersion logic may have race conditions; maybe someone more familiar with the code can provide a better fix. TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang Attachments: HDFS-7471.001.patch From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:0 but was:1 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-7527: --- Assignee: Binglin Chang TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test:
[jira] [Commented] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14248411#comment-14248411 ] Binglin Chang commented on HDFS-7527: - I read some related code. The test is intended to verify that the dfs.host list supports dfs.datanode.hostname (e.g. if you set a datanode's name to host1, and the dfs.host file contains host1, this datanode should be able to connect to the namenode). But after reading the code, it turns out DatanodeManager checks the dfs.host list only by IP address, not hostname (the namenode resolves all hostnames in the dfs.host file to IP addresses), so this test should fail; that is the expected behavior. The reason the test passes most of the time is that the code is missing proper waiting to make sure the old datanode has expired. {code} refreshNodes(cluster.getNamesystem(0), hdfsConf); cluster.restartDataNode(0); // there should be some wait time before the original datanode becomes dead, // or the following checking code will always succeed, because the old datanode is still alive // Wait for the DN to come back. while (true) { DatanodeInfo info[] = client.datanodeReport(DatanodeReportType.LIVE); if (info.length == 1) { Assert.assertFalse(info[0].isDecommissioned()); Assert.assertFalse(info[0].isDecommissionInProgress()); assertEquals(registrationName, info[0].getHostName()); break; } LOG.info("Waiting for datanode to come back"); Thread.sleep(HEARTBEAT_INTERVAL * 1000); } {code} I added some sleep time at the place marked in the comment above, and the test always fails, which verifies my theory. Since the test is not valid, I think we should just remove it. 
TestDecommission.testIncludeByRegistrationName fails occassionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message test timed out after 36 milliseconds Stacktrace java.lang.Exception: test timed out after 36 milliseconds at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName(TestDecommission.java:957) 2014-12-15 12:00:19,958 ERROR datanode.DataNode (BPServiceActor.java:run(836)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(127.0.0.1, datanodeUuid=55d8cbff-d8a3-4d6d-ab64-317fff0ee279, infoPort=54318, infoSecurePort=0, ipcPort=43726, storageInfo=lv=-56;cid=testClusterID;nsid=903754315;c=0) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:4402) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1196) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:92) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:26296) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:637) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:966) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2127) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2123) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1669) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2121) 2014-12-15 12:00:29,087 FATAL datanode.DataNode (BPServiceActor.java:run(841)) - Initialization failed for Block pool BP-887397778-67.195.81.153-1418644469024 (Datanode Uuid null) service to localhost/127.0.0.1:40565. Exiting. java.io.IOException: DN shut down before block pool connected at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.retrieveNamespaceInfo(BPServiceActor.java:186) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:216) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:745) {quote}
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Target Version/s: 2.7.0 Status: Patch Available (was: Open) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote}
[jira] [Updated] (HDFS-7527) TestDecommission.testIncludeByRegistrationName fails occasionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-7527: Attachment: HDFS-7527.001.patch TestDecommission.testIncludeByRegistrationName fails occasionally in trunk --- Key: HDFS-7527 URL: https://issues.apache.org/jira/browse/HDFS-7527 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang Assignee: Binglin Chang Attachments: HDFS-7527.001.patch https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/
[jira] [Assigned] (HDFS-7471) TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails
[ https://issues.apache.org/jira/browse/HDFS-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-7471: --- Assignee: Binglin Chang TestDatanodeManager#testNumVersionsReportedCorrect occasionally fails - Key: HDFS-7471 URL: https://issues.apache.org/jira/browse/HDFS-7471 Project: Hadoop HDFS Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Ted Yu Assignee: Binglin Chang From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1957/ : {code} FAILED: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Error Message: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1> Stack Trace: java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 237 expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-7525) TestDatanodeManager.testNumVersionsReportedCorrect fails occassionally in trunk
[ https://issues.apache.org/jira/browse/HDFS-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-7525. - Resolution: Duplicate TestDatanodeManager.testNumVersionsReportedCorrect fails occasionally in trunk --- Key: HDFS-7525 URL: https://issues.apache.org/jira/browse/HDFS-7525 Project: Hadoop HDFS Issue Type: Bug Components: namenode, test Reporter: Yongjun Zhang https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport/ {quote} Error Message The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:<0> but was:<1> Stacktrace java.lang.AssertionError: The map of version counts returned by DatanodeManager was not what it was expected to be on iteration 484 expected:<0> but was:<1> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect(TestDatanodeManager.java:150) {quote} Found by tool proposed in HADOOP-11045: {quote} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j Hadoop-Hdfs-trunk -n 5 | tee bt.log Recently FAILED builds in url: https://builds.apache.org//job/Hadoop-Hdfs-trunk THERE ARE 4 builds (out of 6) that have failed tests in the past 5 days, as listed below: ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1974/testReport (2014-12-15 03:30:01) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1972/testReport (2014-12-13 10:32:27) Failed test: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 
===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1971/testReport (2014-12-13 03:30:01) Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline ===https://builds.apache.org/job/Hadoop-Hdfs-trunk/1969/testReport (2014-12-11 03:30:01) Failed test: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline Failed test: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization Among 6 runs examined, all failed tests #failedRuns: testName: 3: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA.testUpdatePipeline 2: org.apache.hadoop.hdfs.TestDecommission.testIncludeByRegistrationName 2: org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testNumVersionsReportedCorrect 1: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover.testFailoverRightBeforeCommitSynchronization {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6308: Target Version/s: 2.7.0 TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6308.v1.patch Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5574: Attachment: HDFS-5574.006.patch Rebase patch to trunk. Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer just to read the skipped data into, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
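The improvement described above can be illustrated with a minimal standalone sketch (made-up class and method names, not the actual BlockReader code): skipping can simply advance a read position instead of copying the skipped bytes into a scratch buffer.

```java
// Hypothetical sketch contrasting skip-by-copy with copy-free skip over an
// in-memory byte source; the real readers skip within block data similarly.
public class SkipSketch {
    private final byte[] data;
    private int pos;

    SkipSketch(byte[] data) { this.data = data; this.pos = 0; }

    // Wasteful variant: reads the skipped bytes into a temp buffer it discards.
    long skipWithCopy(long n) {
        byte[] tmp = new byte[(int) Math.min(n, data.length - pos)];
        System.arraycopy(data, pos, tmp, 0, tmp.length); // copy that serves no purpose
        pos += tmp.length;
        return tmp.length;
    }

    // Copy-free variant: just move the read position forward.
    long skipNoCopy(long n) {
        long skipped = Math.min(n, data.length - pos);
        pos += skipped;
        return skipped;
    }

    int read() { return pos < data.length ? data[pos++] : -1; }

    public static void main(String[] args) {
        SkipSketch r = new SkipSketch(new byte[] {1, 2, 3, 4, 5});
        System.out.println(r.skipNoCopy(3)); // 3 bytes skipped
        System.out.println(r.read());        // next byte is 4
    }
}
```

Both variants return the same number of skipped bytes and leave the reader at the same position; the copy-free one avoids the allocation and memory traffic.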
[jira] [Updated] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-5574: Target Version/s: 2.7.0 Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.006.patch, HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer just to read the skipped data into, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4165) Faulty sanity check in FsDirectory.unprotectedSetQuota
[ https://issues.apache.org/jira/browse/HDFS-4165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14148805#comment-14148805 ] Binglin Chang commented on HDFS-4165: - I think the change is simple and it is OK to merge to branch-2. Faulty sanity check in FsDirectory.unprotectedSetQuota -- Key: HDFS-4165 URL: https://issues.apache.org/jira/browse/HDFS-4165 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Fix For: 3.0.0 Attachments: HDFS-4165.patch According to the documentation: The quota can have three types of values: (1) 0 or more will set the quota to that value, (2) {@link HdfsConstants#QUOTA_DONT_SET} implies the quota will not be changed, and (3) {@link HdfsConstants#QUOTA_RESET} implies the quota will be reset. Any other value is a runtime error. The sanity check in FsDirectory.unprotectedSetQuota should use {code} nsQuota != HdfsConstants.QUOTA_RESET {code} rather than {code} nsQuota < HdfsConstants.QUOTA_RESET {code} Since HdfsConstants.QUOTA_RESET is defined to be -1, this code causes no actual problem, but it is better to do it right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
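The quota contract above can be captured in a small standalone sketch (a hypothetical helper with constants mirroring HdfsConstants, not the actual FsDirectory code): values of 0 or more set the quota, QUOTA_DONT_SET and QUOTA_RESET are the only legal negative/sentinel values, and anything else is rejected.

```java
// Hypothetical sketch of the sanity check described above. The constants
// mirror HdfsConstants (QUOTA_RESET == -1, QUOTA_DONT_SET == Long.MAX_VALUE)
// but this is a standalone illustration, not the HDFS implementation.
public class QuotaCheckSketch {
    static final long QUOTA_DONT_SET = Long.MAX_VALUE;
    static final long QUOTA_RESET = -1L;

    static void validateQuota(long nsQuota) {
        // Reject negative values unless they are exactly QUOTA_RESET.
        // Writing "nsQuota < QUOTA_RESET" here would behave the same only
        // because QUOTA_RESET happens to be -1; "!=" states the intent.
        if (nsQuota < 0 && nsQuota != QUOTA_RESET) {
            throw new IllegalArgumentException("Illegal value for nsQuota: " + nsQuota);
        }
    }

    public static void main(String[] args) {
        validateQuota(0);              // legal: sets quota to 0
        validateQuota(QUOTA_RESET);    // legal: resets quota
        validateQuota(QUOTA_DONT_SET); // legal: leaves quota unchanged
        boolean rejected = false;
        try {
            validateQuota(-2);         // any other negative value is an error
        } catch (IllegalArgumentException e) {
            rejected = true;
        }
        System.out.println(rejected); // true
    }
}
```

With QUOTA_RESET fixed at -1 the two comparisons accept exactly the same inputs, which is why the original code worked despite the faulty check.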
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v3.patch Thanks for the notice Allen. Attaching a new version of the patch. Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch, HDFS-6534.v2.patch, HDFS-6534.v3.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14127971#comment-14127971 ] Binglin Chang commented on HDFS-6506: - Thanks for the review Chris and Junping. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Components: balancer, test Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.6.0 Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup has been failing recently: https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ From the error log, the reason seems to be that newly moved block replicas were invalidated and deleted, so some of the balancer's work was reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 
127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there must be a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
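The expected behavior described above can be sketched in isolation. This is a hypothetical standalone helper (made-up names, not the actual BlockManager chooseExcessReplicates code): when a block has just been moved from src to dest, the excess-replica chooser should pick the stale copy on src for invalidation, never the fresh copy on dest.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the excess-replica choice discussed above: after a
// balancer move, prefer invalidating the move source, not the new destination.
public class ExcessReplicaSketch {
    // Pick the replica to delete, preferring the move source over all others.
    public static String chooseExcessReplica(List<String> replicaNodes, String moveSource) {
        for (String node : replicaNodes) {
            if (node.equals(moveSource)) {
                return node; // the stale source copy is the excess one
            }
        }
        // fall back to any replica if the source copy is already gone
        return replicaNodes.get(0);
    }

    public static void main(String[] args) {
        // dest and src addresses taken from the log excerpt above
        List<String> replicas = Arrays.asList("127.0.0.1:55468", "127.0.0.1:49159");
        System.out.println(chooseExcessReplica(replicas, "127.0.0.1:49159"));
    }
}
```

In the failing runs the opposite happened: the newly written replica on 127.0.0.1:55468 was chosen for invalidation, undoing the balancer's move.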
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v3.patch Rebase patch to latest trunk. Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch, HDFS-6506.v3.patch -- This message was sent by Atlassian JIRA
[jira] [Commented] (HDFS-5574) Remove buffer copy in BlockReader.skip
[ https://issues.apache.org/jira/browse/HDFS-5574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061607#comment-14061607 ] Binglin Chang commented on HDFS-5574: - Hi [~cmccabe], it looks like there have been no more comments for a long time; could you help get this committed? Thanks:) Remove buffer copy in BlockReader.skip -- Key: HDFS-5574 URL: https://issues.apache.org/jira/browse/HDFS-5574 Project: Hadoop HDFS Issue Type: Improvement Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-5574.v1.patch, HDFS-5574.v2.patch, HDFS-5574.v3.patch, HDFS-5574.v4.patch, HDFS-5574.v5.patch BlockReaderLocal.skip and RemoteBlockReader.skip use a temporary buffer just to read the skipped data into, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Summary: Newly moved block replica been invalidated and deleted in TestBalancer (was: Newly moved block replica been invalidated and deleted) Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 
to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 
18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted in TestBalancer
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061612#comment-14061612 ] Binglin Chang commented on HDFS-6506: - Hi [~djp], this bug is related to TestBalancerWithNodeGroup, could you help review this? Thanks:) Newly moved block replica been invalidated and deleted in TestBalancer -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 
127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 
2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this.
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040606#comment-14040606 ] Binglin Chang commented on HDFS-6586: - TestBalancerWithNodeGroup also failed before for the same reason (HDFS-6250). We fixed TestBalancerWithNodeGroup, but it looks like TestBalancer has the same bug, and potentially also bug HDFS-6250. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040607#comment-14040607 ] Binglin Chang commented on HDFS-6586: - bq. and potentially also have bug HDFS-6250 sorry, it was HDFS-6506 TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
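The waitForBalancer timeout in the stack trace above boils down to a convergence check: the test repeatedly polls datanode utilization until every node is within a small delta of the cluster average, and fails with a TimeoutException otherwise. A minimal illustrative sketch of that pattern (class and method names here are hypothetical, not the actual TestBalancer code):

```java
// Hedged sketch of a balancer convergence wait; names are illustrative,
// not the real org.apache.hadoop.hdfs.server.balancer.TestBalancer API.
import java.util.Map;
import java.util.concurrent.TimeoutException;

public class BalanceWait {
    // A cluster counts as balanced when every datanode's used-space ratio
    // is within `delta` of the expected average utilization.
    public static boolean balanced(Map<String, Double> used, double avg, double delta) {
        for (double u : used.values()) {
            if (Math.abs(u - avg) > delta) return false;
        }
        return true;
    }

    // Poll until balanced or the deadline passes; in the real test the
    // utilization map would be re-fetched from live datanode reports.
    public static void waitForBalancer(Map<String, Double> used, double avg,
                                       double delta, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!balanced(used, avg, delta)) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("utilization did not converge to " + avg);
            }
            Thread.sleep(100);
        }
    }

    public static void main(String[] args) throws Exception {
        // Both nodes are within 0.02 of the 0.2 average, so this returns at once.
        Map<String, Double> used = Map.of("dn0", 0.21, "dn1", 0.19);
        waitForBalancer(used, 0.2, 0.02, 1000);
        System.out.println("balanced");
    }
}
```

In the failing run above the loop never sees convergence (0.08 vs. the expected 0.2 average), which is what surfaces as the TimeoutException.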
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v2.patch Update patch to add fix of bug in HDFS-6586, TestBalancer is affected by balancer.id file. Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch, HDFS-6506.v2.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 
127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange 
(BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by
[jira] [Resolved] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang resolved HDFS-6586. - Resolution: Duplicate TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6586) TestBalancer#testExitZeroOnSuccess sometimes fails in trunk
[ https://issues.apache.org/jira/browse/HDFS-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040625#comment-14040625 ] Binglin Chang commented on HDFS-6586: - I updated the patch in HDFS-6506 to fix the bug and am closing this jira as a duplicate. Thanks for reporting this, Ted. TestBalancer#testExitZeroOnSuccess sometimes fails in trunk --- Key: HDFS-6586 URL: https://issues.apache.org/jira/browse/HDFS-6586 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor From https://builds.apache.org/job/Hadoop-Hdfs-trunk/1782/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancer/testExitZeroOnSuccess/ : {code} Stacktrace java.util.concurrent.TimeoutException: Rebalancing expected avg utilization to become 0.2, but on datanode 127.0.0.1:49048 it remains at 0.08 after more than 4 msec. at org.apache.hadoop.hdfs.server.balancer.TestBalancer.waitForBalancer(TestBalancer.java:284) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.runBalancerCli(TestBalancer.java:392) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.doTest(TestBalancer.java:357) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.oneNodeTest(TestBalancer.java:398) at org.apache.hadoop.hdfs.server.balancer.TestBalancer.testExitZeroOnSuccess(TestBalancer.java:550) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4667) Capture renamed files/directories in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14039667#comment-14039667 ] Binglin Chang commented on HDFS-4667: - bq. Binglin Chang, do you have any other comments? No, thanks for the patch, lgtm Capture renamed files/directories in snapshot diff report - Key: HDFS-4667 URL: https://issues.apache.org/jira/browse/HDFS-4667 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, HDFS-4667.003.patch, HDFS-4667.004.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, getfullname-snapshot-support.patch Currently in the diff report we only show file/dir creation, deletion and modification. After rename with snapshots is supported, renamed file/dir should also be captured in the diff report. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v2.patch Fix a minor typo that caused the Linux build to fail Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch, HDFS-6534.v2.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Status: Patch Available (was: Open) Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6534: Attachment: HDFS-6534.v1.patch Changes: 1. fix a bug in memset(hdfsFileInfo...) 2. use PRId64 instead of %ld to prevent a compile warning 3. emulate clock_gettime/sem_init/sem_destroy on macosx 4. remove -lrt on macosx in CMakeLists.txt Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-6534.v1.patch When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032152#comment-14032152 ] Binglin Chang commented on HDFS-6539: - The failed test is not related; created HDFS-6541 to track it. test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6541) TestWebHdfsWithMultipleNameNodes.testRedirect failed with read timeout
Binglin Chang created HDFS-6541: --- Summary: TestWebHdfsWithMultipleNameNodes.testRedirect failed with read timeout Key: HDFS-6541 URL: https://issues.apache.org/jira/browse/HDFS-6541 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang https://builds.apache.org/job/PreCommit-HDFS-Build/7124/testReport/junit/org.apache.hadoop.hdfs.web/TestWebHdfsWithMultipleNameNodes/testRedirect/ Error Message Read timed out Stacktrace java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read1(BufferedInputStream.java:258) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:695) at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:472) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:539) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:410) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:438) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:434) at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.create(WebHdfsFileSystem.java:1049) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:906) at 
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:887) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:784) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:773) at org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes.testRedirect(TestWebHdfsWithMultipleNameNodes.java:130) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4667) Capture renamed files/directories in snapshot diff report
[ https://issues.apache.org/jira/browse/HDFS-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032272#comment-14032272 ] Binglin Chang commented on HDFS-4667: - Thanks for the updates [~jingzhao]. I will have a look; it may take 1 or 2. Capture renamed files/directories in snapshot diff report - Key: HDFS-4667 URL: https://issues.apache.org/jira/browse/HDFS-4667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Jing Zhao Assignee: Binglin Chang Attachments: HDFS-4667.002.patch, HDFS-4667.002.patch, HDFS-4667.003.patch, HDFS-4667.demo.patch, HDFS-4667.v1.patch, getfullname-snapshot-support.patch Currently in the diff report we only show file/dir creation, deletion and modification. After rename with snapshots is supported, renamed file/dir should also be captured in the diff report. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
Binglin Chang created HDFS-6539: --- Summary: test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-6539: --- Assignee: Binglin Chang test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6539: Status: Patch Available (was: Open) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6539) test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml
[ https://issues.apache.org/jira/browse/HDFS-6539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6539: Attachment: HDFS-6539.v1.patch Hi [~cmccabe], it looks like the patch in HADOOP-8480 has a small error: test_native_mini_dfs was changed to test_libhdfs_threaded, so test_libhdfs_threaded is run twice while test_native_mini_dfs is skipped. test_native_mini_dfs is skipped in hadoop-hdfs/pom.xml -- Key: HDFS-6539 URL: https://issues.apache.org/jira/browse/HDFS-6539 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6539.v1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Moved] (HDFS-6534) Fix build on macosx: HDFS parts
[ https://issues.apache.org/jira/browse/HDFS-6534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang moved HADOOP-10700 to HDFS-6534: -- Key: HDFS-6534 (was: HADOOP-10700) Project: Hadoop HDFS (was: Hadoop Common) Fix build on macosx: HDFS parts --- Key: HDFS-6534 URL: https://issues.apache.org/jira/browse/HDFS-6534 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor When compiling native code on macosx using clang, the compiler finds more warnings and errors that gcc ignores; those should be fixed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang reassigned HDFS-6506: --- Assignee: Binglin Chang Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 
from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 fr 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, 
blk_1073741830_1006, blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} Normally this should not happen: when moving a block from src to dest, the replica on src should be invalidated, not the one on dest; there is likely a bug in the related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026188#comment-14026188 ] Binglin Chang commented on HDFS-6506: - Looked at the log and code more thoroughly. The reason some block replicas are invalidated is: 1. balancer round 1: blk0 is moved from dn0 to dn1; at this time the block map has not been updated yet (so dn0 still appears to have blk0) 2. balancer round 2 starts and tries to move blk0 from dn0 to dn2 3. dn2 copies the data from dn0 4. dn0 heartbeats and gets a command to delete blk0 5. when completing the move of blk0 from dn0 to dn2, the NameNode cannot find the replica on dn0, but it still has to delete a replica, so it deletes the one on dn1. To prevent this, the balancer needs to wait some time to make sure the block movements of the last round are fully committed; otherwise those movements may be invalidated. Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
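The five-step race above can be modeled with a toy sketch. This is plain Java, not the actual Balancer/BlockManager code; the datanode names and the replication factor of 2 are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Toy model of the race in steps 1-5: the balancer schedules moves from a
// stale view of the block map, so the NameNode later removes the wrong
// (freshly moved) replica as excess.
public class StaleViewRace {
    // Returns the replicas of blk0 that survive the sequence of steps 1-5.
    static Set<String> survivingReplicas() {
        Set<String> replicas = new HashSet<>(Collections.singletonList("dn0"));
        replicas.add("dn1");     // round 1: blk0 copied dn0 -> dn1; the balancer's
                                 // snapshot of the block map still shows only dn0
        replicas.add("dn2");     // round 2: dn2 copies blk0 from dn0 (stale source)
        replicas.remove("dn0");  // dn0 heartbeats and executes its pending delete
        replicas.remove("dn1");  // one replica is still excess; nothing marks dn1's
                                 // copy as freshly moved, so it can be the victim
        return replicas;
    }

    public static void main(String[] args) {
        // Only dn2's copy survives: round 1's move was undone.
        System.out.println(StaleViewRace.survivingReplicas());
    }
}
```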
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026215#comment-14026215 ] Binglin Chang commented on HDFS-6506: - The Balancer already sleeps 2*DFS_HEARTBEAT_INTERVAL seconds between rounds, but in TestBalancer.java: {code} conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 1L); {code} The replica state update speed is related to DFS_NAMENODE_REPLICATION_INTERVAL too, which is 3 by default. TestBalancer only changes the heartbeat interval (which changes both the heartbeat interval and the balancer's iteration sleep time) but doesn't change the ReplicationMonitor check interval, so the sleep time is too small for the movements to get committed. The other thing is that 2*DFS_HEARTBEAT_INTERVAL still seems a little dangerous; maybe change it to 2*DFS_HEARTBEAT_INTERVAL + DFS_NAMENODE_REPLICATION_INTERVAL Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
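The proposed wait can be sketched as follows. This is plain Java, not the actual Balancer code; the parameter names mirror the two configuration keys discussed above, and the 1s/3s values reflect TestBalancer's heartbeat override and the assumed 3s default replication check interval:

```java
// Sketch of the proposed inter-round wait: cover two heartbeats plus one
// ReplicationMonitor pass, so excess-replica decisions from the previous
// round are settled before new moves are scheduled.
public class BalancerWaitSketch {
    // Current behavior described in the comment above.
    static long currentWaitSec(long heartbeatIntervalSec) {
        return 2 * heartbeatIntervalSec;
    }

    // Proposed: 2 * DFS_HEARTBEAT_INTERVAL + DFS_NAMENODE_REPLICATION_INTERVAL.
    static long proposedWaitSec(long heartbeatIntervalSec, long replicationIntervalSec) {
        return 2 * heartbeatIntervalSec + replicationIntervalSec;
    }

    public static void main(String[] args) {
        // With TestBalancer's 1s heartbeat and a 3s replication interval, the
        // current 2s wait is shorter than one ReplicationMonitor pass, while
        // the proposed wait covers it.
        System.out.println(currentWaitSec(1) + " vs " + proposedWaitSec(1, 3));
    }
}
```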
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Attachment: HDFS-6506.v1.patch Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6506: Status: Patch Available (was: Open) Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6506) Newly moved block replica been invalidated and deleted
[ https://issues.apache.org/jira/browse/HDFS-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026512#comment-14026512 ] Binglin Chang commented on HDFS-6506: - The failed test is not related and is tracked in HDFS-3930; actually a recent build also failed because of this. https://builds.apache.org/job/Hadoop-Hdfs-trunk/1770/consoleText Newly moved block replica been invalidated and deleted -- Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6506.v1.patch TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. 
[jira] [Created] (HDFS-6506) Newly moved block replica been invalidated and deleted
Binglin Chang created HDFS-6506: --- Summary: Newly moved block replica been invalidated and deleted Key: HDFS-6506 URL: https://issues.apache.org/jira/browse/HDFS-6506 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang TestBalancerWithNodeGroup#testBalancerWithNodeGroup fails recently https://builds.apache.org/job/PreCommit-HDFS-Build/7045//testReport/ from the error log, the reason seems to be that newly moved block replicas been invalidated and deleted, so some work of the balancer are reversed. Normally this should not happen, when moving a block from src to dest, replica on src should be invalided not the dest, there should be bug inside related logic. I don't think TestBalancerWithNodeGroup#testBalancerWithNodeGroup caused this. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026093#comment-14026093 ] Binglin Chang commented on HDFS-6159: - Thanks [~djp] and [~arpitagarwal] for the comments, I created HDFS-6506 to track this. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success -- Key: HDFS-6159 URL: https://issues.apache.org/jira/browse/HDFS-6159 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false-negative failure if data block(s) are lost after the balancer successfully finishes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021553#comment-14021553 ] Binglin Chang commented on HDFS-6159: - The test error log: Rebalancing expected avg utilization to become 0.16, but on datanode 127.0.0.1:55468 it remains at 0.02 after more than 4 msec. But from the balancer log: {noformat} 2014-06-06 18:15:51,681 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741834_1010 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741833_1009 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741830_1006 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,683 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741831_1007 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:51,682 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741832_1008 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741827_1003 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,702 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741828_1004 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 2014-06-06 18:15:54,701 INFO balancer.Balancer (Balancer.java:dispatch(370)) - Successfully moved blk_1073741829_1005 with size=100 from 127.0.0.1:49159 to 127.0.0.1:55468 through 127.0.0.1:49159 We can see that there are 8 blocks (800 
bytes) moved to 127.0.0.1:55468, the utilization of this datanode should be 0.16(800/5000). {noformat} But at the same time, those blocks are deleted by block manager: {noformat} 2014-06-06 18:15:54,706 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741833_1009) is added to invalidated blocks set 2014-06-06 18:15:54,709 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741834_1010) is added to invalidated blocks set 2014-06-06 18:15:56,421 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741833_1009, blk_1073741834_1010] 2014-06-06 18:15:57,717 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741832_1008) is added to invalidated blocks set 2014-06-06 18:15:57,720 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741827_1003) is added to invalidated blocks set 2014-06-06 18:15:57,721 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741830_1006) is added to invalidated blocks set 2014-06-06 18:15:57,722 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741831_1007) is added to invalidated blocks set 2014-06-06 18:15:57,723 INFO BlockStateChange (BlockManager.java:chooseExcessReplicates(2711)) - BLOCK* chooseExcessReplicates: (127.0.0.1:55468, blk_1073741829_1005) is added to invalidated blocks set 2014-06-06 18:15:59,422 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741827_1003, blk_1073741829_1005, blk_1073741830_1006, 
blk_1073741831_1007, blk_1073741832_1008] 2014-06-06 18:16:02,423 INFO BlockStateChange (BlockManager.java:invalidateWorkForOneNode(3242)) - BLOCK* BlockManager: ask 127.0.0.1:55468 to delete [blk_1073741845_1021] {noformat} At last, only block blk_1073741828_1004 is left on 127.0.0.1:55468, so the final utilization is (100/5000 = 0.02). Those blocks were newly moved by the balancer and should not be invalidated by the block manager. Most likely some block-invalidation logic in BlockManager.java is broken? Looking at the svn log, there are some recent changes to block invalidation in BlockManager (HDFS-6424, HDFS-6362). Perhaps [~jingzhao] or [~arpitagarwal] can help look at this? TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
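The utilization arithmetic above can be checked directly. A minimal sketch in plain Java; the 5000-byte capacity and 100-byte block size are taken from the figures quoted in the comment:

```java
// Verifies the figures quoted above: 8 surviving 100-byte blocks would give
// 800/5000 = 0.16, but with only blk_1073741828_1004 left the datanode sits
// at 100/5000 = 0.02, matching the test failure message.
public class UtilizationCheck {
    static double utilization(long usedBytes, long capacityBytes) {
        return (double) usedBytes / capacityBytes;
    }

    public static void main(String[] args) {
        System.out.println(utilization(8 * 100, 5000)); // expected after balancing: 0.16
        System.out.println(utilization(1 * 100, 5000)); // observed after deletions: 0.02
    }
}
```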
[jira] [Created] (HDFS-6417) TestTransferFsImage.testClientSideException is flaky
Binglin Chang created HDFS-6417: --- Summary: TestTransferFsImage.testClientSideException is flaky Key: HDFS-6417 URL: https://issues.apache.org/jira/browse/HDFS-6417 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Priority: Minor Attachments: HDFS-6417.log Looks like the HTTP connection to the NN timed out. https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/testReport/org.apache.hadoop.hdfs.server.namenode/TestTransferFsImage/testClientSideException/ Error Message Wanted but not invoked: nNStorage.reportErrorOnFile( /x-does-not-exist/blah ); - at org.apache.hadoop.hdfs.server.namenode.TestTransferFsImage.testClientSideException(TestTransferFsImage.java:80) Actually, there were zero interactions with this mock. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6417) TestTransferFsImage.testClientSideException is flaky
[ https://issues.apache.org/jira/browse/HDFS-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6417: Attachment: HDFS-6417.log TestTransferFsImage.testClientSideException is flaky Key: HDFS-6417 URL: https://issues.apache.org/jira/browse/HDFS-6417 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Priority: Minor Attachments: HDFS-6417.log Looks like http connection to NN timeout. https://builds.apache.org/job/Hadoop-Hdfs-trunk/1753/testReport/org.apache.hadoop.hdfs.server.namenode/TestTransferFsImage/testClientSideException/ Error Message Wanted but not invoked: nNStorage.reportErrorOnFile( /x-does-not-exist/blah ); - at org.apache.hadoop.hdfs.server.namenode.TestTransferFsImage.testClientSideException(TestTransferFsImage.java:80) Actually, there were zero interactions with this mock. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6381) Fix a typo in INodeReference.java
Binglin Chang created HDFS-6381: --- Summary: Fix a typo in INodeReference.java Key: HDFS-6381 URL: https://issues.apache.org/jira/browse/HDFS-6381 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java {code} * For example, - * (1) Support we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) + * (1) Suppose we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998419#comment-13998419 ] Binglin Chang commented on HDFS-6250: - The problem with adding a datanode on both rack0 and rack1 is that you can't verify there was no cross-rack block movement by checking datanode dfs usage: some blocks may move from rack0 to rack1 and some from rack1 to rack0, after which the total rack usage may be unchanged. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
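A toy sketch of the blind spot described in the comment above (hypothetical numbers): two equal-sized blocks moving in opposite directions across racks leave both rack totals unchanged, so a usage check alone cannot prove that no cross-rack movement happened.

```java
// Illustrative only: symmetric cross-rack moves cancel out in the
// per-rack usage totals, which is why checking dfs usage per rack
// cannot verify "no cross-rack block movement".
public class RackUsageBlindSpot {
    static long[] afterSymmetricMoves(long rack0, long rack1, long blockSize) {
        rack0 -= blockSize; rack1 += blockSize; // cross-rack move #1
        rack1 -= blockSize; rack0 += blockSize; // cross-rack move #2
        return new long[] { rack0, rack1 };
    }

    public static void main(String[] args) {
        // Two cross-rack moves happened, yet both totals look untouched.
        long[] after = afterSymmetricMoves(1800, 1800, 10);
        System.out.println(after[0] + " " + after[1]); // 1800 1800
    }
}
```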
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998417#comment-13998417 ] Binglin Chang commented on HDFS-6250: - Hi [~airbots], what Junping points out is correct: the blocks in the test file should never move to rack1, whether balancer.id is large or small. This is for reliability: if a replica in rack0 moved to rack1, all the replicas of that block would be in rack1, and if rack1 went down we would get a missing block. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
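The reliability invariant described in the comment above can be sketched as a simple check (illustrative only, not HDFS code; the method name is made up):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Sketch of the invariant: a block whose replicas all sit on one rack
// becomes a missing block if that rack goes down, so replicas must span
// at least two racks.
public class RackSpreadCheck {
    static boolean survivesSingleRackFailure(List<String> replicaRacks) {
        return new HashSet<>(replicaRacks).size() >= 2;
    }

    public static void main(String[] args) {
        // All replicas on rack1: a rack1 outage loses the block.
        System.out.println(survivesSingleRackFailure(Arrays.asList("rack1", "rack1"))); // false
        // Replicas spread over rack0 and rack1: one rack can fail safely.
        System.out.println(survivesSingleRackFailure(Arrays.asList("rack0", "rack1"))); // true
    }
}
```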
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997229#comment-13997229 ] Binglin Chang commented on HDFS-6250: - Hi Junping, I ran TestBalancerWithNodeGroup.testBalancerWithRackLocality 50 times: no failures or timeouts, average running time 5.2s (previously 20s). I ran TestBalancerWithNodeGroup.testBalancerWithNodeGroup 50 times: no failures or timeouts, average running time 10.2s (previously 17s). TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6381) Fix a typo in INodeReference.java
[ https://issues.apache.org/jira/browse/HDFS-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6381: Status: Patch Available (was: Open) Fix a typo in INodeReference.java - Key: HDFS-6381 URL: https://issues.apache.org/jira/browse/HDFS-6381 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-6381.v1.patch hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java {code} * For example, - * (1) Support we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) + * (1) Suppose we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6381) Fix a typo in INodeReference.java
[ https://issues.apache.org/jira/browse/HDFS-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6381: Attachment: HDFS-6381.v1.patch Fix a typo in INodeReference.java - Key: HDFS-6381 URL: https://issues.apache.org/jira/browse/HDFS-6381 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Trivial Attachments: HDFS-6381.v1.patch hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodeReference.java {code} * For example, - * (1) Support we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) + * (1) Suppose we have /abc/foo, say the inode of foo is inode(id=1000,name=foo) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6159) TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success
[ https://issues.apache.org/jira/browse/HDFS-6159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990433#comment-13990433 ] Binglin Chang commented on HDFS-6159: - The fix in the patch has an issue: bq. I propose to increase datanode capacity up to 6000B and data block size to 100B. {code} static final int DEFAULT_BLOCK_SIZE = 100; {code} This variable is not used anywhere, so changing it does not change the block size. The capacity is changed to 6000 while the block size actually remains 10 bytes, which means more blocks need to be moved; this increases the total balancer running time and makes a timeout more likely. TestBalancerWithNodeGroup.testBalancerWithNodeGroup fails if there is block missing after balancer success -- Key: HDFS-6159 URL: https://issues.apache.org/jira/browse/HDFS-6159 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.3.0 Reporter: Chen He Assignee: Chen He Fix For: 3.0.0, 2.5.0 Attachments: HDFS-6159-v2.patch, HDFS-6159-v2.patch, HDFS-6159.patch, logs.txt TestBalancerWithNodeGroup.testBalancerWithNodeGroup will report a false failure if data block(s) are lost after the balancer successfully finishes. -- This message was sent by Atlassian JIRA (v6.2#6252)
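To see why leaving the real block size at 10 bytes increases balancer work, compare the number of individual block moves needed to shift the same number of bytes (a rough sketch with hypothetical byte counts; note also that in MiniDFSCluster tests a block-size constant only takes effect if it is also set on the Configuration, e.g. via dfs.blocksize, which is the bug described above):

```java
// Hypothetical arithmetic: the balancer moves whole blocks, so a
// smaller block size means more moves (and more round trips) to
// transfer the same number of bytes.
public class BalancerWorkSketch {
    static long movesNeeded(long bytesToMove, long blockSize) {
        return bytesToMove / blockSize;
    }

    public static void main(String[] args) {
        long bytesToMove = 1000; // made-up figure for illustration
        // Intended 100-byte blocks: 10 moves.
        System.out.println(movesNeeded(bytesToMove, 100));
        // Actual 10-byte blocks (DEFAULT_BLOCK_SIZE never wired in): 100 moves.
        System.out.println(movesNeeded(bytesToMove, 10));
    }
}
```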
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990440#comment-13990440 ] Binglin Chang commented on HDFS-6250: - Hi [~airbots], please see my comments about HDFS-6159 https://issues.apache.org/jira/browse/HDFS-6159?focusedCommentId=13990433page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13990433 The fix in HDFS-6159 and this jira seems to be suboptimal, we may need to reconsider the approach. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990491#comment-13990491 ] Binglin Chang commented on HDFS-6250: - I made a patch to address this jira and HDFS-6159, along with a minor fix in the balancer.id related doc. Changes: 1. Set CAPACITY to 5000 rather than 6000, so it keeps the same ratio to the block size as before and makes DEFAULT_BLOCK_SIZE meaningful. 2. Change the validate method in testBalancerWithRackLocality so it doesn't depend on the balancer.id file. 3. Fix a doc error about balancer.id. With these changes the test now runs in only about 7 seconds, rather than 20+ seconds. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6250: Attachment: HDFS-6250-v3.patch TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments: HDFS-6250-v2.patch, HDFS-6250-v3.patch, HDFS-6250.patch, test_log.txt It was seen in https://builds.apache.org/job/PreCommit-HDFS-Build/6669/ {panel} java.lang.AssertionError: expected:1800 but was:1810 at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup .testBalancerWithRackLocality(TestBalancerWithNodeGroup.java:253) {panel} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990668#comment-13990668 ] Binglin Chang commented on HDFS-6342: - Hi [~airbots], I agree with not changing the balancer.id file; my new patch doesn't change it. Please see my new comments in HDFS-6250. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is added (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6250) TestBalancerWithNodeGroup.testBalancerWithRackLocality fails
[ https://issues.apache.org/jira/browse/HDFS-6250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13989495#comment-13989495 ] Binglin Chang commented on HDFS-6250: - Thanks for the analysis and the patch, [~airbots]. The fix makes sense; here are some additional concerns: bq. HDFS creates a /system/balancer.id file (30B) to track the balancer The file contains the hostname, whose size is not fixed. I see you increased the block size and capacity to minimize the impact of the file, but the risk still seems to be there. testBalancerWithRackLocality tests that the balancer does not perform cross-rack block movements in the test scenario; here are the related balancer logs: {code} 2014-04-15 18:29:48,649 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.0], Source[127.0.0.1:46174, utilization=30.0]] 2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:48,650 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=0.0]] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=30.168], Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:51,722 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=1.8333]] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, 
utilization=28.5], Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:54,820 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=5.0]] 2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:29:57,898 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:46174, utilization=30.332], Source[127.0.0.1:54333, utilization=25.332]] 2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:29:57,899 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=7.667]] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 above-average: [Source[127.0.0.1:54333, utilization=22.668], Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 below-average: [] 2014-04-15 18:30:00,933 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 underutilized: [BalancerDatanode[127.0.0.1:48293, utilization=10.5]] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 over-utilized: [] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 1 above-average: [Source[127.0.0.1:46174, utilization=30.332]] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 2 below-average: [BalancerDatanode[127.0.0.1:54333, utilization=19.832], BalancerDatanode[127.0.0.1:48293, utilization=12.0]] 2014-04-15 18:30:03,989 INFO balancer.Balancer (Balancer.java:logNodes(960)) - 0 underutilized: [] {code} I guess the test intended to let /rack0/NODEGROUP0/dn 
be above-average (=30%) but not over-utilized (>30%, given the average utilization of 20%), so blocks on rack0 never move to rack1; but a differently sized balancer.id file may break that assumption. So there are problems inherent in the test, not just race conditions or timeouts. We may need to change the test (e.g. file size, utilization rate, validation method) to prevent those corner cases. TestBalancerWithNodeGroup.testBalancerWithRackLocality fails Key: HDFS-6250 URL: https://issues.apache.org/jira/browse/HDFS-6250 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Assignee: Chen He Attachments:
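The "over-utilized" / "above-average" buckets in the logs above come from the balancer's node classification. A simplified sketch (the real logic lives in org.apache.hadoop.hdfs.server.balancer.Balancer, and 10% is the balancer's default threshold) shows why a 30% node in a 20%-average cluster is only "above-average", not "over-utilized":

```java
// Simplified sketch of how the balancer buckets datanodes relative to
// the cluster-average utilization and the threshold; boundary handling
// in the real Balancer may differ slightly.
public class NodeClassifierSketch {
    static String classify(double utilization, double avg, double threshold) {
        if (utilization > avg + threshold) return "over-utilized";
        if (utilization > avg) return "above-average";
        if (utilization >= avg - threshold) return "below-average";
        return "underutilized";
    }

    public static void main(String[] args) {
        double avg = 20.0, threshold = 10.0; // default threshold is 10%
        // 30% is exactly avg + threshold, so not over-utilized:
        System.out.println(classify(30.0, avg, threshold)); // above-average
        System.out.println(classify(31.0, avg, threshold)); // over-utilized
        System.out.println(classify(0.0, avg, threshold));  // underutilized
    }
}
```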
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990208#comment-13990208 ] Binglin Chang commented on HDFS-6342: - Testing that the rack capacities are equal doesn't mean there was no cross-rack block movement, so I don't think simply adding a new datanode works, right? Maybe we can make more changes and in the meantime reduce the timeout if possible; 80 seconds for a test is a bit long. TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is added (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6342) TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge
[ https://issues.apache.org/jira/browse/HDFS-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13990211#comment-13990211 ] Binglin Chang commented on HDFS-6342: - As for the fix, I see the need to write a balancer.id file, but filling it with the hostname doesn't seem to be necessary (it is never read anywhere). If we modify the balancer to write the file without any content, it should have no side effects on the balancer or the test check code, and we may be able to skip the timeout (need to confirm). TestBalancerWithNodeGroup.testBalancerWithRackLocality may fail if balancer.id file is huge --- Key: HDFS-6342 URL: https://issues.apache.org/jira/browse/HDFS-6342 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Chen He Assignee: Chen He Attachments: HDFS-6342.patch The testBalancerWithRackLocality method tests the balancer moving data blocks with rack locality taken into consideration. It creates a two-node cluster: one node belongs to rack0/nodeGroup0, the other node belongs to rack1/nodeGroup1. In this 2-datanode minicluster, the block size is 10B and the total cluster capacity is 6000B (3000B on each datanode). It creates 180 data blocks with replication factor 2. Then a new datanode is added (in rack1/nodeGroup2) and the balancer starts balancing the cluster. The test expects data blocks to move only within rack1, and after the balancer is done it assumes the data size on both racks is the same. It will break if the balancer.id file is huge and there is inter-rack data block movement. -- This message was sent by Atlassian JIRA (v6.2#6252)
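A sketch of that idea with plain java.nio (the real balancer writes /system/balancer.id through the HDFS FileSystem API, so this is only an analogy): an exclusive create of an empty file still works as a mutual-exclusion marker while contributing zero bytes that could skew usage accounting.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EmptyMarkerFileSketch {
    // Exclusive create of an empty marker file: creation fails if another
    // instance already holds it, and zero content means its size can never
    // skew any byte-usage accounting.
    static boolean tryAcquire(Path marker) {
        try {
            Files.createFile(marker); // atomic create, fails if file exists
            return true;
        } catch (IOException alreadyExistsOrIoError) {
            return false;
        }
    }

    // Self-contained demo: acquire, fail to re-acquire, verify empty file.
    static boolean demo() {
        try {
            Path marker = Files.createTempDirectory("balancer").resolve("balancer.id");
            return tryAcquire(marker)        // first acquire succeeds
                && !tryAcquire(marker)       // second acquire is rejected
                && Files.size(marker) == 0;  // file carries no content
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true
    }
}
```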
[jira] [Created] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
Binglin Chang created HDFS-6308: --- Summary: TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6261) Add document for enabling node group layer in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985214#comment-13985214 ] Binglin Chang commented on HDFS-6261: - Sorry, I looked more carefully; the failure is different. Filed HDFS-6308 for this. Add document for enabling node group layer in HDFS -- Key: HDFS-6261 URL: https://issues.apache.org/jira/browse/HDFS-6261 Project: Hadoop HDFS Issue Type: Task Components: documentation Reporter: Wenwu Peng Assignee: Binglin Chang Labels: documentation Attachments: 2-layer-topology.png, 3-layer-topology.png, 3layer-topology.png, 4layer-topology.png, HDFS-6261.v1.patch, HDFS-6261.v1.patch, HDFS-6261.v2.patch Most of the patches from umbrella JIRA HADOOP-8468 have been committed; however, there is no documentation introducing NodeGroup awareness (Hadoop Virtualization Extensions) or how to configure it, so we need to document it. 1. Document NodeGroup awareness in http://hadoop.apache.org/docs/current 2. Document NodeGroup-aware properties in core-default.xml. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985319#comment-13985319 ] Binglin Chang commented on HDFS-6308: - Related error log: {code} 2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1418: Call - /127.0.0.1:58789: getHdfsBlockLocations {tokens { identifier: password: kind: service: } tokens { identifier: password: kind: service: } blockPoolId: BP-1664789652-67.195.138.24-1398662297553 blockIds: 1073741825 blockIds: 1073741826} 2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(197)) - 1419: Call - /127.0.0.1:45933: getHdfsBlockLocations {tokens { identifier: password: kind: service: } tokens { identifier: password: kind: service: } blockPoolId: BP-1664789652-67.195.138.24-1398662297553 blockIds: 1073741825 blockIds: 1073741826} 2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1418: Exception - localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused} 2014-04-28 05:18:19,701 INFO ipc.Server (Server.java:doRead(762)) - Socket Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception [java.io.IOException: Connection reset by peer] java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198) at sun.nio.ch.IOUtil.read(IOUtil.java:171) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243) at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644) at org.apache.hadoop.ipc.Server.access$2800(Server.java:133) at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517) at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753) at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627) at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598) 2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1419: Exception - /127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout} 2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(211)) - 1415: Exception - localhost/127.0.0.1:45933: getHdfsBlockLocations {java.net.SocketTimeoutException: Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102 remote=/127.0.0.1:45933]; For more details see: {code} The socket read/write timeout is set to 1500ms, and the timeout error is global (per connection), so when a timeout occurs all calls on that connection are marked as timed out. The expected behavior is that the first call times out while the second call completes normally. There is a simple fix: just invoke the second call after the connection is known to be closed. We can consider improving ipc.Client to prevent this kind of corner case later. 
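A minimal model of the failure mode described above (hypothetical code, not the real ipc.Client): because the timeout is connection-scoped, one socket timeout fails every call multiplexed on that connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model: in-flight RPC calls keyed by id, all multiplexed on one
// connection. A single read timeout tears the connection down and fails
// every pending call, including calls that would have succeeded alone.
public class ConnectionTimeoutModel {
    final Map<Integer, String> pending = new LinkedHashMap<>();

    void send(int callId) { pending.put(callId, "WAITING"); }

    // ipc.Client-style cleanup on a connection-level socket timeout.
    void onReadTimeout() {
        pending.replaceAll((id, state) -> "SocketTimeoutException");
    }

    public static void main(String[] args) {
        ConnectionTimeoutModel conn = new ConnectionTimeoutModel();
        conn.send(1418);
        conn.send(1419); // expected to succeed on its own
        conn.onReadTimeout();
        // Both calls report timeout, which is what the flaky test tripped on.
        // The fix in the test: issue the second call only after the first
        // connection is known to be closed, i.e. on a fresh connection.
        System.out.println(conn.pending);
    }
}
```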
TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6308: Assignee: Binglin Chang Status: Patch Available (was: Open) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6308) TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
[ https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-6308: Attachment: HDFS-6308.v1.patch TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky Key: HDFS-6308 URL: https://issues.apache.org/jira/browse/HDFS-6308 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Attachments: HDFS-6308.v1.patch Found this on pre-commit build of HDFS-6261 {code} java.lang.AssertionError: Expected one valid and one invalid volume at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.assertTrue(Assert.java:41) at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)