[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out

2015-11-16 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9358:

  Resolution: Fixed
   Fix Version/s: 2.8.0
Target Version/s: 2.8.0
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2.
Thanks [~iwasakims] for contribution, and [~jojochuang] for good analysis.


> TestNodeCount#testNodeCount timed out
> -
>
> Key: HDFS-9358
> URL: https://issues.apache.org/jira/browse/HDFS-9358
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Wei-Chiu Chuang
>Assignee: Masatake Iwasaki
> Fix For: 2.8.0
>
> Attachments: HDFS-9358.001.patch, HDFS-9358.002.patch
>
>
> I have seen this test failure occurred a few times in trunk:
> Error Message
> Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 
> after 2 msec.  Last counts: live = 2, excess = 0, corrupt = 0
> Stacktrace
> java.util.concurrent.TimeoutException: Timeout: excess replica count not 
> equal to 2 for block blk_1073741825_1001 after 2 msec.  Last counts: live 
> = 2, excess = 0, corrupt = 0
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out

2015-11-16 Thread Walter Su (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Walter Su updated HDFS-9358:

Component/s: test

> TestNodeCount#testNodeCount timed out
> -
>
> Key: HDFS-9358
> URL: https://issues.apache.org/jira/browse/HDFS-9358
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Wei-Chiu Chuang
>Assignee: Masatake Iwasaki
> Attachments: HDFS-9358.001.patch, HDFS-9358.002.patch
>
>
> I have seen this test failure occurred a few times in trunk:
> Error Message
> Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 
> after 2 msec.  Last counts: live = 2, excess = 0, corrupt = 0
> Stacktrace
> java.util.concurrent.TimeoutException: Timeout: excess replica count not 
> equal to 2 for block blk_1073741825_1001 after 2 msec.  Last counts: live 
> = 2, excess = 0, corrupt = 0
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out

2015-11-16 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9358:
---
Attachment: HDFS-9358.002.patch

Thanks for the comment, [~walter.k.su].

bq. 1. We can set heartBeat interval to 1s to shorten running time.

Shortening heartbeat interval did not make significant difference but 
shortening replication interval did. I set shorter intervals for the both, 
anyway.

bq. So I think we can disable block invalidation by setting large delay to make 
it non-transient, then the test is more stable.

Sure. I think that is better because we can get rid of busy loop checking test 
condition to make test easier to debug.

I attached 002 based on your suggestions. It did not fail in 100 runs.


> TestNodeCount#testNodeCount timed out
> -
>
> Key: HDFS-9358
> URL: https://issues.apache.org/jira/browse/HDFS-9358
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Masatake Iwasaki
> Attachments: HDFS-9358.001.patch, HDFS-9358.002.patch
>
>
> I have seen this test failure occurred a few times in trunk:
> Error Message
> Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 
> after 2 msec.  Last counts: live = 2, excess = 0, corrupt = 0
> Stacktrace
> java.util.concurrent.TimeoutException: Timeout: excess replica count not 
> equal to 2 for block blk_1073741825_1001 after 2 msec.  Last counts: live 
> = 2, excess = 0, corrupt = 0
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out

2015-11-12 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9358:
---
Status: Patch Available  (was: Open)

> TestNodeCount#testNodeCount timed out
> -
>
> Key: HDFS-9358
> URL: https://issues.apache.org/jira/browse/HDFS-9358
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Masatake Iwasaki
> Attachments: HDFS-9358.001.patch
>
>
> I have seen this test failure occurred a few times in trunk:
> Error Message
> Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 
> after 2 msec.  Last counts: live = 2, excess = 0, corrupt = 0
> Stacktrace
> java.util.concurrent.TimeoutException: Timeout: excess replica count not 
> equal to 2 for block blk_1073741825_1001 after 2 msec.  Last counts: live 
> = 2, excess = 0, corrupt = 0
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out

2015-11-12 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9358:
---
Attachment: HDFS-9358.001.patch

Thanks for reporting this, [~jojochuang].

The testNodeCount expects number of excess replica to be increased to 2 by 
excessReplicateMap. (live, excess) could be changed in the case as
{noformat}
  (live, excess): (3, 1) -> (2, 2)
{noformat}

If invalidation of existing excess replica is executed before 
excessReplicateMap is updated, number of excess replica never be 2.
{noformat}
  (live, excess): (3, 1) -> (3, 0) -> (2, 1)
{noformat}

Attached 001 fix the test to wait for invalidation of the 1st excess replica 
then check the 2nd excess replica is detected.


> TestNodeCount#testNodeCount timed out
> -
>
> Key: HDFS-9358
> URL: https://issues.apache.org/jira/browse/HDFS-9358
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Wei-Chiu Chuang
>Assignee: Masatake Iwasaki
> Attachments: HDFS-9358.001.patch
>
>
> I have seen this test failure occurred a few times in trunk:
> Error Message
> Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 
> after 2 msec.  Last counts: live = 2, excess = 0, corrupt = 0
> Stacktrace
> java.util.concurrent.TimeoutException: Timeout: excess replica count not 
> equal to 2 for block blk_1073741825_1001 after 2 msec.  Last counts: live 
> = 2, excess = 0, corrupt = 0
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)