[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out
[ https://issues.apache.org/jira/browse/HDFS-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-9358:
----------------------------
          Resolution: Fixed
       Fix Version/s: 2.8.0
    Target Version/s: 2.8.0
              Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks [~iwasakims] for the contribution, and [~jojochuang] for the good analysis.

> TestNodeCount#testNodeCount timed out
> -------------------------------------
>
>                 Key: HDFS-9358
>                 URL: https://issues.apache.org/jira/browse/HDFS-9358
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>            Reporter: Wei-Chiu Chuang
>            Assignee: Masatake Iwasaki
>             Fix For: 2.8.0
>
>         Attachments: HDFS-9358.001.patch, HDFS-9358.002.patch
>
>
> I have seen this test failure occur a few times in trunk:
> Error Message
> Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 after 2 msec. Last counts: live = 2, excess = 0, corrupt = 0
> Stacktrace
> java.util.concurrent.TimeoutException: Timeout: excess replica count not equal to 2 for block blk_1073741825_1001 after 2 msec. Last counts: live = 2, excess = 0, corrupt = 0
>     at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:152)
>     at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.checkTimeout(TestNodeCount.java:146)
>     at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.__CLR4_0_39bdgm666uf(TestNodeCount.java:130)
>     at org.apache.hadoop.hdfs.server.blockmanagement.TestNodeCount.testNodeCount(TestNodeCount.java:54)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out
Walter Su updated HDFS-9358:
----------------------------
    Component/s: test
[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out
Masatake Iwasaki updated HDFS-9358:
-----------------------------------
    Attachment: HDFS-9358.002.patch

Thanks for the comment, [~walter.k.su].

bq. 1. We can set heartBeat interval to 1s to shorten running time.

Shortening the heartbeat interval did not make a significant difference, but shortening the replication interval did. I set shorter intervals for both anyway.

bq. So I think we can disable block invalidation by setting large delay to make it non-transient, then the test is more stable.

Sure. I think that is better because we can get rid of the busy loop that checks the test condition, which makes the test easier to debug.

I attached 002 based on your suggestions. It did not fail in 100 runs.
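For readers following the 002 discussion, the three settings mentioned above can be sketched as a plain key/value map. This is only an illustration under stated assumptions, not the contents of HDFS-9358.002.patch: the property names are assumed to be the usual ones from hdfs-default.xml for that era, the class name TestNodeCountSettings is hypothetical, and only the 1s heartbeat value comes from the thread. The other two values are placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop or of the attached patch.
class TestNodeCountSettings {

    // Returns the settings discussed in the thread as property name -> seconds.
    static Map<String, Long> stableTestSettings() {
        Map<String, Long> conf = new LinkedHashMap<>();
        // Heartbeat interval of 1s, as suggested in the review comment.
        conf.put("dfs.heartbeat.interval", 1L);
        // Shorter replication interval (assumed key name, placeholder value).
        conf.put("dfs.namenode.replication.interval", 1L);
        // Large delay before block deletion so that excess replicas are not
        // invalidated while the test is looking at them (assumed key name,
        // placeholder value).
        conf.put("dfs.namenode.startup.delay.block.deletion.sec", 60L);
        return conf;
    }

    public static void main(String[] args) {
        stableTestSettings().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In a real test these values would be applied to the cluster configuration before the mini cluster starts; the map form here only keeps the sketch free of Hadoop dependencies.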
[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out
Masatake Iwasaki updated HDFS-9358:
-----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HDFS-9358) TestNodeCount#testNodeCount timed out
Masatake Iwasaki updated HDFS-9358:
-----------------------------------
    Attachment: HDFS-9358.001.patch

Thanks for reporting this, [~jojochuang].

testNodeCount expects the number of excess replicas to be increased to 2 by excessReplicateMap. In that case (live, excess) changes as follows:
{noformat}
(live, excess): (3, 1) -> (2, 2)
{noformat}

If invalidation of the existing excess replica is executed before excessReplicateMap is updated, the number of excess replicas never reaches 2:
{noformat}
(live, excess): (3, 1) -> (3, 0) -> (2, 1)
{noformat}

The attached 001 fixes the test to wait for invalidation of the 1st excess replica and then check that the 2nd excess replica is detected.
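The two orderings described in the 001 comment can be replayed with a toy model. This is a minimal sketch and not BlockManager code: the class ExcessReplicaRace, its Event enum, and the assumption that marking a replica as excess moves one replica from live to excess while invalidation removes one excess replica are all illustrative, chosen only to reproduce the (live, excess) sequences quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the (live, excess) counters; all names are hypothetical.
class ExcessReplicaRace {

    enum Event { MARK_EXCESS, INVALIDATE_EXCESS }

    // Replays events from the initial state (live, excess) = (3, 1)
    // and returns every state visited, including the initial one.
    static List<int[]> replay(Event... events) {
        int live = 3;
        int excess = 1;
        List<int[]> states = new ArrayList<>();
        states.add(new int[] {live, excess});
        for (Event e : events) {
            switch (e) {
                case MARK_EXCESS:
                    // One live replica is reclassified as excess.
                    live--;
                    excess++;
                    break;
                case INVALIDATE_EXCESS:
                    // One excess replica is deleted.
                    excess--;
                    break;
            }
            states.add(new int[] {live, excess});
        }
        return states;
    }

    // True if any visited state has the given excess count.
    static boolean reachesExcess(List<int[]> states, int target) {
        for (int[] s : states) {
            if (s[1] == target) {
                return true;
            }
        }
        return false;
    }

    // Formats states as "(3, 1) -> (2, 2)".
    static String format(List<int[]> states) {
        StringBuilder sb = new StringBuilder();
        for (int[] s : states) {
            if (sb.length() > 0) {
                sb.append(" -> ");
            }
            sb.append("(").append(s[0]).append(", ").append(s[1]).append(")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> expected = replay(Event.MARK_EXCESS);
        List<int[]> raced = replay(Event.INVALIDATE_EXCESS, Event.MARK_EXCESS);
        System.out.println(format(expected) + "  reaches excess 2: " + reachesExcess(expected, 2));
        System.out.println(format(raced) + "  reaches excess 2: " + reachesExcess(raced, 2));
    }
}
```

In the second ordering the excess count never equals 2, so a test that only waits for that value can do nothing but time out, which matches the reported failure.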