[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309005#comment-15309005 ] Hudson commented on HDFS-9466: -- SUCCESS: Integrated in Hadoop-trunk-Commit #9891 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9891/]) HDFS-9466. TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure (cmccabe: rev c7921c9bddb79c9db5059b6c3f7a3a586a3cd95b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java > TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky > > > Key: HDFS-9466 > URL: https://issues.apache.org/jira/browse/HDFS-9466 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs, hdfs-client >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-9466.001.patch, HDFS-9466.002.patch, > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache-output.txt > > > This test is flaky and fails quite frequently in trunk. 
> Error Message > expected:<1> but was:<2> > Stacktrace > {noformat} > java.lang.AssertionError: expected:<1> but was:<2> > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:555) > at org.junit.Assert.assertEquals(Assert.java:542) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636) > at > org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631) > at > org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684) > {noformat} > Thanks to [~xiaochen] for identifying the issue. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15308980#comment-15308980 ]

Colin Patrick McCabe commented on HDFS-9466:

Thanks for the explanation. It sounds like the race condition is that the ShortCircuitRegistry on the DN needs to be informed about the client's decision that short-circuit is not working for the block, and this RPC takes time to arrive. That background process races with completing the TCP read successfully and checking the number of slots in the unit test.

{code}
   public static interface Visitor {
-    void accept(HashMap segments,
+    boolean accept(HashMap segments,
         HashMultimap slots);
   }
{code}

I don't think it makes sense to change the return type of the visitor. While you might find a boolean convenient, some other potential users of the interface would have no use for it. Instead, just have your closure modify a {{final MutableBoolean}} declared nearby.

{code}
+}, 100, 1);
{code}

No reason to make this shorter than the test limit, surely?

+1 once that's addressed. Thanks, [~jojochuang]. Sorry for the delay in reviews.
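The suggestion above, keeping the visitor's {{accept}} method void and letting the closure record its result in a nearby mutable flag, can be sketched roughly as follows. This is a minimal illustration and not the committed patch: the {{Visitor}} and {{Registry}} types are simplified, hypothetical stand-ins for the real ShortCircuitRegistry classes, and {{AtomicBoolean}} from the JDK is swapped in for commons-lang's {{MutableBoolean}} so the example has no external dependency.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;

final class VisitorFlagSketch {
  /** Hypothetical stand-in for the registry's visitor; accept() stays void. */
  interface Visitor {
    void accept(Set<String> segments, Set<String> slots);
  }

  /** Hypothetical stand-in for the registry: it only hands its state to a visitor. */
  static final class Registry {
    private final Set<String> segments = new HashSet<>();
    private final Set<String> slots = new HashSet<>();

    synchronized void addSegment(String id) { segments.add(id); }
    synchronized void addSlot(String id) { slots.add(id); }
    synchronized void visit(Visitor visitor) { visitor.accept(segments, slots); }
  }

  /**
   * Returns true if the registry currently holds the expected counts.
   * The closure reports its result through a flag declared next to it,
   * so the Visitor interface does not need a boolean return type.
   */
  static boolean hasCounts(Registry registry, int expectedSegments, int expectedSlots) {
    final AtomicBoolean matched = new AtomicBoolean(false);
    registry.visit((segments, slots) ->
        matched.set(segments.size() == expectedSegments
            && slots.size() == expectedSlots));
    return matched.get();
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    registry.addSegment("shm-1");
    registry.addSlot("slot-1");
    System.out.println("1 segment, 1 slot? " + hasCounts(registry, 1, 1));
  }
}
```

A check written this way returns a plain boolean to its caller, so it can be handed to a polling helper without touching the interface that other visitors share.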
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292366#comment-15292366 ]

Wei-Chiu Chuang commented on HDFS-9466:

Hello [~cmccabe], would you mind reviewing it again? Thank you very much!
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097201#comment-15097201 ]

Wei-Chiu Chuang commented on HDFS-9466:

[~cmccabe] Xiao is right about what I thought. It does appear there is a race. From your perspective, do you think that's by design, or an unintended bug in the code? Thanks for the reviews!
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097350#comment-15097350 ]

Colin Patrick McCabe commented on HDFS-9466:

Hmm. Can you be clearer on what the race condition is here?
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092457#comment-15092457 ]

Colin Patrick McCabe commented on HDFS-9466:

Thanks for looking at this, [~jojochuang]. The idea behind this test is that the slots should never be removed by a timeout. That's why this line is there:

{code}
conf.setLong(
    HdfsClientConfigKeys.Read.ShortCircuit.STREAMS_CACHE_EXPIRY_MS_KEY,
    10L);
{code}

Unless I'm missing something, adding a {{waitFor}} will not have any effect here, since the streams expiry timeout is longer than the test timeout.

I'm not completely sure why we're getting 2 slots here instead of 1... we might need to see if we can get log files from a test run that failed.
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15092579#comment-15092579 ]

Xiao Chen commented on HDFS-9466:

Hi [~cmccabe], Thanks for looking into this patch.

IIUC, the patch is trying to fix the race where the check happens before the slot is removed following the failed call (the read below is injected to fail). So the expiry should be fine, and the {{waitFor}} is only there so the test's check waits for the removal. Wei-Chiu, please correct me if I'm wrong.

{code}
// The second read should fail, and we should only have 1 segment and 1 slot
// left.
fs.getClient().getConf().brfFailureInjector =
    new TestCleanupFailureInjector();
try {
  DFSTestUtil.readFileBuffer(fs, TEST_PATH2);
} catch (Throwable t) {
  GenericTestUtils.assertExceptionContains("TCP reads were disabled for " +
      "testing, but we failed to do a non-TCP read.", t);
}
{code}
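The race described in this comment, a cleanup that completes asynchronously while the test asserts immediately, is commonly handled by polling the condition until a deadline instead of checking once. The following is a rough, self-contained sketch under stated assumptions: the {{waitFor}} helper here is hypothetical (it returns a boolean rather than throwing on timeout, unlike Hadoop's test utility), and the slot count and background thread merely simulate the DataNode-side removal.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class WaitForSketch {
  /**
   * Hypothetical polling helper: re-evaluates the check every
   * checkEveryMillis until it holds or waitForMillis has elapsed.
   * Returns true if the condition was met, false on timeout or interrupt.
   */
  static boolean waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis) {
    final long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(checkEveryMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  /**
   * Simulates the scenario: two slots exist, and a background thread
   * removes one of them a little later. Returns the slot count observed
   * after polling for the expected value.
   */
  static int slotsAfterAsyncCleanup() {
    final AtomicInteger slots = new AtomicInteger(2);
    Thread cleanup = new Thread(() -> {
      try {
        Thread.sleep(50); // stands in for the delay before the removal arrives
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      slots.decrementAndGet();
    });
    cleanup.start();
    // An immediate assertion here could still see 2; polling tolerates the delay.
    waitFor(() -> slots.get() == 1, 10, 5000);
    try {
      cleanup.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return slots.get();
  }

  public static void main(String[] args) {
    System.out.println("slots after cleanup: " + slotsAfterAsyncCleanup());
  }
}
```

The upper bound passed to the helper only limits how long a genuinely broken run takes to fail; a passing run returns as soon as the condition holds.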
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15088308#comment-15088308 ]

Xiao Chen commented on HDFS-9466:

Thanks Wei-Chiu. +1 (non-binding).
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069620#comment-15069620 ]

Hadoop QA commented on HDFS-9466:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 1s | trunk passed |
| +1 | compile | 0m 45s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 42s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 17s | trunk passed |
| +1 | mvnsite | 0m 55s | trunk passed |
| +1 | mvneclipse | 0m 14s | trunk passed |
| +1 | findbugs | 2m 3s | trunk passed |
| +1 | javadoc | 1m 9s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 55s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 51s | the patch passed |
| +1 | compile | 0m 44s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | compile | 0m 43s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 43s | the patch passed |
| +1 | checkstyle | 0m 16s | the patch passed |
| +1 | mvnsite | 0m 56s | the patch passed |
| +1 | mvneclipse | 0m 14s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 8s | the patch passed |
| +1 | javadoc | 1m 10s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 49s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 57m 45s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 54m 40s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| +1 | asflicense | 0m 25s | Patch does not generate ASF License warnings. |
| | | 140m 29s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| | hadoop.hdfs.server.datanode.TestBlockScanner |
| | hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots |
| | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
| | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshot |
| | hadoop.hdfs.TestDFSShell |
| | hadoop.hdfs.server.namenode.TestNNThroughputBenchmark |
| | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad |
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069866#comment-15069866 ]

Wei-Chiu Chuang commented on HDFS-9466:

Test failures appear unrelated. I've run the test locally and had no failures in 1000 runs. Previously the test failed frequently, about 1 in 5 runs.
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070010#comment-15070010 ]

Wei-Chiu Chuang commented on HDFS-9466:

Thank you very much for the review and comments!

(1) AFAIK, the three tests use different fault injectors. testDataXceiverCleansUpSlotsOnFailure injects a failure at BlockReaderFactory.requestFileDescriptors(); while testDataXceiverHandlesRequestShortCircuitShmFailure injects a failure at DataXceiver.sendShmSuccessResponse, and testPreReceiptVerificationDfsClientCanDoScr injects a failure at BlockReaderFactory.requestFileDescriptors().

(2) testPreReceiptVerificationDfsClientCanDoScr should also call checkNumberOfSegmentsAndSlots to maintain consistency.

(3) Definitely. I was just not sure what would be a reasonable number. Setting 1 second or 10 seconds should be more than sufficient though.
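The fault injectors mentioned in point (1) follow a common testing pattern: production code calls a hook that does nothing by default, and a test installs a subclass that throws at that point. The sketch below only illustrates the pattern. Every class and method in it is hypothetical and simplified; it is not the BlockReaderFactory or DataXceiver code, and the single hook shown merely echoes the injection point named in the comment.

```java
import java.io.IOException;

final class FailureInjectorSketch {
  /** Hypothetical hook class: the default implementation injects nothing. */
  static class FailureInjector {
    void injectRequestFileDescriptorsFailure() throws IOException {
      // no-op outside of tests
    }
  }

  /** Test-only subclass that forces the short-circuit path to fail. */
  static final class FailingInjector extends FailureInjector {
    @Override
    void injectRequestFileDescriptorsFailure() throws IOException {
      throw new IOException("injected failure before requesting file descriptors");
    }
  }

  /**
   * Hypothetical reader: calls the hook before taking the short-circuit
   * path and reports which path served the read.
   */
  static String read(FailureInjector injector) {
    try {
      injector.injectRequestFileDescriptorsFailure();
      return "short-circuit";
    } catch (IOException e) {
      return "fallback: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(read(new FailureInjector()));
    System.out.println(read(new FailingInjector()));
  }
}
```

Because each test chooses where its injector throws, tests that look similar can exercise different cleanup paths, which is one plausible reason only some of them would expose a timing problem.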
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069915#comment-15069915 ]

Xiao Chen commented on HDFS-9466:

Thanks [~jojochuang] for working on this. The analysis sounds reasonable to me. I have 1 question and 2 minor comments.

Question: Do we know why this failed only for {{testDataXceiverCleansUpSlotsOnFailure}}, but not for similar checks in {{testDataXceiverHandlesRequestShortCircuitShmFailure}} or {{testPreReceiptVerificationDfsClientCanDoScr}}? I personally only saw {{testDataXceiverCleansUpSlotsOnFailure}} fail intermittently, and never saw the others fail. I could've missed them, but just want to understand better.

Comments:
* Can we update {{testPreReceiptVerificationDfsClientCanDoScr}} to use the extracted method {{checkNumberOfSegmentsAndSlots}}?
* Can we still set an estimated max for {{waitFor}}? The test case has its own timeouts, but IMHO waiting for {{Integer.MAX_VALUE}} is not best practice.
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070354#comment-15070354 ]

Hadoop QA commented on HDFS-9466:

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 8m 33s | trunk passed |
| +1 | compile | 0m 52s | trunk passed with JDK v1.8.0_66 |
| +1 | compile | 0m 50s | trunk passed with JDK v1.7.0_91 |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 59s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 2m 4s | trunk passed |
| +1 | javadoc | 1m 12s | trunk passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 56s | trunk passed with JDK v1.7.0_91 |
| +1 | mvninstall | 0m 57s | the patch passed |
| +1 | compile | 0m 45s | the patch passed with JDK v1.8.0_66 |
| +1 | javac | 0m 45s | the patch passed |
| +1 | compile | 0m 50s | the patch passed with JDK v1.7.0_91 |
| +1 | javac | 0m 50s | the patch passed |
| +1 | checkstyle | 0m 19s | the patch passed |
| +1 | mvnsite | 1m 0s | the patch passed |
| +1 | mvneclipse | 0m 15s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 2m 19s | the patch passed |
| +1 | javadoc | 1m 15s | the patch passed with JDK v1.8.0_66 |
| +1 | javadoc | 1m 59s | the patch passed with JDK v1.7.0_91 |
| -1 | unit | 69m 20s | hadoop-hdfs in the patch failed with JDK v1.8.0_66. |
| -1 | unit | 63m 18s | hadoop-hdfs in the patch failed with JDK v1.7.0_91. |
| -1 | asflicense | 0m 25s | Patch generated 1 ASF License warnings. |
| | | 162m 43s | |

|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.hdfs.server.datanode.TestBlockScanner |
| | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad |
| | hadoop.hdfs.server.datanode.TestBlockReplacement |
| | hadoop.hdfs.web.TestWebHDFSXAttr |
| | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 |
| JDK v1.7.0_91 Failed junit tests | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL |
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069486#comment-15069486 ]

Wei-Chiu Chuang commented on HDFS-9466:

I am assigning this JIRA to myself. I found that the failure is due to a race condition: the test checks the number of slots before the slots have been removed.

> TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
>
> Key: HDFS-9466
> URL: https://issues.apache.org/jira/browse/HDFS-9466
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: fs, hdfs-client
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
>
> This test is flaky and fails quite frequently in trunk.
> Error Message
> expected:<1> but was:<2>
> Stacktrace
> {noformat}
> java.lang.AssertionError: expected:<1> but was:<2>
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:555)
> at org.junit.Assert.assertEquals(Assert.java:542)
> at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache$17.accept(TestShortCircuitCache.java:636)
> at org.apache.hadoop.hdfs.server.datanode.ShortCircuitRegistry.visit(ShortCircuitRegistry.java:395)
> at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.checkNumberOfSegmentsAndSlots(TestShortCircuitCache.java:631)
> at org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache.testDataXceiverCleansUpSlotsOnFailure(TestShortCircuitCache.java:684)
> {noformat}
> Thanks to [~xiaochen] for identifying the issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
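The race described in the comment above (the test counts slots before the cleanup has run) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `SlotCleanupRaceSketch`, `waitFor`, and `runScenario` are illustrative names, the slot counter is a plain `AtomicInteger` rather than the real `ShortCircuitRegistry`, and the 50 ms delay merely stands in for asynchronous cleanup. It shows one common way to make such a check deterministic, namely polling for the expected state with a timeout instead of asserting immediately. It is not taken from the attached patches.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the flaky scenario; not the actual HDFS test code.
class SlotCleanupRaceSketch {

  // Polls `check` every intervalMs until it returns true or timeoutMs elapses.
  // Returns true if the condition was met, false on timeout or interruption.
  static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  // Simulates two registered slots, one of which is released asynchronously.
  static int runScenario() {
    AtomicInteger slots = new AtomicInteger(2);
    Thread cleaner = new Thread(() -> {
      try {
        Thread.sleep(50); // assumed delay standing in for asynchronous cleanup
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      slots.decrementAndGet();
    });
    cleaner.start();

    // Comparing slots.get() with 1 right here would be racy: the cleaner
    // thread has usually not run yet, so the count would still be 2.
    if (!waitFor(() -> slots.get() == 1, 10, 5000)) {
      throw new IllegalStateException("slot was not released in time");
    }
    return slots.get();
  }

  public static void main(String[] args) {
    System.out.println("slots after cleanup: " + runScenario());
  }
}
```

Hadoop's test utilities provide a comparable polling helper, `GenericTestUtils.waitFor`, which is the more natural choice inside the Hadoop test tree.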
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026035#comment-15026035 ]

Wei-Chiu Chuang commented on HDFS-9466:

This is a regression of HDFS-7915.
[jira] [Commented] (HDFS-9466) TestShortCircuitCache#testDataXceiverCleansUpSlotsOnFailure is flaky
[ https://issues.apache.org/jira/browse/HDFS-9466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026038#comment-15026038 ]

Wei-Chiu Chuang commented on HDFS-9466:

From what I can see, the assertion failed when the expected number of slots is 2, not 1.