[jira] [Comment Edited] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574654#comment-15574654 ]

Stefania edited comment on CASSANDRA-12784 at 10/14/16 9:02 AM:

Thanks for the review.

The multiplexed run returned a number of [failures|https://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-testall-multiplex/26/] in {{RandomReplicationAwareTokenAllocatorTest.testExistingCluster}}. Is this test also expected to be flaky, or is this a real problem?

Splitting the test caused a failure in testall; I've fixed it and relaunched.

bq. The stack printout for single flakes could be useful to track the history of a failure; I would prefer not to lose it, but I wouldn't stop the commit if that is something you think is worth sacrificing.

The stack printout is still there: the logger still prints the full stack trace of the failure. I've tested it locally; see the sample output below.

bq. I would rename flakyTestNewCluster in the base class to just testNewCluster since the individual runner is the one that declares it flaky and handles that.

Done.

bq. Is there a reason to post the commits view instead of the branch one? There's no direct way to get from the commits view to compare, which is the one most useful for reviews.

Not really:

||3.X||trunk||
|[patch|https://github.com/stef1927/cassandra/tree/12784-3.X]|[patch|https://github.com/stef1927/cassandra/tree/12784]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12784-3.X-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12784-testall/]|

--
Sample stack printout obtained by inserting a fake {{Assert.fail("Test message")}}:

{code}
[junit] INFO [main] 2016-10-14 16:10:44,810 ?:? - Test failed. It tends to fail sometimes due to the random selection of the tokens in the first few nodes.
[junit] java.lang.AssertionError: Test message
[junit] at org.junit.Assert.fail(Assert.java:88) ~[junit-4.12.jar:4.12]
[junit] at org.apache.cassandra.dht.tokenallocator.AbstractReplicationAwareTokenAllocatorTest.testNewCluster(AbstractReplicationAwareTokenAllocatorTest.java:592) ~[classes/:na]
[junit] at org.apache.cassandra.dht.tokenallocator.AbstractReplicationAwareTokenAllocatorTest.testNewCluster(AbstractReplicationAwareTokenAllocatorTest.java:563) ~[classes/:na]
[junit] at org.apache.cassandra.dht.tokenallocator.RandomReplicationAwareTokenAllocatorTest.flakyTestNewCluster(RandomReplicationAwareTokenAllocatorTest.java:51) [classes/:na]
[junit] at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:581) ~[classes/:na]
[junit] at org.apache.cassandra.Util.flakyTest(Util.java:611) ~[classes/:na]
[junit] at org.apache.cassandra.dht.tokenallocator.RandomReplicationAwareTokenAllocatorTest.testNewClusterr(RandomReplicationAwareTokenAllocatorTest.java:44) [classes/:na]
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_101]
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_101]
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_101]
[junit] at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_101]
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
[junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
[junit] at
[jira] [Comment Edited] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571415#comment-15571415 ]

Stefania edited comment on CASSANDRA-12784 at 10/13/16 9:37 AM:

bq. Since the same test is being done for a large vnode count for the Murmur partitioner, I have absolutely nothing against reducing the scope of the random version to no higher than 16 vnodes.

Thanks, I'll change the test accordingly.

bq. The flaky utility failure is caused by not catching the right error type for some JUnit versions. The fix is to include {{|| AssertionFailedError}} in the catch in runCatchingAssertionError.

That's what I thought as well initially, but AssertionFailedError is actually a subclass of AssertionError, so that is not the reason. I verified in IntelliJ that if we call Assert.fail() in the same method, the exception is caught. I must admit I don't understand this yet, unless the test really ran 5 times, failed every time, and the suppressed exceptions were simply not displayed. The trouble is that it is hard to reproduce. We can add more debug information and leave it until we can reproduce it.

> ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk
>
> Key: CASSANDRA-12784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12784
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefania
> Assignee: Stefania
> Fix For: 3.x
>
> Attachments: ReplicationAwareTokenAllocatorTest.jfr.gz
>
> Example failure:
> http://cassci.datastax.com/view/cassandra-3.X/job/cassandra-3.X_testall/lastCompletedBuild/testReport/org.apache.cassandra.dht.tokenallocator/ReplicationAwareTokenAllocatorTest/testNewClusterWithMurmur3Partitioner/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk
[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571415#comment-15571415 ]

Stefania edited comment on CASSANDRA-12784 at 10/13/16 9:31 AM:

bq. Since the same test is being done for a large vnode count for the Murmur partitioner, I have absolutely nothing against reducing the scope of the random version to no higher than 16 vnodes.

Thanks, I'll change the test accordingly.

bq. The flaky utility failure is caused by not catching the right error type for some JUnit versions. The fix is to include {{|| AssertionFailedError}} in the catch in runCatchingAssertionError.

That's what I thought as well initially, but AssertionFailedError is actually a subclass of AssertionError, so that is not the reason. I verified in IntelliJ that if we call Assert.fail() in the same method, the exception is caught. I must admit I don't understand this yet, unless the test really ran 5 times, failed every time, and the suppressed exceptions were simply not displayed. The trouble is that it is hard to reproduce. We can add more debug information and leave it until we can reproduce it.
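A minimal sketch of the retry behaviour under discussion, assuming a helper shaped like the {{Util.flakyTest}} / {{Util.runCatchingAssertionError}} pair visible in the stack trace above. The class {{FlakyRetrySketch}}, its method bodies and {{StandInAssertionFailedError}} are hypothetical illustrations, not Cassandra's actual code. It shows the two points raised in the comments: a plain {{catch (AssertionError e)}} already matches subclasses such as JUnit 4's {{junit.framework.AssertionFailedError}}, and failures from earlier attempts that are kept only as suppressed exceptions are visible only when the final error is printed together with them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a retry helper for flaky tests; names are only
// loosely modelled on Util.flakyTest / Util.runCatchingAssertionError.
public class FlakyRetrySketch {

    // Hypothetical stand-in for junit.framework.AssertionFailedError, which in
    // JUnit 4 is a subclass of java.lang.AssertionError.
    static class StandInAssertionFailedError extends AssertionError {
        StandInAssertionFailedError(String message) {
            super(message);
        }
    }

    // Runs the test body once. Returns null if it passed, otherwise the
    // AssertionError it threw; this catch clause also matches any subclass.
    static AssertionError runCatchingAssertionError(Runnable test) {
        try {
            test.run();
            return null;
        } catch (AssertionError e) {
            return e;
        }
    }

    // Retries the test up to maxAttempts times. If every attempt fails, the last
    // failure is rethrown with the earlier ones attached via addSuppressed(), so
    // they are only visible if the final error is printed together with its
    // suppressed exceptions (Throwable.printStackTrace() does this).
    static void flakyTest(Runnable test, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        List<AssertionError> failures = new ArrayList<>();
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            AssertionError failure = runCatchingAssertionError(test);
            if (failure == null) {
                return; // passed: earlier failures are discarded
            }
            failures.add(failure);
        }
        AssertionError last = failures.get(failures.size() - 1);
        for (int i = 0; i < failures.size() - 1; i++) {
            last.addSuppressed(failures.get(i));
        }
        throw last;
    }

    public static void main(String[] args) {
        // Fails twice with the JUnit-style subclass, then passes on attempt 3.
        int[] calls = {0};
        flakyTest(() -> {
            if (++calls[0] < 3) {
                throw new StandInAssertionFailedError("attempt " + calls[0]);
            }
        }, 5);
        System.out.println("passed after " + calls[0] + " attempts");

        // Always fails: 5 attempts, so the last failure carries 4 suppressed ones.
        try {
            flakyTest(() -> { throw new StandInAssertionFailedError("always fails"); }, 5);
        } catch (AssertionError e) {
            System.out.println("suppressed: " + e.getSuppressed().length);
        }
    }
}
```

Running {{main}} prints {{passed after 3 attempts}} followed by {{suppressed: 4}}. As a side note on the reviewer's suggestion, a Java multi-catch such as {{catch (AssertionError | AssertionFailedError e)}} would be rejected by javac, because the alternatives in a multi-catch clause may not be subclasses of one another.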