[jira] [Comment Edited] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk

2016-10-14 Thread Stefania (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574654#comment-15574654 ]

Stefania edited comment on CASSANDRA-12784 at 10/14/16 9:02 AM:


Thanks for the review.

The multiplexed run returned a number of
[failures|https://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-testall-multiplex/26/]
in {{RandomReplicationAwareTokenAllocatorTest.testExistingCluster}}. Is this
test also expected to be flaky, or is this a problem?

Splitting the test caused a failure in testall; I've fixed it and relaunched.

bq. The stack printout for single flakes could be useful to track the history 
of a failure; I would prefer not to lose it, but I wouldn't stop the commit if 
that is something you think is worth sacrificing.

The stack printout is still there; the logger is smart enough to work it out.
I've tested it locally; see the sample output below.
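For readers unfamiliar with the flaky-test helper, here is a minimal sketch of the retry-and-log idea. It assumes nothing about the real {{Util.flakyTest}} / {{Util.runCatchingAssertionError}} signatures: the class {{FlakyRunner}} and its parameters are hypothetical, and java.util.logging stands in for Cassandra's logger. The point it illustrates is that passing the caught error to the logger as the throwable argument keeps the stack printout for every single flake, and that, if all attempts fail, the earlier failures can be attached to the last one as suppressed exceptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FlakyRunner
{
    private static final Logger logger = Logger.getLogger(FlakyRunner.class.getName());

    // Hypothetical helper, not Cassandra's actual Util.flakyTest: runs the test
    // up to 'attempts' times and stops at the first successful run.
    public static void flakyTest(Runnable test, int attempts, String message)
    {
        if (attempts < 1)
            throw new IllegalArgumentException("attempts must be >= 1");

        List<AssertionError> failures = new ArrayList<>();
        for (int i = 0; i < attempts; i++)
        {
            try
            {
                test.run();
                return; // passed: no need to retry
            }
            catch (AssertionError e)
            {
                // Passing the Throwable as the last argument makes the logger
                // print the full stack trace of this single flake.
                logger.log(Level.INFO, message, e);
                failures.add(e);
            }
        }

        // Every attempt failed: rethrow the last error and attach the earlier
        // ones as suppressed exceptions so none of them is lost.
        AssertionError last = failures.get(failures.size() - 1);
        for (AssertionError earlier : failures.subList(0, failures.size() - 1))
            last.addSuppressed(earlier);
        throw last;
    }

    public static void main(String[] args)
    {
        int[] calls = {0};
        // Fails twice, then passes on the third attempt.
        flakyTest(() -> {
            calls[0]++;
            if (calls[0] < 3)
                throw new AssertionError("flake " + calls[0]);
        }, 5, "Test failed, retrying.");
        System.out.println("attempts used: " + calls[0]); // attempts used: 3
    }
}
```

Running {{main}} logs two INFO records, each with its stack trace, before the third attempt passes.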

bq. I would rename flakyTestNewCluster in the base class to just testNewCluster 
since the individual runner is the one that declares it flaky and handles that.

Done.

bq. Is there a reason to post the commits view instead of the branch one?
There's no direct way to get from the commits view to compare, which is the
one most useful for reviews.

Not really:

||3.X||trunk||
|[patch|https://github.com/stef1927/cassandra/tree/12784-3.X]|[patch|https://github.com/stef1927/cassandra/tree/12784]|
|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12784-3.X-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/stef1927/job/stef1927-12784-testall/]|

--

Sample stack printout obtained by inserting a fake {{Assert.fail("Test message")}}:

{code}
[junit] INFO  [main] 2016-10-14 16:10:44,810 ?:? - Test failed. It tends to fail sometimes due to the random selection of the tokens in the first few nodes.
[junit] java.lang.AssertionError: Test message
[junit] at org.junit.Assert.fail(Assert.java:88) ~[junit-4.12.jar:4.12]
[junit] at org.apache.cassandra.dht.tokenallocator.AbstractReplicationAwareTokenAllocatorTest.testNewCluster(AbstractReplicationAwareTokenAllocatorTest.java:592) ~[classes/:na]
[junit] at org.apache.cassandra.dht.tokenallocator.AbstractReplicationAwareTokenAllocatorTest.testNewCluster(AbstractReplicationAwareTokenAllocatorTest.java:563) ~[classes/:na]
[junit] at org.apache.cassandra.dht.tokenallocator.RandomReplicationAwareTokenAllocatorTest.flakyTestNewCluster(RandomReplicationAwareTokenAllocatorTest.java:51) [classes/:na]
[junit] at org.apache.cassandra.Util.runCatchingAssertionError(Util.java:581) ~[classes/:na]
[junit] at org.apache.cassandra.Util.flakyTest(Util.java:611) ~[classes/:na]
[junit] at org.apache.cassandra.dht.tokenallocator.RandomReplicationAwareTokenAllocatorTest.testNewClusterr(RandomReplicationAwareTokenAllocatorTest.java:44) [classes/:na]
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_101]
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_101]
[junit] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_101]
[junit] at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_101]
[junit] at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
[junit] at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
[junit] at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
[junit] at
{code}

[jira] [Comment Edited] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk

2016-10-13 Thread Stefania (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571415#comment-15571415 ]

Stefania edited comment on CASSANDRA-12784 at 10/13/16 9:37 AM:


bq. Since the same test is being done for a large vnode count for the Murmur 
partitioner, I have absolutely nothing against reducing the scope of the random 
version to no higher than 16 vnodes.

Thanks, I'll change the test accordingly.

bq. The flaky utility failure is caused by not catching the right error type 
for some junit versions. The fix is to include || AssertionFailedError in the 
catch in runCatchingAssertionError.

That's what I thought as well initially, but AssertionFailedError is actually a
subclass of AssertionError, so that is not the reason. I verified in IntelliJ
that if we call Assert.fail() in the same method, the exception is caught. I
must admit I don't understand this yet, unless the test really ran 5 times,
failed every time, and the suppressed exceptions were simply not displayed. The
trouble is that it is hard to reproduce. We can add more debug information and
leave it until we can reproduce it.
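A small self-contained check of the two points above, under stated assumptions: {{FakeAssertionFailedError}} is a hypothetical local stand-in for {{junit.framework.AssertionFailedError}} (which extends {{AssertionError}} in JUnit 4.12), so the sketch needs no JUnit jar. It shows that a {{catch (AssertionError e)}} clause also intercepts the subclass, and that suppressed exceptions appear in {{printStackTrace()}} output (as {{Suppressed:}} sections) but not in {{toString()}}. Whether that difference is what hid them in the failing run is only a guess.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class CatchSubclassDemo
{
    // Hypothetical stand-in for junit.framework.AssertionFailedError, defined
    // locally so the sketch has no dependencies.
    static class FakeAssertionFailedError extends AssertionError
    {
        FakeAssertionFailedError(String message) { super(message); }
    }

    // Returns true if a catch (AssertionError) clause intercepts the failure.
    static boolean caughtAsAssertionError(Runnable test)
    {
        try
        {
            test.run();
            return false;
        }
        catch (AssertionError e)
        {
            return true;
        }
    }

    // Renders the full stack trace, including any "Suppressed:" sections.
    static String fullTrace(Throwable t)
    {
        StringWriter out = new StringWriter();
        t.printStackTrace(new PrintWriter(out));
        return out.toString();
    }

    public static void main(String[] args)
    {
        // The subclass is caught by the AssertionError clause.
        System.out.println(caughtAsAssertionError(() -> { throw new FakeAssertionFailedError("subclass"); })); // true

        AssertionError last = new AssertionError("attempt 5");
        last.addSuppressed(new AssertionError("attempt 4"));
        // toString() shows only the class name and message of the last error...
        System.out.println(last.toString().contains("attempt 4")); // false
        // ...while printStackTrace() also lists the suppressed ones.
        System.out.println(fullTrace(last).contains("Suppressed: java.lang.AssertionError: attempt 4")); // true
    }
}
```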



> ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and 
> trunk
> 
>
> Key: CASSANDRA-12784
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12784
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 3.x
>
> Attachments: ReplicationAwareTokenAllocatorTest.jfr.gz
>
>
> Example failure: 
> http://cassci.datastax.com/view/cassandra-3.X/job/cassandra-3.X_testall/lastCompletedBuild/testReport/org.apache.cassandra.dht.tokenallocator/ReplicationAwareTokenAllocatorTest/testNewClusterWithMurmur3Partitioner/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-12784) ReplicationAwareTokenAllocatorTest times out almost every time for 3.X and trunk

2016-10-13 Thread Stefania (JIRA)

[ https://issues.apache.org/jira/browse/CASSANDRA-12784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571415#comment-15571415 ]

Stefania edited comment on CASSANDRA-12784 at 10/13/16 9:31 AM:


bq. Since the same test is being done for a large vnode count for the Murmur 
partitioner, I have absolutely nothing against reducing the scope of the random 
version to no higher than 16 vnodes.

Thanks, I'll change the test accordingly.

bq. The flaky utility failure is caused by not catching the right error type 
for some junit versions. The fix is to include || AssertionFailedError in the 
catch in runCatchingAssertionError.

That's what I thought as well initially, but AssertionFailedError is actually a
subclass of AssertionError, so that is not the reason. I verified in IntelliJ
that if we call Assert.fail() in the same method, the exception is caught. I
must admit I don't understand this yet, unless the test really ran 5 times,
failed every time, and the suppressed exceptions were simply not displayed. The
trouble is that it is hard to reproduce. We can add more debug information and
leave it until we can reproduce it.


