[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-29 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335714#comment-17335714
 ] 

David Capwell commented on CASSANDRA-16238:
---

+1

> Fix flaky jvm-dtests that fail with Unable to contact any seeds
> ---
>
> Key: CASSANDRA-16238
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16238
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: David Capwell
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 4.0-rc
>
> Attachments: 16238-archived-failures.txt
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/745/workflows/1c7e589e-b5af-4a56-b40a-43da424602c7/jobs/4231
> {code}
> test teardown failure
> Unexpected error found in node logs (see stdout for full details). Errors: 
> [ERROR [main] 2020-10-29 17:38:13,808 CassandraDaemon.java:817 - Exception encountered during startup
> java.lang.IllegalStateException: Unable to contact any seeds!
>   at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1601)
>   at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:931)
>   at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:892)
>   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
>   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635)
>   at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:407)
>   at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:671)
>   at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:795),
> ERROR [main] 2020-10-29 17:38:13,808 CassandraDaemon.java:817 - Exception encountered during startup
> java.lang.IllegalStateException: Unable to contact any seeds!
>   at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1601)
>   at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:931)
>   at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:892)
>   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
>   at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635)
>   at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:407)
>   at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:671)
>   at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:795)]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-29 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335687#comment-17335687
 ] 

Brandon Williams commented on CASSANDRA-16238:
--

[Branch|https://github.com/driftx/cassandra/tree/CASSANDRA-16238] 
[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/716/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/716/pipeline]






[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-28 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334869#comment-17334869
 ] 

Brandon Williams commented on CASSANDRA-16238:
--

{noformat}
[junit-timeout] WARN  [node4_GossipStage:1] node4 2021-04-28 16:02:24,532 Gossiper.java:1002 - Race condition marking /127.0.0.2:7012 as a FatClient; ignoring
{noformat}

That will work too.




[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-27 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334402#comment-17334402
 ] 

David Capwell commented on CASSANDRA-16238:
---

If this is a race reading uncommitted data, then the patch below might work 
around it (double-checked locking, but with queues)

Patch:

{code}
$ git diff
diff --git a/src/java/org/apache/cassandra/gms/Gossiper.java b/src/java/org/apache/cassandra/gms/Gossiper.java
index 699f235bd3..e82b107bac 100644
--- a/src/java/org/apache/cassandra/gms/Gossiper.java
+++ b/src/java/org/apache/cassandra/gms/Gossiper.java
@@ -928,6 +928,11 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean
                 {
                     logger.info("FatClient {} has been silent for {}ms, removing from gossip", endpoint, fatClientTimeout);
                     runInGossipStageBlocking(() -> {
+                        if (!isGossipOnlyMember(endpoint))
+                        {
+                            // updating gossip and token metadata are not atomic, but rely on the single threaded gossip stage
+                            // since status checks are done outside the gossip stage, need to confirm the state of the endpoint
+                            // to make sure that the previous read data was correct
+                            logger.info("Race condition marking {} as a FatClient; ignoring", endpoint);
+                            return;
+                        }
                         removeEndpoint(endpoint); // will put it in justRemovedEndpoints to respect quarantine delay
                         evictFromMembership(endpoint); // can get rid of the state immediately
                     });
{code}
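
As a rough illustration of the re-check idea in the patch above (not the patch itself), here is a minimal, self-contained sketch. It assumes a single-threaded executor standing in for the gossip stage and two plain sets standing in for gossip state and token metadata. RecheckSketch, evictOnStage and demo are hypothetical names; only isGossipOnlyMember mirrors a name used in the patch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the gossiper; none of this is Cassandra's actual code.
final class RecheckSketch
{
    final Set<String> gossipState = ConcurrentHashMap.newKeySet(); // endpoints seen via gossip
    final Set<String> tokenOwners = ConcurrentHashMap.newKeySet(); // endpoints present in token metadata
    final ExecutorService stage = Executors.newSingleThreadExecutor(); // stands in for the gossip stage

    boolean isGossipOnlyMember(String endpoint)
    {
        return gossipState.contains(endpoint) && !tokenOwners.contains(endpoint);
    }

    // Evicts on the stage, but re-checks there first because the caller's check may be stale.
    boolean evictOnStage(String endpoint)
    {
        Future<Boolean> result = stage.submit(() -> {
            if (!isGossipOnlyMember(endpoint))
                return false; // state settled in the meantime; ignore
            gossipState.remove(endpoint);
            return true;
        });
        try
        {
            return result.get();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    // Replays the suspected interleaving: stale check, then the state settles, then eviction is attempted.
    static boolean demo()
    {
        RecheckSketch g = new RecheckSketch();
        String endpoint = "127.0.0.1";
        g.gossipState.add(endpoint);
        boolean staleCheck = g.isGossipOnlyMember(endpoint); // true: not in token metadata yet
        g.tokenOwners.add(endpoint);                         // state finishes settling
        boolean evicted = staleCheck && g.evictOnStage(endpoint);
        g.stage.shutdown();
        return staleCheck && !evicted && g.gossipState.contains(endpoint);
    }
}
```

In this sketch the first check passes, yet the endpoint survives because the second check runs on the same single thread that applies state changes.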




[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-27 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333296#comment-17333296
 ] 

Brandon Williams commented on CASSANDRA-16238:
--

bq. but we do see this outside of these classes as well.

I haven't seen those yet.

bq. If I understand you, the call to StorageService.onChange (which calls 
handleStateNormal) happens-after gossip status check, so removes the state?

It's hard to know the exact order but it's clear from the first few log lines 
that the status check and onChange are happening concurrently.

bq. If I understand you correctly, this feels like a race condition where we 
read data not fully committed, which feels like a bug (which is why it was set 
to 0 in the first place). Do I understand you Brandon Williams?

That may be the case as well, but yes you understand my theory thus far.







[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-26 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332802#comment-17332802
 ] 

David Capwell commented on CASSANDRA-16238:
---

Review

* 
https://github.com/apache/cassandra/compare/trunk...driftx:CASSANDRA-16238#diff-99267a2170b04fd7dd24d6c6bf2ba1fc26d6dc896cd74f8c5bd56c476e2540e4R580
 - nit: you can call isEmpty rather than size

For the host replacement tests, I set that field low to help find issues, so if 
this case happens more frequently because of that I am cool removing it; but we 
do see this outside of these classes as well.

bq. .1 is detected for the first time via gossip, and as it is going through 
StorageService but before it is added to TokenMetadata, the gossiper's status 
check has begun running

If I understand you, the call to StorageService.onChange (which calls 
handleStateNormal) happens-after gossip status check, so removes the state? 
GossipDigestAck2 should be handled in the gossip stage and eventually call 
applyNewStates to apply the state and trigger notifications, but doStatusCheck 
runs in the GossipTasks thread pool.  There it checks isGossipOnlyMember, which 
returns true in this case because the state isn't fully settled yet, so it 
schedules a removal task on the gossip stage; by the time that task runs, 
isGossipOnlyMember(endpoint) == false.

If I understand you correctly, this feels like a race condition where we read 
data not fully committed, which feels like a bug (which is why it was set to 0 
in the first place).  Do I understand you [~brandon.williams]?




[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-26 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332790#comment-17332790
 ] 

David Capwell commented on CASSANDRA-16238:
---

The host replacement tests set it low, but what about the other classes which 
hit this (such as the original test 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/745/workflows/1c7e589e-b5af-4a56-b40a-43da424602c7/jobs/4231)?
  We have seen this with 
nodeDownDuringMove(org.apache.cassandra.distributed.test.GossipTest) as well.




[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds

2021-04-26 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332737#comment-17332737
 ] 

Brandon Williams commented on CASSANDRA-16238:
--

Branch [here|https://github.com/driftx/cassandra/tree/CASSANDRA-16238].

[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/714/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/714/pipeline]

{noformat}
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:05,381 Gossiper.java:1296 - Node /127.0.0.1:7012 is now part of the cluster
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,381 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] INFO  [node4_GossipTasks:1] node4 2021-04-23 16:23:05,393 Gossiper.java:997 - FatClient /127.0.0.1:7012 has been silent for 0ms, removing from gossip
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,398 StorageService.java:2677 - New node /127.0.0.1:7012 at token -3074457345618258603
{noformat}

We can see here that .1 is detected for the first time via gossip, and as it is 
going through StorageService but before it is added to TokenMetadata, the 
gossiper's status check has begun running.  Since the quarantine delay is 
overridden to zero and the node has no presence in TMD yet, it is not considered 
a member, so it is deemed a fat client and removed.
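
To make the timing concrete, here is a minimal sketch of the kind of predicate described above. It is an assumption-laden simplification, not the Gossiper's actual check: FatClientCheckSketch and looksLikeFatClient are hypothetical names, and it assumes the fat client timeout follows from the quarantine delay, so a zero delay means a zero timeout (which matches the "silent for 0ms" log line).

```java
// Hypothetical simplification; the real status check considers more state than this.
final class FatClientCheckSketch
{
    // A node looks like a fat client if it owns no tokens and has been
    // silent for longer than the timeout.
    static boolean looksLikeFatClient(boolean inTokenMetadata, long silentMillis, long timeoutMillis)
    {
        return !inTokenMetadata && silentMillis > timeoutMillis;
    }
}
```

In the excerpt above the status check logs 12 ms after the node is first seen (05,381 to 05,393). With a timeout of zero, looksLikeFatClient(false, 12, 0) is true, whereas any timeout of 12 ms or more would let the node finish joining.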

{noformat}
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:05,407 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:05,414 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,422 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:05,422 StorageService.java:2733 - Node /127.0.0.1:7012 state jump to NORMAL
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:1243 - removing expire time for endpoint : /127.0.0.1:7012
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:1244 - InetAddress /127.0.0.1:7012 is now UP
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:579 - removed /127.0.0.1:7012 from seeds, updated seeds list = []
[junit-timeout] WARN  16:23:05 Seeds list is now empty!
[junit-timeout] WARN  [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:581 - Seeds list is now empty!
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,452 Gossiper.java:590 - removing endpoint /127.0.0.1:7012
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,452 Gossiper.java:561 - evicting /127.0.0.1:7012 from gossip
[junit-timeout] DEBUG [node4_GossipTasks:1] node4 2021-04-23 16:23:06,453 Gossiper.java:1025 - 0 elapsed, /127.0.0.1:7012 gossip quarantine over
{noformat}

Crucially, as part of this removal the node is also removed from the seeds 
list, since it is listed there. The warning about the empty seed list comes 
from my branch.
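
A minimal sketch of that behaviour, for illustration only: removing an endpoint that happens to be the only seed leaves the seed set empty, which is the condition the new warning reports. SeedRemovalSketch and removeFromSeeds are hypothetical names, and the set is a plain stand-in rather than the Gossiper's own seed collection.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in; not the Gossiper's actual seed handling.
final class SeedRemovalSketch
{
    final Set<String> seeds = new LinkedHashSet<>();

    // Removes the endpoint from the seed set and reports whether that emptied the set.
    boolean removeFromSeeds(String endpoint)
    {
        boolean removed = seeds.remove(endpoint);
        if (removed && seeds.isEmpty())
        {
            System.err.println("Seeds list is now empty!");
            return true;
        }
        return false;
    }
}
```

With a single seed, as in this test cluster, one mistaken eviction is enough to leave a node with nothing to contact.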

{noformat}
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:07,176 Gossiper.java:1296 - Node /127.0.0.1:7012 is now part of the cluster
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,177 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,189 StorageService.java:2677 - New node /127.0.0.1:7012 at token -3074457345618258603
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:07,198 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:07,201 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,208 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:07,208 StorageService.java:2733 - Node /127.0.0.1:7012 state jump to NORMAL
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,220 Gossiper.java:1243 - removing expire time for endpoint : /127.0.0.1:7012
[junit-timeout] INFO  [node4_GossipStage:1] node4 2021-04-23 16:23:07,220 Gossiper.java:1244 - InetAddress /127.0.0.1:7012 is now UP
[junit-timeout] DEBUG [node4_BatchlogTasks:1] node4 2021-04-23 16:23:07,383 BatchlogManager.java:246 - Updating batchlog replay throttle to 1024 KB/s, 256 KB/s per endpoint
[junit-timeout] DEBUG [node4_isolatedExecutor:1] node4 2021-04-23 16:23:12,457 Gossiper.java:2142 - Gossip looks settled.