[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335714#comment-17335714 ]

David Capwell commented on CASSANDRA-16238:
---

+1

> Fix flaky jvm-dtests that fail with Unable to contact any seeds
> ---
>
> Key: CASSANDRA-16238
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16238
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest/java
> Reporter: David Capwell
> Assignee: Brandon Williams
> Priority: Normal
> Fix For: 4.0-rc
>
> Attachments: 16238-archived-failures.txt
>
>
> https://app.circleci.com/pipelines/github/dcapwell/cassandra/745/workflows/1c7e589e-b5af-4a56-b40a-43da424602c7/jobs/4231
> {code}
> test teardown failure
> Unexpected error found in node logs (see stdout for full details). Errors:
> [ERROR [main] 2020-10-29 17:38:13,808 CassandraDaemon.java:817 - Exception encountered during startup
> java.lang.IllegalStateException: Unable to contact any seeds!
>     at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1601)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:931)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:892)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:407)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:671)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:795),
> ERROR [main] 2020-10-29 17:38:13,808 CassandraDaemon.java:817 - Exception encountered during startup
> java.lang.IllegalStateException: Unable to contact any seeds!
>     at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1601)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:931)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:892)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:699)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:635)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:407)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:671)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:795)]
> {code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335687#comment-17335687 ]

Brandon Williams commented on CASSANDRA-16238:
--

[Branch|https://github.com/driftx/cassandra/tree/CASSANDRA-16238]

[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/716/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/716/pipeline]
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334869#comment-17334869 ]

Brandon Williams commented on CASSANDRA-16238:
--

{noformat}
[junit-timeout] WARN [node4_GossipStage:1] node4 2021-04-28 16:02:24,532 Gossiper.java:1002 - Race condition marking /127.0.0.2:7012 as a FatClient; ignoring
{noformat}

That will work too.
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334402#comment-17334402 ]

David Capwell commented on CASSANDRA-16238:
---

If this is a race reading uncommitted data, then the patch below might work around it (double-checked locking, but with queues).

Patch:

{code}
$ git diff
diff --git a/src/java/org/apache/cassandra/gms/Gossiper.java b/src/java/org/apache/cassandra/gms/Gossiper.java
index 699f235bd3..e82b107bac 100644
--- a/src/java/org/apache/cassandra/gms/Gossiper.java
+++ b/src/java/org/apache/cassandra/gms/Gossiper.java
@@ -928,6 +928,11 @@ public class Gossiper implements IFailureDetectionEventListener, GossiperMBean
                 {
                     logger.info("FatClient {} has been silent for {}ms, removing from gossip", endpoint, fatClientTimeout);
                     runInGossipStageBlocking(() -> {
+                        if (!isGossipOnlyMember(endpoint))
+                        {
+                            // updating gossip and token metadata are not atomic, but rely on the single threaded gossip stage
+                            // since status checks are done outside the gossip stage, need to confirm the state of the endpoint
+                            // to make sure that the previous read data was correct
+                            logger.info("Race condition marking {} as a FatClient; ignoring", endpoint);
+                            return;
+                        }
                         removeEndpoint(endpoint); // will put it in justRemovedEndpoints to respect quarantine delay
                         evictFromMembership(endpoint); // can get rid of the state immediately
                     });
{code}
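The idea behind the patch can be sketched outside Cassandra as follows. This is only a minimal illustration, assuming a single-threaded executor stands in for the gossip stage; the class `RecheckSketch`, its fields, and `evictFatClient` are hypothetical names and not part of Gossiper. The removal task repeats the membership check once it is running on the stage, so a stale answer read from another thread is ignored.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RecheckSketch
{
    // Hypothetical stand-ins for gossip endpoint state and token metadata.
    final Set<String> endpointState = ConcurrentHashMap.newKeySet();
    final Set<String> tokenMetadata = ConcurrentHashMap.newKeySet();
    // A single thread plays the role of the gossip stage.
    final ExecutorService gossipStage = Executors.newSingleThreadExecutor();

    boolean isGossipOnlyMember(String endpoint)
    {
        return endpointState.contains(endpoint) && !tokenMetadata.contains(endpoint);
    }

    /** Runs the removal on the stage, re-validating first; returns true if evicted. */
    boolean evictFatClient(String endpoint) throws Exception
    {
        Callable<Boolean> task = () -> {
            if (!isGossipOnlyMember(endpoint))
                return false; // the earlier read was stale; ignore the removal
            endpointState.remove(endpoint);
            return true;
        };
        return gossipStage.submit(task).get();
    }

    /** Replays the stale read followed by the guarded removal. */
    static boolean replayWithGuard()
    {
        RecheckSketch node = new RecheckSketch();
        String endpoint = "127.0.0.1:7012";
        try
        {
            node.endpointState.add(endpoint);
            // Read taken outside the stage while tokens are not yet known.
            boolean staleRead = node.isGossipOnlyMember(endpoint);
            // The stage then finishes applying the state.
            node.gossipStage.submit(() -> { node.tokenMetadata.add(endpoint); }).get();
            return staleRead && node.evictFatClient(endpoint);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            node.gossipStage.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("evicted: " + replayWithGuard());
    }
}
```

Because both the token update and the removal run on the same single thread, the second check inside the task sees a settled view, which is what the comment in the patch relies on.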
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17333296#comment-17333296 ]

Brandon Williams commented on CASSANDRA-16238:
--

bq. but we do see this outside of these classes as well.

I haven't seen those yet.

bq. If I understand you, the call to StorageService.onChange (which calls handleStateNormal) happens-after gossip status check, so removes the state?

It's hard to know the exact order, but it's clear from the first few log lines that the status check and onChange are happening concurrently.

bq. If I understand you correctly, this feels like a race condition where we read data not fully committed, which feels like a bug (which is why it was set to 0 in the first place). Do I understand you Brandon Williams?

That may be the case as well, but yes, you understand my theory thus far.
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332802#comment-17332802 ]

David Capwell commented on CASSANDRA-16238:
---

Review

* https://github.com/apache/cassandra/compare/trunk...driftx:CASSANDRA-16238#diff-99267a2170b04fd7dd24d6c6bf2ba1fc26d6dc896cd74f8c5bd56c476e2540e4R580 - nit: you can call isEmpty rather than size

For the host replacement tests, I set that field low to help find issues, so if this case happens more frequently because of that I am cool with removing it; but we do see this outside of these classes as well.

bq. .1 is detected for the first time via gossip, and as it is going through StorageService but before it is added to TokenMetadata, the gossiper's status check has begun running

If I understand you, the call to StorageService.onChange (which calls handleStateNormal) happens-after the gossip status check, so removes the state?

GossipDigestAck2 should be handled in the gossip stage and eventually call applyNewStates to apply the state and trigger notifications, but doStatusCheck is called in the GossipTasks thread pool. It checks isGossipOnlyMember, which returns true in this case (as the state isn't fully settled yet), at which point we schedule a task in the gossip stage to remove the endpoint (but by this point isGossipOnlyMember(endpoint) == false).

If I understand you correctly, this feels like a race condition where we read data that is not fully committed, which feels like a bug (which is why it was set to 0 in the first place). Do I understand you [~brandon.williams]?
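The interleaving described in this comment can be replayed deterministically in a small model. This is a sketch under stated assumptions, not Cassandra code: two single-threaded executors stand in for the gossip stage and the GossipTasks pool, and `StaleCheckSketch` with its fields is a hypothetical name. The steps are forced into the problematic order to show the check-then-act gap.

```java
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StaleCheckSketch
{
    // Hypothetical stand-ins for gossip endpoint state and token metadata.
    final Set<String> endpointState = ConcurrentHashMap.newKeySet();
    final Set<String> tokenMetadata = ConcurrentHashMap.newKeySet();

    boolean isGossipOnlyMember(String endpoint)
    {
        return endpointState.contains(endpoint) && !tokenMetadata.contains(endpoint);
    }

    /** Replays the interleaving; returns true if a ring member was wrongly evicted. */
    static boolean replay()
    {
        StaleCheckSketch node = new StaleCheckSketch();
        String endpoint = "127.0.0.1:7012";
        ExecutorService gossipStage = Executors.newSingleThreadExecutor();
        ExecutorService gossipTasks = Executors.newSingleThreadExecutor();
        try
        {
            // 1. The gossip stage records the endpoint; token metadata is not updated yet.
            gossipStage.submit(() -> { node.endpointState.add(endpoint); }).get();

            // 2. The status check runs on the other pool and sees a gossip-only member.
            Callable<Boolean> check = () -> node.isGossipOnlyMember(endpoint);
            boolean looksLikeFatClient = gossipTasks.submit(check).get();

            // 3. The gossip stage finishes and adds the endpoint to token metadata.
            gossipStage.submit(() -> { node.tokenMetadata.add(endpoint); }).get();

            // 4. The removal scheduled from step 2 acts on the stale answer.
            if (looksLikeFatClient)
                gossipStage.submit(() -> { node.endpointState.remove(endpoint); }).get();

            return node.tokenMetadata.contains(endpoint) && !node.endpointState.contains(endpoint);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            gossipStage.shutdown();
            gossipTasks.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("wrongly evicted: " + replay());
    }
}
```

In the real system the ordering of steps 2 and 3 depends on timing, which would be consistent with the failure showing up only intermittently.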
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332790#comment-17332790 ]

David Capwell commented on CASSANDRA-16238:
---

The host replacement tests set it low, but what about the other classes which hit this (such as the original test https://app.circleci.com/pipelines/github/dcapwell/cassandra/745/workflows/1c7e589e-b5af-4a56-b40a-43da424602c7/jobs/4231)? We have seen this with nodeDownDuringMove(org.apache.cassandra.distributed.test.GossipTest) as well.
[jira] [Commented] (CASSANDRA-16238) Fix flaky jvm-dtests that fail with Unable to contact any seeds
[ https://issues.apache.org/jira/browse/CASSANDRA-16238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17332737#comment-17332737 ]

Brandon Williams commented on CASSANDRA-16238:
--

Branch [here|https://github.com/driftx/cassandra/tree/CASSANDRA-16238].

[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/714/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/714/pipeline]

{noformat}
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:05,381 Gossiper.java:1296 - Node /127.0.0.1:7012 is now part of the cluster
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,381 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] INFO [node4_GossipTasks:1] node4 2021-04-23 16:23:05,393 Gossiper.java:997 - FatClient /127.0.0.1:7012 has been silent for 0ms, removing from gossip
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,398 StorageService.java:2677 - New node /127.0.0.1:7012 at token -3074457345618258603
{noformat}

We can see here that .1 is detected for the first time via gossip, and as it is going through StorageService but before it is added to TokenMetadata, the gossiper's status check has begun running. Since the quarantine delay is overridden to zero, without a presence in TMD the node is not a member yet and is thus deemed a fat client and removed.
{noformat}
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:05,407 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:05,414 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,422 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:05,422 StorageService.java:2733 - Node /127.0.0.1:7012 state jump to NORMAL
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:1243 - removing expire time for endpoint : /127.0.0.1:7012
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:1244 - InetAddress /127.0.0.1:7012 is now UP
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:579 - removed /127.0.0.1:7012 from seeds, updated seeds list = []
[junit-timeout] WARN 16:23:05 Seeds list is now empty!
[junit-timeout] WARN [node4_GossipStage:1] node4 2021-04-23 16:23:05,447 Gossiper.java:581 - Seeds list is now empty!
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,452 Gossiper.java:590 - removing endpoint /127.0.0.1:7012
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:05,452 Gossiper.java:561 - evicting /127.0.0.1:7012 from gossip
[junit-timeout] DEBUG [node4_GossipTasks:1] node4 2021-04-23 16:23:06,453 Gossiper.java:1025 - 0 elapsed, /127.0.0.1:7012 gossip quarantine over
{noformat}

Crucially, as part of this removal the node is also removed from the seeds list, since it is listed there. The warning about the empty seed list is added from my branch.
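The side effect described above can be modeled in a few lines. This is an illustrative sketch only: `SeedListSketch`, its fields, and the printed messages are assumptions for the example and do not reproduce the Gossiper code on the branch. It shows how removing the only seed leaves the seed list empty, and where an `isEmpty` check can emit a warning.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

class SeedListSketch
{
    // Hypothetical stand-ins for the gossiper's seed list and known endpoints.
    final Set<String> seeds = new ConcurrentSkipListSet<>();
    final Set<String> endpoints = new ConcurrentSkipListSet<>();

    /** Removes an endpoint; returns true if this removal emptied the seed list. */
    boolean removeEndpoint(String endpoint)
    {
        endpoints.remove(endpoint);
        boolean wasSeed = seeds.remove(endpoint);
        if (wasSeed)
        {
            System.out.println("removed " + endpoint + " from seeds, updated seeds list = " + seeds);
            if (seeds.isEmpty())
            {
                System.out.println("WARN Seeds list is now empty!");
                return true;
            }
        }
        return false;
    }

    /** A cluster view whose only seed is the node that gets removed. */
    static boolean demo()
    {
        SeedListSketch node = new SeedListSketch();
        node.seeds.add("127.0.0.1:7012");
        node.endpoints.add("127.0.0.1:7012");
        return node.removeEndpoint("127.0.0.1:7012");
    }

    public static void main(String[] args)
    {
        System.out.println("seed list emptied: " + demo());
    }
}
```

With a single seed configured, one mistaken removal is enough to leave nothing to contact, which matches the "Unable to contact any seeds!" startup failure in the issue description.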
{noformat}
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:07,176 Gossiper.java:1296 - Node /127.0.0.1:7012 is now part of the cluster
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,177 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,189 StorageService.java:2677 - New node /127.0.0.1:7012 at token -3074457345618258603
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:07,198 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:07,201 TokenMetadata.java:505 - Updating topology for /127.0.0.1:7012
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,208 StorageService.java:2730 - Node /127.0.0.1:7012 state NORMAL, token [-3074457345618258603]
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:07,208 StorageService.java:2733 - Node /127.0.0.1:7012 state jump to NORMAL
[junit-timeout] DEBUG [node4_GossipStage:1] node4 2021-04-23 16:23:07,220 Gossiper.java:1243 - removing expire time for endpoint : /127.0.0.1:7012
[junit-timeout] INFO [node4_GossipStage:1] node4 2021-04-23 16:23:07,220 Gossiper.java:1244 - InetAddress /127.0.0.1:7012 is now UP
[junit-timeout] DEBUG [node4_BatchlogTasks:1] node4 2021-04-23 16:23:07,383 BatchlogManager.java:246 - Updating batchlog replay throttle to 1024 KB/s, 256 KB/s per endpoint
[junit-timeout] DEBUG [node4_isolatedExecutor:1] node4 2021-04-23 16:23:12,457 Gossiper.java:2142 - Gossip looks settled.