[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836370#comment-17836370 ]

Cameron Zemek edited comment on CASSANDRA-18845 at 4/11/24 10:18 PM:
---------------------------------------------------------------------

I have reworked the [patch| [^CASSANDRA-18845-4_0_12.patch]] further so that it adds a new method instead of modifying the existing waitToSettle, which keeps the change to existing behavior to a minimum. The new method is called directly in MigrationCoordinator::awaitSchemaRequests to handle the case where the node is bootstrapping (since it needs nodes in the UP state in order to get the schema and stream sstables from them), and again just before enabling native transport.

https://issues.apache.org/jira/secure/attachment/13068153/CASSANDRA-18845-4_0_12.patch

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18845
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Cameron Zemek
>            Priority: Normal
>         Attachments: 18845-seperate.patch, CASSANDRA-18845-4_0_12.patch, delay.log, example.log, image-2023-09-14-11-16-23-020.png, stream.log, test1.log, test2.log, test3.log
>
>
> This is a follow up to CASSANDRA-18543.
> Although that ticket added the ability to set cassandra.gossip_settle_min_wait_ms, tuning it is tedious and error prone. On one node I just observed a 79 second gap between waiting for gossip and the first echo response indicating that a node is UP.
> The problem is that we do not want to start Native Transport until gossip settles; otherwise queries can fail at consistency levels such as LOCAL_QUORUM because the node thinks the replicas are still in the DOWN state.
> Instead of having to set gossip_settle_min_wait_ms, I am proposing that (outside a single node cluster) we wait for an UP message from another node before considering gossip as settled. E.g.
> {code:java}
> if (currentSize == epSize && currentLive == liveSize && liveSize > 1)
> {
>     logger.debug("Gossip looks settled.");
>     numOkay++;
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

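
To make the proposal in the issue description concrete, here is a minimal, self-contained sketch of a settle loop that counts consecutive stable polls. It is not the actual Gossiper code: the class name GossipSettleSketch, the method settledAtPoll, the requireLivePeer flag and the replayed samples are all hypothetical, and the loop shape (reset numOkay on any change, remember the previous counts) is an assumption.

```java
/** Illustrative stand-in for the settle loop; not the actual Gossiper code. */
final class GossipSettleSketch
{
    /**
     * Replays a sequence of poll samples, each {endpointCount, liveCount}, and
     * returns the 1-based poll at which gossip counts as settled, or -1 if never.
     */
    static int settledAtPoll(int[][] samples, int successesRequired, boolean requireLivePeer)
    {
        int numOkay = 0;
        int epSize = -1;   // endpoint count seen on the previous poll
        int liveSize = -1; // live count seen on the previous poll
        for (int poll = 0; poll < samples.length; poll++)
        {
            int currentSize = samples[poll][0];
            int currentLive = samples[poll][1];
            boolean unchanged = currentSize == epSize && currentLive == liveSize;
            boolean peerIsUp = !requireLivePeer || liveSize > 1;
            if (unchanged && peerIsUp)
                numOkay++;   // "Gossip looks settled."
            else
                numOkay = 0; // any change, or no live peer yet, restarts the count
            epSize = currentSize;
            liveSize = currentLive;
            if (numOkay >= successesRequired)
                return poll + 1;
        }
        return -1;
    }

    public static void main(String[] args)
    {
        // Hypothetical trace: 3 known endpoints, only this node live for four polls,
        // then the other two nodes are reported UP.
        int[][] samples = { {3, 1}, {3, 1}, {3, 1}, {3, 1}, {3, 3}, {3, 3}, {3, 3}, {3, 3} };
        System.out.println(settledAtPoll(samples, 3, false)); // prints 4
        System.out.println(settledAtPoll(samples, 3, true));  // prints 8
    }
}
```

In this trace the unmodified check settles at poll 4, while only the local node is live; with the extra liveSize > 1 requirement it settles at poll 8, after the peers have been seen UP.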
[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767358#comment-17767358 ]

Cameron Zemek edited comment on CASSANDRA-18845 at 9/21/23 2:59 AM:
--------------------------------------------------------------------

{noformat}
Sep 19 08:09:45 ip-10-1-57-23 cassandra[131402]: INFO org.apache.cassandra.gms.Gossiper Waiting for gossip to settle...
Sep 19 08:10:56 ip-10-1-57-23 cassandra[131402]: DEBUG org.apache.cassandra.gms.Gossiper Sending a EchoMessage to /35.83.14.80
{noformat}
I am struggling to reproduce this ^. I have seen it twice, and after enabling more logging I haven't been able to reproduce it again. What I do sometimes see, though, is it taking over 30 seconds to get the first ECHO response.

Since there are dtests that rely on having CQL up while nodes are down, I have attached a patch [^18845-seperate.patch] (against the 5.0 branch) that is opt-in.

Having settle just check for currentLive == liveSize still allows NTR to start while nodes are marked down. Yes, you can increase cassandra.gossip_settle_poll_success_required (and/or the other properties) to mitigate it, but these increase the minimum startup time, whereas [^18845-seperate.patch] doesn't add to it when the cluster is healthy.

A more elaborate solution would be to specify the required consistency level and, for all token ranges owned by the node, check whether there are enough live endpoints to satisfy that consistency level.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: 18845-seperate.patch, delay.log, example.log, image-2023-09-14-11-16-23-020.png, stream.log, test1.log, test2.log, test3.log

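
The "more elaborate solution" mentioned in the comment above could look roughly like the following sketch. It is an illustration only, not part of any attached patch: the class RangeLivenessSketch, the method allRangesSatisfied, the string range labels and the endpoint names are hypothetical, and the number of replicas required by the consistency level is passed in as a plain integer.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative per-range liveness check; not Cassandra code. */
final class RangeLivenessSketch
{
    /**
     * Returns true only if every range has at least `required` live replicas.
     * rangeToReplicas maps a (hypothetical) range label to the endpoints replicating it.
     */
    static boolean allRangesSatisfied(Map<String, List<String>> rangeToReplicas, Set<String> live, int required)
    {
        for (List<String> replicas : rangeToReplicas.values())
        {
            long liveCount = replicas.stream().filter(live::contains).count();
            if (liveCount < required)
                return false; // at least one range cannot reach the consistency level
        }
        return true;
    }

    public static void main(String[] args)
    {
        Map<String, List<String>> ranges = Map.of("r1", List.of("a", "b", "c"),
                                                  "r2", List.of("b", "c", "d"));
        System.out.println(allRangesSatisfied(ranges, Set.of("a", "b"), 2));      // prints false
        System.out.println(allRangesSatisfied(ranges, Set.of("a", "b", "c"), 2)); // prints true
    }
}
```

With only a and b live, range r2 has a single live replica, so a requirement of 2 is not met even though two nodes are UP.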
[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766850#comment-17766850 ]

Stefan Miklosovic edited comment on CASSANDRA-18845 at 9/19/23 3:31 PM:
------------------------------------------------------------------------

[~cam1982] you can simulate a lost echo even in a setup with 2 nodes. This is definitely possible with in-jvm dtests. You can drop all communication between nodes like this (1) and then resume it afterwards like this (2).

(1) https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/test/AuthTest.java#L99-L101

(2) https://github.com/apache/cassandra/blob/trunk/test/distributed/org/apache/cassandra/distributed/test/AuthTest.java#L118-L120

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: delay.log, example.log, image-2023-09-14-11-16-23-020.png, test1.log, test2.log, test3.log

[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766677#comment-17766677 ]

Cameron Zemek edited comment on CASSANDRA-18845 at 9/19/23 7:32 AM:
--------------------------------------------------------------------

Tested the patch 3 times to confirm it is working. See test1.log, test2.log and test3.log.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: example.log, image-2023-09-14-11-16-23-020.png, test1.log, test2.log, test3.log

[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765558#comment-17765558 ]

Brandon Williams edited comment on CASSANDRA-18845 at 9/15/23 11:03 AM:
------------------------------------------------------------------------

CASSANDRA-18543 is going to be reverted on CASSANDRA-18854 for causing a regression. The next step for this ticket to move forward will be to create tests that demonstrate the problem and guard against regressions.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch, 18845-5.0.patch, image-2023-09-14-11-16-23-020.png

[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765035#comment-17765035 ]

Stefan Miklosovic edited comment on CASSANDRA-18845 at 9/14/23 7:29 AM:
------------------------------------------------------------------------

Interesting. I am curious what causes that initial delay. What you are saying is that it takes a lot of time for the nodes to be up, and then it appears (from the log you posted) that all of them are reported more or less at the same time? There is an initial delay of dozens of seconds before they start to get reported? If that is true, then it probably makes sense to have a condition like that, so that we see at least some other nodes up before counting the round and increasing numOkay.

However, if we have this
{code:java}
if (currentSize == epSize && currentLive == liveSize && (epSize == liveSize || liveSize > 1))
{code}
then what if we have
{code}
currentSize = 2, epSize = 2, currentLive = 2, liveSize = 2
{code}
That "if" would evaluate to true, so numOkay would be increased and the round would count as valid. However, and it is a little hard to formulate this correctly, isn't it true that we are not guaranteeing that QUORUM would be satisfied here anyway? It could stay on all "twos" for all rounds, and we would say that gossip settled while there is a bunch of other nodes still to be reported; they just have not made it yet and we were stuck on 2 for three rounds.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch, 18845-5.0.patch, image-2023-09-14-11-16-23-020.png

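
The condition discussed in the comment above can be evaluated in isolation to see which rounds it would count. This is only a sketch: the class SettleConditionSketch and the method roundCounts are hypothetical wrappers around the boolean expression quoted in the comment.

```java
/** Illustrative wrapper around the condition quoted in the comment; not Cassandra code. */
final class SettleConditionSketch
{
    /** Returns true if a poll with these counts would increase numOkay. */
    static boolean roundCounts(int currentSize, int epSize, int currentLive, int liveSize)
    {
        return currentSize == epSize
               && currentLive == liveSize
               && (epSize == liveSize || liveSize > 1);
    }

    public static void main(String[] args)
    {
        System.out.println(roundCounts(1, 1, 1, 1));   // single node cluster: prints true
        System.out.println(roundCounts(3, 3, 1, 1));   // only this node live: prints false
        System.out.println(roundCounts(2, 2, 2, 2));   // the "all twos" case: prints true
        System.out.println(roundCounts(20, 20, 2, 2)); // 2 live out of 20 known: prints true
    }
}
```

The last two cases illustrate the concern raised in the comment: the condition only looks at how many endpoints are known and live, so it says nothing about whether those particular nodes can satisfy a quorum.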
[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764726#comment-17764726 ]

Stefan Miklosovic edited comment on CASSANDRA-18845 at 9/13/23 2:51 PM:
------------------------------------------------------------------------

Yeah, like ... if there are 20 nodes, RF is 5 and QUORUM is 3, then "liveSize > 1" means at least 2. But how do we know that these "2" satisfy _each query on local quorum_? Maybe there is a query whose quorum requires nodes to be alive that have not been detected yet, or maybe I am missing something here.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch, 18845-5.0.patch

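
The arithmetic behind the question above can be worked through with a small sketch. It assumes a single datacenter, the usual quorum formula of RF / 2 + 1, and nodes identified by plain integers; the class QuorumSketch, its methods and the replica sets are hypothetical and only serve to illustrate the 20 node, RF 5 example.

```java
import java.util.List;
import java.util.Set;

/** Illustrative quorum arithmetic for the example in the comment; not Cassandra code. */
final class QuorumSketch
{
    /** Number of replicas a quorum needs for the given replication factor. */
    static int quorumFor(int replicationFactor)
    {
        return replicationFactor / 2 + 1;
    }

    /** True if enough of the given replicas are live to reach a quorum. */
    static boolean quorumReachable(List<Integer> replicas, Set<Integer> liveNodes)
    {
        long live = replicas.stream().filter(liveNodes::contains).count();
        return live >= quorumFor(replicas.size());
    }

    public static void main(String[] args)
    {
        Set<Integer> live = Set.of(1, 2); // liveSize == 2, so "liveSize > 1" holds
        System.out.println(quorumFor(5));                                       // prints 3
        System.out.println(quorumReachable(List.of(1, 2, 3, 4, 5), live));      // prints false
        System.out.println(quorumReachable(List.of(7, 8, 9, 10, 11), live));    // prints false
        System.out.println(quorumReachable(List.of(1, 2, 3, 4, 5), Set.of(1, 2, 3))); // prints true
    }
}
```

With only two nodes seen as live, no replica set of five can reach the quorum of three, which is the gap between "liveSize > 1" and an actual consistency guarantee.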
[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764515#comment-17764515 ]

Stefan Miklosovic edited comment on CASSANDRA-18845 at 9/13/23 6:58 AM:
------------------------------------------------------------------------

I instructed Cameron privately about the strong preference for an in-jvm dtest to verify and test this behavior. Looking at the test steps described in his comment, it should be rather straightforward to come up with one.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: 18845-3.11.patch, 18845-4.0.patch, 18845-4.1.patch, 18845-5.0.patch

[jira] [Comment Edited] (CASSANDRA-18845) Waiting for gossip to settle on live endpoints

[ https://issues.apache.org/jira/browse/CASSANDRA-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764467#comment-17764467 ]

Cameron Zemek edited comment on CASSANDRA-18845 at 9/13/23 3:32 AM:
--------------------------------------------------------------------

I have attached a patch. I tested it as follows:
# Spin up a single node cluster. This works due to the epSize == liveSize check, which lets it bypass the liveSize > 1 check.
# Spin up a 3 node cluster. All 3 nodes start up NTR as expected.
# Shut down all nodes. Start up the first node; it stays waiting in gossip due to the liveSize > 1 requirement.
# Start up the second node. Now both nodes start NTR, since liveSize > 1 and there are no other incoming `is now UP` events, so gossip looks settled.

NOTE: I had to disable the if condition guarding the call to Gossiper.waitToSettle(), since I was using loopback addresses.

> Waiting for gossip to settle on live endpoints
> ----------------------------------------------
>
>                 Key: CASSANDRA-18845
>         Attachments: 18845-3.11.patch
