[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644327#comment-17644327 ]

Stefan Miklosovic commented on CASSANDRA-14930:
---

I went through the PR, and the problem I see is that it does not work for the nodes that stay in the ring. If I have 3 nodes and decommission one of them, the remaining nodes will see that node as unreachable. The current logic checks this (1), but "!unreachableEndpoints.contains(ep)" evaluates to false because that endpoint is put back among the unreachable endpoints here (2). As a result, the "destroyConnectionPool" method is never called, and the node logs "Not destroying messaging connection to xyz due to endpoint starting to gossip again", which is obviously not true.

I am not completely sure how to get around this; maybe we could just leave the unreachable.contains() check out?

The 3.0 branch with my changes so far is here (3). I also fixed one possible NPE, as you will see.

(1) https://github.com/jasonstack/cassandra/blob/994b46b6882d7847f2da839968f52dbadb57fe1e/src/java/org/apache/cassandra/gms/Gossiper.java#L441
(2) https://github.com/jasonstack/cassandra/blob/994b46b6882d7847f2da839968f52dbadb57fe1e/src/java/org/apache/cassandra/gms/Gossiper.java#L1044
(3) https://github.com/instaclustr/cassandra/tree/CASSANDRA-14930

> decommission may cause timeout because messaging backlog is cleared
>
> Key: CASSANDRA-14930
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14930
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Coordination, Legacy/Core
> Reporter: Zhao Yang
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
> On a 3-node cluster with RF=2, decommissioning a node may cause a quorum write
> timeout because the messaging backlog to the decommissioned node is cleared via
> {{Gossiper#removeEndpoint() -> OutboundTcpConnection#closeSocket()}}.
> (The timeout is less likely to happen with RF=3, because we can afford one less
> response.)
> {code:java}
> What happened:
> 1. [WriteStage] before the leaving node is removed from tokenmetadata, the
> write endpoints are generated (the leaving endpoint is included)
> 2. [GossipStage] the leaving node is removed from tokenmetadata; no future
> write handler will include the leaving endpoint
> 3. [WriteStage] write handlers send messages to the messaging-service backlog
> 4. [GossipStage] the messaging-service backlog is cleared; the messages are not
> sent and the connection is closed
> 5. [WriteStage] the write times out
> {code}
> |patch|
> |[3.0|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.0]|
> |[3.11|https://github.com/jasonstack/cassandra/commits/decommission_timeout_3.11]|
> We can avoid this by delaying the destruction of the messaging connection so that
> messages are sent and responded to. This patch also avoids reopening an already
> closed connection on {{MessagingService#convict()}}.
> The new messaging framework rewrite in {{Trunk}} avoids the issue by not
> clearing the messaging backlog.

--
This message was sent by Atlassian Jira (v8.20.10#820010)

To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
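The guard problem raised in the comment above can be illustrated with a minimal sketch. This is not Cassandra's Gossiper code: the class DestroyGuardSketch, its method names, and the sample addresses are hypothetical, and plain sets stand in for the live and unreachable endpoint collections. It assumes the guard also requires the endpoint to be absent from the live set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the gossip state under discussion; not Cassandra's Gossiper.
final class DestroyGuardSketch {

    // Guard with the unreachable check: destroy the connection only when the
    // endpoint is neither live nor unreachable.
    static boolean shouldDestroyWithUnreachableCheck(Set<String> live, Set<String> unreachable, String endpoint) {
        return !live.contains(endpoint) && !unreachable.contains(endpoint);
    }

    // Variant suggested in the comment: leave the unreachable check out.
    static boolean shouldDestroyLiveOnly(Set<String> live, String endpoint) {
        return !live.contains(endpoint);
    }

    public static void main(String[] args) {
        // View from a node that stays in the ring after 10.0.0.3 was decommissioned.
        Set<String> live = new HashSet<>(Arrays.asList("10.0.0.1", "10.0.0.2"));
        // Assumption from the comment: the departed node is put back among the unreachable endpoints.
        Set<String> unreachable = new HashSet<>(Arrays.asList("10.0.0.3"));
        String departed = "10.0.0.3";

        System.out.println("with unreachable check: "
                + shouldDestroyWithUnreachableCheck(live, unreachable, departed));
        System.out.println("live check only: " + shouldDestroyLiveOnly(live, departed));
    }
}
```

Under these assumptions the first guard never fires for the departed node on the remaining nodes, while the live-only variant does. Whether dropping the check is safe for a node that is only temporarily down is not settled in this thread.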
[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643605#comment-17643605 ]

Zhao Yang commented on CASSANDRA-14930:
---

[~smiklosovic] I will be on leave for quite a while. Could you please take over? Thank you!
[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643607#comment-17643607 ]

Zhao Yang commented on CASSANDRA-14930:
---

[~azotcsit] Sorry, I missed the notification. Thanks for the feedback.
[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17643469#comment-17643469 ]

Stefan Miklosovic commented on CASSANDRA-14930:
---

Hi [~jasonstack], do you plan to address the points [~azotcsit] raised? If you are no longer interested in this ticket, I'll gladly take over. I just do not want to hijack your ticket, so I am asking first.
[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417719#comment-17417719 ]

Aleksei Zotov commented on CASSANDRA-14930:
---

Sorry guys, I checked the wrong commit! I have deleted my previous comment to prevent any confusion.

However, I have some comments on the actual PR:

1. It looks like a new property ({{cassandra.messaging_destroy_delay_in_ms}}) is going to be introduced. I think it needs to be described in {{cassandra.yaml}} and the other documentation. Moreover, as far as I understand, properties should be accessed through {{DatabaseDescriptor}}, not read directly as system properties.
2. Is the following really possible?
{code:java}
if (delay <= 0) // opt out
{code}
If yes, could you please describe a scenario? We either take the max of positive values or read the value from the property (which should be validated as non-negative or positive).
3. I do not know the flow well here, so just to confirm: must the node *not* be among the unreachable endpoints for the connection to be closed?
{code:java}
if (!liveEndpoints.contains(endpoint) && !unreachableEndpoints.containsKey(endpoint))
{code}
4. Should we introduce corresponding tests?

Since this is just a bug fix for old versions, #1 and #4 are probably not critical (but I'd still remove the property unless there is a justification for having it). #2 does not seem to affect anything either, even though the check might not be required.
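The validation suggested in the second review point could look roughly like the sketch below. Only the property name comes from the comment; the class DestroyDelaySketch, the method resolveDelayMillis, and the 10000 ms default are hypothetical, and the sketch reads a plain system property rather than going through {{DatabaseDescriptor}}.

```java
// Hypothetical helper, not part of the PR: resolves the destroy delay with validation.
final class DestroyDelaySketch {

    // Property name as quoted in the review comment.
    static final String PROPERTY = "cassandra.messaging_destroy_delay_in_ms";

    // Returns the delay in milliseconds. A missing value falls back to the default,
    // 0 is accepted (the caller may treat it as "opt out"), and negative or
    // non-numeric values are rejected up front.
    static long resolveDelayMillis(String rawValue, long defaultMillis) {
        if (rawValue == null) {
            return defaultMillis;
        }
        long parsed;
        try {
            parsed = Long.parseLong(rawValue.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(PROPERTY + " must be a number, got: " + rawValue, e);
        }
        if (parsed < 0) {
            throw new IllegalArgumentException(PROPERTY + " must not be negative, got: " + parsed);
        }
        return parsed;
    }

    public static void main(String[] args) {
        // The 10000 ms default is an assumption made for this example only.
        long delay = resolveDelayMillis(System.getProperty(PROPERTY), 10_000L);
        System.out.println("destroy delay (ms): " + delay);
    }
}
```

With validation of this kind in place, a later "delay <= 0" check can only ever see 0, so it could be narrowed to an explicit opt-out test or dropped.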
[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17417702#comment-17417702 ]

Aleksei Zotov commented on CASSANDRA-14930:
---

[~brandon.williams] I am confused. I see the proposed changes already merged as part of the CASSANDRA-14616 ticket. Here are the existing commits from the repo:

3.0: [https://github.com/apache/cassandra/commit/bbf7dac87cdc41bf8e138a99f630e7a827ad0d98]
3.11: [https://github.com/apache/cassandra/commit/4dd7faa75210f635af36c0852e9b0d2e8bdbb95c]

Also, the changes referred to in the description seem to be simply taken from the CASSANDRA-14616 ticket:
{quote}Fix cassandra-stress write hang with default options (*CASSANDRA-14616*){quote}
Moreover, they do not seem to be related to the description. I suspect the wrong code was accidentally pushed.

[~jasonstack] could you please double-check the branches mentioned in the description and confirm that they contain the code you want to merge as part of this ticket?
[jira] [Commented] (CASSANDRA-14930) decommission may cause timeout because messaging backlog is cleared
[ https://issues.apache.org/jira/browse/CASSANDRA-14930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370869#comment-17370869 ]

Brandon Williams commented on CASSANDRA-14930:
---

||Jenkins||
|[3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/879/]|
|[3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch/880/]|

Failures for 3.0 look unrelated; the same goes for 3.11, where most of the failures are known from CASSANDRA-16770.

+1