[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-07 Thread Sumanth Pasupuleti (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209772#comment-17209772 ]

Sumanth Pasupuleti commented on CASSANDRA-16182:


{quote}C' would not complete replacement if it detects C's presence before 
then.{quote}
I agree it would be nice if C' halted its replacement; however, this depends 
on the timing/chance of C' hearing about C before it completes its 
replacement.

{quote}nodes should apply the C' state once C is taken offline, after 
confirming it is still valid{quote}
There would be a window of time between C' announcing itself as available 
(and serving potentially inconsistent results to LOCAL_ONE requests that 
reach it directly) and C being marked DOWN by the other peers. But +1 to this 
approach, since it would address the issue deterministically imo. A rough 
sketch of the idea follows.
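A minimal sketch of that deferred application, assuming a hook on the failure-detector DOWN event; the shape mirrors the gms IEndpointStateChangeSubscriber.onDead callback, while the pending map and the isStillValid/applyNormalState helpers are hypothetical names, not existing Cassandra APIs:
{code:java}
// Hypothetical sketch only: "pending", "isStillValid" and "applyNormalState"
// are illustrative names. onDead mirrors the gms callback that fires when the
// local failure detector marks a peer DOWN.
import java.net.InetAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.cassandra.gms.EndpointState;

public class DeferredReplacementApplier
{
    // Replacement states rejected earlier because the replaced endpoint (C)
    // still looked alive when C' announced NORMAL.
    private final Map<InetAddress, EndpointState> pending = new ConcurrentHashMap<>();

    public void onDead(InetAddress endpoint, EndpointState epState)
    {
        // C was just marked DOWN; if we shelved a replacement for it,
        // re-validate and apply it now instead of dropping it forever.
        EndpointState replacement = pending.remove(endpoint);
        if (replacement != null && isStillValid(replacement))
            applyNormalState(replacement); // accept C' as the token owner
    }

    private boolean isStillValid(EndpointState replacement) { return true; /* re-check token ownership */ }
    private void applyNormalState(EndpointState replacement) { /* apply C''s NORMAL state */ }
}
{code}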

{quote}But that's the contract accepted at ONE anyway{quote}
I agree with this w.r.t. legitimate data inconsistency that would eventually 
be fixed through hints or repairs. However, in this scenario, until the 
operator takes action, there is no bound on the potential inconsistency.



> A replacement node, although completed bootstrap and joined ring according to 
> itself, stuck in Joining state as per the peers
> -
>
> Key: CASSANDRA-16182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16182
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
> Reporter: Sumanth Pasupuleti
> Assignee: Sumanth Pasupuleti
> Priority: Normal
> Fix For: 3.0.x
>
>
> This issue occurred in a production 3.0.21 cluster.
> Here is what happened:
> # We had, say, a three-node Cassandra cluster with nodes A, B and C
> # C got "terminated by cloud provider" due to health check failure and a 
> replacement node C' got launched.
> # C' started bootstrapping data from its neighbors
> # Network flaw: nodes A and B were still able to communicate with the 
> terminated node C and consequently still saw C as alive.
> # The replacement node C' learnt about C through gossip but was unable to 
> communicate with C and marked C as DOWN.
> # C' completed bootstrapping successfully, and both it and its peers logged 
> the statement "Node C' will complete replacement of C for tokens 
> [-7686143363672898397]"
> # C' logged the statement "Nodes C' and C have the same token 
> -7686143363672898397. C' is the new owner"
> # C' started listening for Thrift and CQL clients
> # Peer nodes A and B logged "Node C' cannot complete replacement of alive 
> node C"
> # A few seconds later, A and B marked C as DOWN
> C' continued to log the following lines endlessly:
> {code:java}
> Node C is now part of the cluster
> Nodes () and C' have the same token C.  Ignoring -7686143363672898397 (Needs 
> a log statement fix)
> FatClient C has been silent for 3ms, removing from gossip
> {code}
> My reasoning of what happened: 
> By the time the replacement node (C') finished bootstrapping and announced 
> its state as Normal, A and B were still able to communicate with the node 
> being replaced, C (while C' was not able to), and hence rejected C' 
> replacing C. C' does not know this and does not attempt to recommunicate its 
> "Normal" state to the rest of the cluster. (Worth noting that A and B marked 
> C as down soon after.)
> Gossip keeps telling C' to add C to its metadata, and C' keeps kicking C out 
> eventually based on FailureDetector. 
> Proposed fix:
> When C' is notified about C through gossip, then given that both own the 
> same token and that C' has finished bootstrapping, C' can emit its Normal 
> state again, which should fix this in my opinion (so long as A and B have 
> marked C as DOWN, which they eventually did).
> I ended up manually fixing this by restarting Cassandra on C', which forced 
> it to announce its "Normal" state via
> StorageService.initServer --> joinTokenRing() --> finishJoiningRing() --> 
> setTokens() --> setGossipTokens()
> Alternatively, I could possibly have achieved the same behavior by disabling 
> and re-enabling gossip via jmx/nodetool.
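For reference, the tail of that call chain is what re-announces ownership. 
Roughly, paraphrasing StorageService.setGossipTokens() from the 3.0 sources 
(treat the exact body as approximate):
{code:java}
// Approximate shape of StorageService.setGossipTokens() in 3.0 (paraphrased):
// it records the local tokens and re-emits both TOKENS and the NORMAL STATUS
// via gossip; combined with the restart's new generation, this made the peers
// finally process C''s state as new.
public void setGossipTokens(Collection<Token> tokens)
{
    tokenMetadata.updateNormalTokens(tokens, FBUtilities.getBroadcastAddress());
    Gossiper.instance.addLocalApplicationState(ApplicationState.TOKENS,
                                               valueFactory.tokens(tokens));
    Gossiper.instance.addLocalApplicationState(ApplicationState.STATUS,
                                               valueFactory.normal(tokens));
}
{code}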






[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-06 Thread Paulo Motta (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209174#comment-17209174 ]

Paulo Motta commented on CASSANDRA-16182:

> This doesn't really work if any other node has already replaced C with C'.

C' would not complete replacement if it detects C's presence before then.

This wouldn't prevent C from reappearing to other nodes *after* replacement is 
completed, but would at least prevent the reported scenario.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-06 Thread Benedict Elliott Smith (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209167#comment-17209167 ]

Benedict Elliott Smith commented on CASSANDRA-16182:


{quote}I think the safest thing to prevent this edge case is to make C' abort 
replacement if it hears about C via gossip.
{quote}
This doesn't really work if any other node has already replaced C with C'.
{quote}this is truly edge case and bad timing
{quote}
I agree it is a rare scenario, and an operator should be able to rectify it, 
even if it is a potentially serious event. My personal preference is to 
shelve this until we overhaul cluster membership, hopefully for 5.0.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-06 Thread Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209136#comment-17209136 ]

Brandon Williams commented on CASSANDRA-16182:

That sounds reasonable (if it hears about it and it changes its liveness). I 
will just point out that there were 30s of this before the replacement 
started, so this is truly an edge case and bad timing.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-06 Thread Paulo Motta (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17209120#comment-17209120 ]

Paulo Motta commented on CASSANDRA-16182:

I think the safest way to prevent this edge case is to make C' abort the 
replacement if it hears about C via gossip. Likewise, if node C learns about 
C' via gossip, it should probably halt execution to prevent potential 
consistency violations. A rough sketch of the first check follows.
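A minimal sketch of such a guard, assuming it is invoked while the replacement is still in flight; ReplacementGuard and abortReplacement are hypothetical names, while Gossiper.instance.getEndpointStateForEndpoint and FailureDetector.instance.isAlive are existing gossip-layer calls:
{code:java}
// Hypothetical guard: abort the in-flight replacement if gossip shows the
// node being replaced is still alive. Only the class and abortReplacement()
// are invented names; the gossip/failure-detector calls exist in the gms package.
import java.net.InetAddress;
import org.apache.cassandra.gms.EndpointState;
import org.apache.cassandra.gms.FailureDetector;
import org.apache.cassandra.gms.Gossiper;

public final class ReplacementGuard
{
    public static void checkReplacedNode(InetAddress replaced)
    {
        EndpointState state = Gossiper.instance.getEndpointStateForEndpoint(replaced);
        if (state != null && FailureDetector.instance.isAlive(replaced))
            abortReplacement("Replaced node " + replaced + " reappeared via gossip");
    }

    private static void abortReplacement(String reason)
    {
        // Halting bootstrap here keeps C' from announcing NORMAL while C lives.
        throw new RuntimeException(reason);
    }
}
{code}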




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Benedict Elliott Smith (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208394#comment-17208394 ]

Benedict Elliott Smith commented on CASSANDRA-16182:


I agree with Brandon that the situation occurring at all is bad, i.e. the 
operator should not attempt to replace C until it has isolated it from the 
cluster. As ownership works today, this split brain introduces a risk of 
consistency violations.

That said, I'm unconvinced there is much to be gained from A and B refusing 
to process the replacement. C' has already unilaterally announced its status 
as the new owner, and this will eventually come to pass in some places in the 
cluster. Even if C was never intended to be replaced, at this point the 
cluster will enter a split brain until C is taken down or C' is assassinated, 
since other nodes presumably were also unaware of C being alive, else C' 
would not have witnessed it as down. If instead C''s self-nomination wins on 
other nodes, the situation eventually resolves without operator input. Either 
approach introduces potential consistency violations, but the sooner the 
inconsistency resolves, the smaller the window for problems.

However, even if this isn't preferred, at the very least nodes should apply the 
C' state once C is taken offline, after confirming it is still valid.

 




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208375#comment-17208375 ]

Brandon Williams commented on CASSANDRA-16182:

bq.  I kind of agree that this seems hacky to increment generation # for this 
purpose,

Definitely.

bq. Given that this node C' makes itself available for reads worries me of the 
consequences

Only if clients explicitly connect to it (they won't be notified about it) and 
read at [LOCAL_]ONE.  But that's the contract accepted at ONE anyway.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208370#comment-17208370 ]

Sumanth Pasupuleti commented on CASSANDRA-16182:


Thanks for the clarification [~brandon.williams]. I should have been clearer; 
I meant increasing the generation # to "force update" the already-communicated 
state. I agree that incrementing the generation # for this purpose seems 
hacky, but I was also thinking that this is a potentially rare/isolated 
scenario, and that it can be detected deterministically (by hearing through 
gossip about a node that owns the same token as yourself and that you cannot 
reach).
The fact that C' makes itself available for reads worries me about the 
consequences, and makes me think we should attempt to make the cluster 
self-heal from this situation.






[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208278#comment-17208278 ]

Brandon Williams commented on CASSANDRA-16182:

bq. Aliveness would have to be determined by C' itself, vs via A or B

Determined by that node itself, but it can learn about newer heartbeats from 
C via A or B; it doesn't need direct communication with C for that.

bq. Curious to know your thoughts on the proposed fix

Re-emitting our state is a hack and, actually, won't matter here since it has 
already been sent and processed; there is no change if we set our state 
again. What happens on restart is that the newer generation from the restart 
causes our state to be processed as new.
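A small sketch of the ordering rule behind this, assuming the usual gossip convention that the generation bumps on each process start while the version increments within a run (illustrative code, not the exact Gossiper logic):
{code:java}
// Illustrative only: gossip treats a state as new when its (generation,
// version) pair is higher. Re-sending the same pair is a no-op; a restart
// bumps the generation, forcing peers to reprocess the NORMAL state.
final class GossipOrderSketch
{
    static boolean isNewer(int remoteGen, int remoteVer, int localGen, int localVer)
    {
        if (remoteGen != localGen)
            return remoteGen > localGen; // restart => higher generation wins
        return remoteVer > localVer;     // same run => compare versions
    }
}
{code}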




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208249#comment-17208249 ]

Sumanth Pasupuleti commented on CASSANDRA-16182:


{quote}Then it was dead to C', or the replace would've failed on that 
node{quote}

+1

{quote} That's interesting since it should have seen C alive via A or B since 
it could talk to them {quote}
Aliveness would have to be determined by C' itself rather than via A or B, 
wouldn't it? (My understanding is that gossip helps discover cluster members, 
but aliveness is determined by each individual node's FailureDetector.) My 
hypothesis is that C was healthy enough to keep the connections it already 
had (to A and B), but too unhealthy to accept new connections from newer 
nodes like C'.

{quote} I'm not sure what else we can do {quote}
Curious to know your thoughts on the proposed fix.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208240#comment-17208240 ]

Brandon Williams commented on CASSANDRA-16182:

bq. I believe the same (that C was still alive)

Then it was dead to C', or the replace would've failed on that node. That's 
interesting, since it should have seen C alive via A or B, since it could 
talk to them. So you had a split-brain cluster you were doing a topology 
change on, which is generally ok though not ideal, but an unexpected healing 
of the partition during the operation might produce some weird results. I'm 
not sure what else we can do, but the important thing is that the cluster 
handled it deterministically.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Sumanth Pasupuleti (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208233#comment-17208233 ]

Sumanth Pasupuleti commented on CASSANDRA-16182:


Yes, Brandon. I believe the same (that C was still alive). C remained alive 
long enough to outlast C''s bootstrap completion. As mentioned, a few seconds 
later C could no longer communicate, and A's and B's failure detectors marked 
it DOWN; after that, had C' (re)announced its NORMAL state, the peers would 
have accepted C' joining the ring.




[jira] [Commented] (CASSANDRA-16182) A replacement node, although completed bootstrap and joined ring according to itself, stuck in Joining state as per the peers

2020-10-05 Thread Brandon Williams (Jira)


[ https://issues.apache.org/jira/browse/CASSANDRA-16182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208213#comment-17208213 ]

Brandon Williams commented on CASSANDRA-16182:

bq. Peer nodes A and B logged "Node C' cannot complete replacement of alive 
node C "

This means either C was still alive, or there was newer gossip information 
somewhere in the cluster that these nodes had not previously seen, since they 
believed C to be alive, which is the crux of the problem here.
