rhuffy opened a new pull request, #3270:
URL: https://github.com/apache/cassandra/pull/3270

   This implements the suggestion by @driftx in 
[CASSANDRA-15439](https://issues.apache.org/jira/browse/CASSANDRA-15439). See 
https://the-asf.slack.com/archives/CK23JSY2K/p1713820180414349 for further 
discussion.
   
   If a bootstrapping node experiences a GC pause long enough that it fails to
gossip for 30s, another node may remove it from its Gossip state because it has
exceeded the FatClient timeout, which defaults to RING_DELAY (30s).
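
   The timing condition described above can be sketched roughly as follows.
This is only an illustration, not the Gossiper code: the class, method, and
constant names are hypothetical, and the 30s default is taken from the
description above.

```java
import java.util.concurrent.TimeUnit;

public class FatClientTimeoutSketch {
    // Assumed default: RING_DELAY of 30s, as stated in the description.
    static final long RING_DELAY_MS = TimeUnit.SECONDS.toMillis(30);

    /**
     * Hypothetical check: a node that has not gossiped for longer than
     * the timeout becomes eligible for removal from gossip state.
     */
    static boolean shouldEvict(long lastGossipMs, long nowMs, long timeoutMs) {
        return nowMs - lastGossipMs > timeoutMs;
    }

    public static void main(String[] args) {
        long lastGossipMs = 0;
        long nowMs = TimeUnit.SECONDS.toMillis(35); // e.g. after a 35s GC pause

        // With the 30s default, a 35s silence exceeds the timeout.
        System.out.println(shouldEvict(lastGossipMs, nowMs, RING_DELAY_MS));
        // With a 300s timeout, the same silence is tolerated.
        System.out.println(shouldEvict(lastGossipMs, nowMs, 300_000L));
    }
}
```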
   
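   This violates Cassandra's consistency guarantees. When there are pending
ranges, the number of nodes that must ACK a write at quorum is increased by the
number of pending replicas. For example, with RF=3, quorum is 2; with one
pending replica, a quorum write must be ACKed by 3 nodes. If the bootstrapping
node is removed from Gossip state, its pending ranges are dropped and writes go
back to requiring only 2 ACKs while the node is still bootstrapping.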
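
   The arithmetic in the example above can be written out as a small sketch.
The class and method names are hypothetical, and it assumes that each pending
replica adds one required ACK to a quorum write.

```java
public class PendingQuorumSketch {
    /** Quorum for a given replication factor: a strict majority of replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Assumed rule: every pending replica adds one required ACK. */
    static int requiredWriteAcks(int replicationFactor, int pendingReplicas) {
        return quorum(replicationFactor) + pendingReplicas;
    }

    public static void main(String[] args) {
        // RF=3, no pending ranges: 2 ACKs are required.
        System.out.println(requiredWriteAcks(3, 0));
        // RF=3, one bootstrapping node with pending ranges: 3 ACKs are required.
        System.out.println(requiredWriteAcks(3, 1));
    }
}
```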
   
   For operators who are particularly concerned with the durability of writes
during expansion, the `cassandra.bootstrapping_fat_client_timeout_ms` option
can be set to a higher value, ideally longer than the longest expected GC
pause. If this value is increased significantly, for example to multiple hours,
or set to -1, operators will need to take manual action when a bootstrap has
failed, such as assassinating the failed node.
   
   Example usage:
   ```
   -Dcassandra.bootstrapping_fat_client_timeout_ms=300000
   ```
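
   For illustration, a system property like the one above could be read as
shown below. This is a sketch and not the code in this PR: the class and
method names are hypothetical, the 30s fallback mirrors the RING_DELAY default
mentioned earlier, and treating -1 as "never time out" is an assumption based
on the description.

```java
public class BootstrapTimeoutOptionSketch {
    static final String PROPERTY = "cassandra.bootstrapping_fat_client_timeout_ms";
    // Assumed fallback when the flag is absent: RING_DELAY of 30s.
    static final long DEFAULT_TIMEOUT_MS = 30_000L;

    /** Reads the -D flag, falling back to the assumed default. */
    static long resolveTimeoutMs() {
        return Long.getLong(PROPERTY, DEFAULT_TIMEOUT_MS);
    }

    /** Assumption: a negative value such as -1 disables the timeout. */
    static boolean neverExpires(long timeoutMs) {
        return timeoutMs < 0;
    }

    public static void main(String[] args) {
        long timeoutMs = resolveTimeoutMs();
        System.out.println("timeout ms: " + timeoutMs);
        System.out.println("never expires: " + neverExpires(timeoutMs));
    }
}
```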
   

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

