rhuffy opened a new pull request, #3270: URL: https://github.com/apache/cassandra/pull/3270
This implements the suggestion by @driftx in [CASSANDRA-15439](https://issues.apache.org/jira/browse/CASSANDRA-15439). See https://the-asf.slack.com/archives/CK23JSY2K/p1713820180414349 for further discussion.

If a bootstrapping node experiences a GC pause long enough that it fails to gossip for 30s, another node may remove it from its Gossip state, because it has exceeded the FatClient timeout, which by default is equal to RING_DELAY=30s. This violates Cassandra's consistency guarantees.

When there are pending ranges, the number of nodes that must ACK a write at quorum is increased. For example, with RF=3, quorum is 2. If there are pending ranges, quorum is 3. A node that has removed the bootstrapping node from its Gossip state no longer sees those pending ranges, so it stops requiring the extra ACK.

For operators who are particularly concerned with the durability of writes during expansion, the `cassandra.bootstrapping_fat_client_timeout_ms` option can be set to a higher value, ideally longer than the longest expected GC pause. If this value is increased significantly, for example to multiple hours, or set to -1, operators will need to take manual action when a bootstrap has failed, such as assassinating the failed node.

Example usage:

```
-Dcassandra.bootstrapping_fat_client_timeout_ms=300000
```
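To make the reasoning above concrete, here is a minimal sketch of the two pieces of arithmetic involved. It is not the code in this PR. The class and method names are hypothetical. It assumes that each pending replica adds one required ACK (consistent with the RF=3 example), that the default timeout is 30000 ms (RING_DELAY), and that -1 means the bootstrapping node is never expired.

```java
// Illustrative sketch only; names and semantics are assumptions, not the PR's implementation.
final class FatClientTimeoutSketch {
    // Property name taken from the example usage in this description.
    static final String PROPERTY = "cassandra.bootstrapping_fat_client_timeout_ms";
    // Assumed default: RING_DELAY = 30s.
    static final long DEFAULT_TIMEOUT_MS = 30_000L;

    // Quorum is a strict majority of the replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Assumption: every pending (bootstrapping) replica adds one required ACK.
    static int requiredAcks(int replicationFactor, int pendingReplicas) {
        return quorum(replicationFactor) + pendingReplicas;
    }

    // Assumption: a negative timeout disables expiry entirely.
    static boolean isExpired(long silentForMs, long timeoutMs) {
        return timeoutMs >= 0 && silentForMs > timeoutMs;
    }

    public static void main(String[] args) {
        // Reads -Dcassandra.bootstrapping_fat_client_timeout_ms, falling back to the assumed default.
        long timeoutMs = Long.getLong(PROPERTY, DEFAULT_TIMEOUT_MS);
        long gcPauseMs = 35_000L; // hypothetical pause during which the node does not gossip

        System.out.println("ACKs with no pending ranges: " + requiredAcks(3, 0));
        System.out.println("ACKs with one pending replica: " + requiredAcks(3, 1));
        System.out.println("Node expired after pause: " + isExpired(gcPauseMs, timeoutMs));
    }
}
```

Under these assumptions, a 35s pause expires the node with the default timeout but not with the 300000 ms value from the example usage, and never with -1, which is why a failed bootstrap then needs manual cleanup.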

