Hi guys,

We have a 6-node Cassandra cluster under heavy utilization. We have been
dealing with frequent garbage collector stop-the-world (STW) events, which
can take up to 50 seconds on our nodes. During these pauses the Cassandra
node is unresponsive and does not even accept new logins.

Extra details:

   - Cassandra version: 3.11
   - Heap size: 12 GB
   - We are using the G1 garbage collector with default settings
   - Node size: 4 CPUs, 28 GB RAM
   - All CPU cores are at 100% all the time.
   - The G1 GC behavior is the same across all nodes.

The behavior is basically always the same:

   1. Old Gen starts to fill up.
   2. The GC can't clean it up properly without a full GC and an STW event.
   3. The full GCs take longer and longer, until the node is completely
   unresponsive.
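
In case it helps anyone reading the GC logs, here is a minimal Python sketch
for pulling the full GC pauses out of a log. It assumes Java 8 style GC
logging (for example with -XX:+PrintGCDetails), where a full GC shows up as
"[Full GC (cause)  before->after(heap), N secs]". The names FullGc and
find_full_gcs are made up for this example, and the sample lines are invented
for illustration, not taken from our nodes.

```python
import re
from typing import Iterable, List, NamedTuple


class FullGc(NamedTuple):
    """One full GC event parsed from a log line (hypothetical helper type)."""
    cause: str
    before: str   # heap occupancy before the collection, e.g. "11G"
    after: str    # heap occupancy after the collection, e.g. "10G"
    heap: str     # total heap size, e.g. "12G"
    seconds: float


# Assumed line shape (Java 8 style GC logging):
#   ...: [Full GC (Allocation Failure)  11G->10G(12G), 48.5000000 secs]
PATTERN = re.compile(
    r"\[Full GC \((?P<cause>[^)]*)\)\s+"
    r"(?P<before>\d+[BKMG])->(?P<after>\d+[BKMG])\((?P<heap>\d+[BKMG])\),\s*"
    r"(?P<seconds>\d+\.\d+) secs\]"
)


def find_full_gcs(lines: Iterable[str], min_seconds: float = 0.0) -> List[FullGc]:
    """Return the full GC events that paused for at least min_seconds."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a full GC line (young or mixed pauses are skipped)
        seconds = float(match.group("seconds"))
        if seconds >= min_seconds:
            events.append(FullGc(match.group("cause"), match.group("before"),
                                 match.group("after"), match.group("heap"),
                                 seconds))
    return events


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "2017-10-04T10:15:29.001+0000: 5122.100: [GC pause (G1 Evacuation Pause) (young), 0.0512345 secs]",
        "2017-10-04T10:15:30.123+0000: 5123.456: [Full GC (Allocation Failure)  11G->10G(12G), 48.5000000 secs]",
    ]
    for event in find_full_gcs(sample, min_seconds=1.0):
        print(event.cause, event.before, "->", event.after, event.seconds, "s")
```

To run it against a real log, pass an open file instead of the sample list,
e.g. find_full_gcs(open("gc.log"), min_seconds=1.0). The log path depends on
how GC logging is configured on the node.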

*Further details and GC reports:*
https://stackoverflow.com/questions/46568777/cassandra-and-g1-garbage-collector-stop-the-world-event-stw

Can someone point me to the configurations or events I should check?

Thanks!

Best regards,
