Hi,

We have a cluster where, if reads suddenly increase 2-3x, Cassandra CPU goes to around 100% on a few nodes (the machines have 48 CPUs and 128 GB RAM) and Cassandra becomes unresponsive. We are on 3.11.5 and use G1GC with a 16 GB heap. Going through system.log and gc.log, I see that system.log just prints messages like the ones below every 5 seconds (I have removed the lines for many keyspaces to keep the text short), and a lot of messages are being written to gc.log. I suspect I may need to increase the heap size on these nodes, but I want to understand how to determine whether the heap size should be increased. The nodes are not dying from OOMs. When we have OOMs, we know for sure that the heap needs to grow, but *what should we look for in gc.log, system.log and debug.log to determine whether the heap size has to be increased?*
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,368 MessagingService.java:1246 - READ messages were dropped in last 5000 ms: 199 internal and 232 cross node. Mean internal dropped latency: 10443 ms and Mean cross-node dropped latency: 10402 ms
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,369 StatusLogger.java:47 - Pool Name Active Pending Completed Blocked All Time Blocked
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,377 StatusLogger.java:51 - MutationStage 0 0 80051890 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ViewMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ReadStage 192 1331 152624049 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - RequestResponseStage 0 0 172822890 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ReadRepairStage 0 0 1545869 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - CounterMutationStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - MiscStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - CompactionExecutor 0 0 623536 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,379 StatusLogger.java:51 - MemtableReclaimMemory 0 0 6700 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - PendingRangeCalculator 0 0 18 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - GossipStage 0 0 1613366 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - SecondaryIndexManagement 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,380 StatusLogger.java:51 - HintsDispatcher 0 0 5 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - MigrationStage 0 0 1 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - MemtablePostFlush 0 0 14830 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - PerDiskMemtableFlushWriter_0 0 0 6700 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,381 StatusLogger.java:51 - ValidationExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 - Sampler 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 - MemtableFlushWriter 0 0 6700 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,382 StatusLogger.java:51 - InternalResponseStage 0 0 33229 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - AntiEntropyStage 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - CacheCleanupExecutor 0 0 0 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - Native-Transport-Requests 661 0 84577742 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:61 - CompactionManager 0 0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:73 - MessagingService n/a 0/0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:83 - Cache Type Size Capacity KeysToSave
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:85 - KeyCache 104857576 104857600 all
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:91 - RowCache 0 0 all
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,384 StatusLogger.java:98 - Table Memtable ops,data
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,429 StatusLogger.java:101 - system_distributed.parent_repair_history 0,0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,429 StatusLogger.java:101 - system_distributed.repair_history 0,0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system_distributed.view_build_status 0,0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system.compaction_history 12,3327
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system.schema_aggregates 0,0
INFO [ScheduledTasks:1] 2020-08-19 08:13:12,430 StatusLogger.java:101 - system.schema_triggers 0,0

Thanks
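
One way to make the StatusLogger thread-pool rows above easier to scan is to filter for pools with a non-zero Pending count. The following is a minimal Python sketch, not part of Cassandra: `POOL_ROW` and `backlogged_pools` are hypothetical names, and the regex assumes the five-column row layout seen in this log (Active, Pending, Completed, Blocked, All Time Blocked).

```python
import re

# Assumed row layout, taken from the log excerpt in this post:
#   "... StatusLogger.java:51 - ReadStage 192 1331 152624049 0 0"
# Header, cache and per-table rows do not have five integer columns and are skipped.
POOL_ROW = re.compile(
    r"StatusLogger\.java:\d+ - (?P<pool>\S+)\s+(?P<active>\d+)\s+(?P<pending>\d+)"
    r"\s+(?P<completed>\d+)\s+(?P<blocked>\d+)\s+(?P<all_time_blocked>\d+)\s*$"
)


def backlogged_pools(lines, min_pending=1):
    """Return (pool, active, pending) for every pool row with pending >= min_pending."""
    hits = []
    for line in lines:
        m = POOL_ROW.search(line)
        if m and int(m["pending"]) >= min_pending:
            hits.append((m["pool"], int(m["active"]), int(m["pending"])))
    return hits


# Three rows copied from the excerpt above.
sample = [
    "INFO [ScheduledTasks:1] 2020-08-19 08:13:12,377 StatusLogger.java:51 - MutationStage 0 0 80051890 0 0",
    "INFO [ScheduledTasks:1] 2020-08-19 08:13:12,378 StatusLogger.java:51 - ReadStage 192 1331 152624049 0 0",
    "INFO [ScheduledTasks:1] 2020-08-19 08:13:12,383 StatusLogger.java:51 - Native-Transport-Requests 661 0 84577742 0 0",
]
print(backlogged_pools(sample))  # [('ReadStage', 192, 1331)]
```

On the excerpt above this singles out ReadStage (192 active, 1331 pending) as the only pool with queued work. To run it over a real file, pass an open file handle for system.log as `lines`.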