I have a node in the cluster where, as soon as I start c*, the CPU reaches 100% with the java process on top. Within a few minutes, JVM crash (JVM instability) messages appear in system.log and c* crashes.
Once c* is up, the cluster's average read latency climbs to multiple seconds and client apps are unhappy. For now, the only way out is to drain the node and let the cluster latency settle.

None of these measures helped:

1. Rebooting the EC2 instance
2. Replacing the EC2 instance altogether (new instance, new c* install, etc.)
3. Stopping compactions (as a diagnostic measure)

I am trying to understand why the java process is using so much CPU, i.e. what is actually happening. I see these messages in debug.log. What functional task do these messages relate to, e.g. compactions?

DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 AbstractQueryPager.java:95 - Fetched 1 live rows
DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000), considering pager exhausted
INFO [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 - MemtablePostFlush 0 0 29 0 0
DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 AbstractQueryPager.java:133 - Remaining rows to page: 2147483646
DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 SliceQueryPager.java:92 - Querying next page of slice query; new filter: SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0]
INFO [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 - ValidationExecutor 0 0 0 0 0
INFO [Service Thread] 2021-06-30 13:39:04,766 StatusLogger.java:56 - Sampler 0 0 0 0 0
INFO [Service Thread] 2021-06-30 13:39:04,767 StatusLogger.java:56 - MemtableFlushWriter 0 0 6 0 0
INFO [Service Thread] 2021-06-30 13:39:04,767 StatusLogger.java:56 - InternalResponseStage 0 0 4 0 0
DEBUG [SharedPool-Worker-131] 2021-06-30 13:39:05,078 StorageProxy.java:1467 - Read timeout; received 1 of 2 responses (only digests)
DEBUG [SharedPool-Worker-131] 2021-06-30 13:39:05,079 SliceQueryPager.java:92 - Querying next page of slice query; new filter: SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0]
DEBUG [SharedPool-Worker-158] 2021-06-30 13:39:05,079 StorageProxy.java:1467 - Read timeout; received 1 of 2 responses (only digests)
DEBUG [SharedPool-Worker-158] 2021-06-30 13:39:05,079 SliceQueryPager.java:92 - Querying next page of slice query; new filter: SliceQueryFilter [reversed=false, slices=[[, ]], count=5000, toGroup = 0]
DEBUG [SharedPool-Worker-90] 2021-06-30 13:39:05,080 StorageProxy.jav ....
EBUG [SharedPool-Worker-26] 2021-06-30 13:39:01,842 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5069-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,847 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5592-big-Data.db
DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3993-big-Data.db
DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5927-big-Data.db
DEBUG [SharedPool-Worker-5] 2021-06-30 13:39:01,849 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1276-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5949-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-865-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-5741-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-4098-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1662-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-1339-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,854 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-4598-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,855 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-3676-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-06-30 13:39:01,855 FileCacheService.java:102 - Evicting cold readers for /data/cassandra/mykeyspace/mytable-cf0c43b028e811e68f2b1b695a8d5b2c/lb-2814-big-Data.db
DEBUG [SharedPool-Worker-12] 2021-

We are using c* 2.2.8.

----------------------------------------

Thank you
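
To get a rough picture of which functional area dominates debug.log, one option might be to count entries per emitting Java class and per thread pool. Below is a minimal Python sketch, assuming every entry follows the "LEVEL [thread] date time Class.java:line - message" layout seen in the excerpt. The names ENTRY, tally_sources and tally_thread_pools are hypothetical helpers written for this post, not part of Cassandra. Class names such as SliceQueryPager and StorageProxy, together with the "Read timeout" text, would seem to point at the read path rather than compaction, but the counts only show where the log volume comes from, not where the CPU time goes.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt in this post, e.g.:
#   DEBUG [SharedPool-Worker-113] 2021-06-30 13:39:04,766 AbstractQueryPager.java:95 - Fetched 1 live rows
ENTRY = re.compile(
    r"(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+"
    r"(?P<source>\w+)\.java:\d+ - "
)


def tally_sources(text):
    """Count log entries per emitting Java class (the name before '.java')."""
    return Counter(m.group("source") for m in ENTRY.finditer(text))


def tally_thread_pools(text):
    """Count log entries per thread pool, dropping the trailing worker number."""
    pools = Counter()
    for m in ENTRY.finditer(text):
        pool = re.sub(r"-\d+$", "", m.group("thread"))
        pools[pool] += 1
    return pools
```

For example, reading debug.log into a string and calling tally_sources(text).most_common(10) would list the ten classes that log the most. Entries that were cut off in the paste (such as the one ending in "StorageProxy.jav ....") do not match the pattern and are simply skipped.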