Generally speaking, I've seen the Cassandra process stop for the following reasons:

   OOM killer
   JVM OOM
   Received a signal, such as SIGTERM or SIGKILL
   File I/O error when disk_failure_policy or commit_failure_policy is set to die
   Hardware issues, such as memory corruption, causing Cassandra to crash
   Reaching ulimit resource limits, such as "too many open files"

They all leave traces behind. You said you've checked the OS logs, but you posted only the systemd messages from DAEMON.LOG. Have you checked the "dmesg" output? Some system logs, such as OOM killer and MCE (machine check exception) messages, come from the kernel and don't go into the DAEMON.LOG file.
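If the dmesg ring buffer has already wrapped since the failures, the same kernel messages may still be in the persistent kernel log (`journalctl -k`, or /var/log/kern.log on Debian-style rsyslog setups). A minimal Python sketch for filtering that output follows; the match patterns are assumptions based on common kernel wording for OOM killer and machine check events, not an exhaustive list, and the script name in the usage line is hypothetical.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Illustrative patterns only (an assumption, not an exhaustive list): kernel
# wording varies between versions, so a miss means "not found", not "absent".
PATTERNS = {
    "oom-killer": re.compile(r"invoked oom-killer|Out of memory: Kill|oom-kill:", re.I),
    "hardware/MCE": re.compile(r"\bmce:|Machine check|\[Hardware Error\]|EDAC", re.I),
}


def scan_kernel_log(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (category, line) pairs for lines that look like OOM or MCE events."""
    hits = []
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line.rstrip("\n")))
                break  # report each line under its first matching category only
    return hits


if __name__ == "__main__":
    # Hypothetical usage: dmesg -T | python3 scan_kernel_log.py
    for category, line in scan_kernel_log(sys.stdin):
        print(f"{category}: {line}")
```

An empty result only means none of these particular patterns matched; it is worth eyeballing the raw output around 11:39 and 13:44 as well.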


On 04/08/2022 11:00, Marc Hoppins wrote:
Hulloa all,

Service on two nodes stopped yesterday and I can find nothing to indicate why.
I have checked the Cassandra system.log, gc.log and debug.log files as well as
the OS logs, and all I can see is the following, which is far from helpful:

DAEMON.LOG
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Main process exited, code=exited, status=1/FAILURE
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Failed with result 'exit-code'.

Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Main process exited, code=exited, status=1/FAILURE
Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Failed with result 'exit-code'.
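The status field in these systemd lines is itself a clue. systemd reports `code=exited, status=N` when the main process exits on its own (N is the exit code), and `code=killed` or `code=dumped` with a signal name when a signal terminates it, so a SIGKILL from the OOM killer would normally appear as `code=killed, status=9/KILL`. Assuming the JVM is the unit's main process rather than a wrapper script, `status=1/FAILURE` suggests Cassandra exited by itself rather than being SIGKILLed. A minimal sketch for classifying such lines, assuming the message format shown above (the function name is hypothetical):

```python
import re
from typing import Optional

# Matches systemd's "Main process exited" message, e.g.
# "cassandra.service: Main process exited, code=exited, status=1/FAILURE"
EXIT_RE = re.compile(
    r"(?P<unit>\S+): Main process exited, "
    r"code=(?P<code>\w+), status=(?P<status>\d+)(?:/(?P<name>\w+))?"
)


def classify_exit(line: str) -> Optional[dict]:
    """Describe how the unit's main process ended, or None if the line doesn't match."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    code = m.group("code")
    status = int(m.group("status"))
    if code == "exited":
        # The process called exit() itself; status is its exit code.
        cause = f"exited on its own with exit code {status}"
    elif code in ("killed", "dumped"):
        # A signal terminated the process; status is the signal number.
        cause = f"terminated by signal {status} ({m.group('name') or 'unknown'})"
    else:
        cause = f"unrecognised code={code}"
    return {"unit": m.group("unit"), "code": code, "status": status, "cause": cause}


if __name__ == "__main__":
    sample = ("Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: "
              "Main process exited, code=exited, status=1/FAILURE")
    print(classify_exit(sample))
```

Note that a process which handles a signal and then exits (the JVM does this for SIGTERM) is still reported as `code=exited`, so this narrows the list of causes rather than settling it.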

Initially I thought the second node went down because it had problems
communicating with the other stopped node, but with a gap of two hours between
the failures that seems unlikely. If this happens on either of these two nodes
again I will probably increase the logging level, but doing so on every node in
the hope of picking something up is impractical.

In the meantime, is there anything else I can look at which may deliver unto us 
more info?

Marc
