Re: unsubscribe

2022-08-04 Thread Bowen Song via user
Please send an email to "user-unsubscr...@cassandra.apache.org" to 
unsubscribe from this mailing list.


On 04/08/2022 18:29, Dathan Vance Pattishall wrote:

unsubscribe

2022-08-04 Thread Dathan Vance Pattishall
unsubscribe


RE: Service shutdown

2022-08-04 Thread Marc Hoppins
The only messages in the OS system log were exactly the same as those in
daemon.log.  The hosts did not shut down; only the Cassandra service stopped,
so dmesg has nothing.

The amount of data being written is not that great and GC times are always <1s.

The only visible error-type messages are related to source systems being
unavailable from time to time, but as there was nothing of this kind around the
time that the service stopped, I discounted these as a possible cause.

I’ll just keep an eye out if it happens again.

From: Bowen Song via user 
Sent: Thursday, August 4, 2022 12:20 PM
To: user@cassandra.apache.org
Subject: Re: Service shutdown

Generally speaking, I've seen the Cassandra process stop for the following
reasons:

   OOM killer
   JVM OOM
   Received a signal, such as SIGTERM and SIGKILL
   File IO error when disk_failure_policy or commit_failure_policy is
   set to die
   Hardware issues, such as memory corruption, causing Cassandra to crash
   Reaching ulimit resource limits, such as "too many open files"

They all leave traces behind. You said you've checked OS logs, and you posted 
only the systemd logs from DAEMON.LOG. Have you checked "dmesg" output? Some 
system logs, such as OOM killer and MCE error logs, don't go into the 
DAEMON.LOG file.


On 04/08/2022 11:00, Marc Hoppins wrote:

Hulloa all,

Service on two nodes stopped yesterday and I can find nothing to indicate why.
I have checked Cassandra system.logs, gc.logs and debug.logs as well as OS logs
and all I can see is the following - which is far from helpful:

DAEMON.LOG
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Main process exited, 
code=exited, status=1/FAILURE
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Failed with result 
'exit-code'.

Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Main process exited, 
code=exited, status=1/FAILURE
Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Failed with result 
'exit-code'.

Initially I thought that the reason the second node went down was because it
had problems communicating with the other stopped node, but with a gap of 2
hours it seems unlikely.  If this occurs on either of these two nodes again I
will probably increase logging level, but to do so for every node in the hope
that I pick something up is impractical.

In the meantime, is there anything else I can look at which may deliver unto us 
more info?

Marc


Re: Service shutdown

2022-08-04 Thread Bowen Song via user
Generally speaking, I've seen the Cassandra process stop for the
following reasons:


   OOM killer
   JVM OOM
   Received a signal, such as SIGTERM and SIGKILL
   File IO error when disk_failure_policy or commit_failure_policy is
   set to die
   Hardware issues, such as memory corruption, causing Cassandra to crash
   Reaching ulimit resource limits, such as "too many open files"

They all leave traces behind. You said you've checked OS logs, and you 
posted only the systemd logs from DAEMON.LOG. Have you checked "dmesg" 
output? Some system logs, such as OOM killer and MCE error logs, don't 
go into the DAEMON.LOG file.
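
As a rough illustration of looking for those traces, the sketch below scans log lines for a few of the causes listed above. The function name `classify` and the pattern table are hypothetical, and the regular expressions are assumptions based on commonly seen kernel and JVM messages; exact wording varies between kernel and JVM versions, so a miss here does not prove that a cause is absent. Signals and disk failure policies are not covered.

```python
import re

# Illustrative patterns only (assumptions): the exact message wording
# differs between kernel and JVM versions.
PATTERNS = {
    "kernel OOM killer": re.compile(
        r"invoked oom-killer|Out of memory: Kill(ed)? process", re.I
    ),
    "JVM OOM": re.compile(r"java\.lang\.OutOfMemoryError"),
    "hardware error (MCE)": re.compile(r"\bmce:|Machine check|Hardware Error", re.I),
    "file descriptor limit": re.compile(r"Too many open files", re.I),
}


def classify(lines):
    """Return {cause: [matching lines]} for lines matching a known pattern."""
    hits = {}
    for line in lines:
        for cause, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(cause, []).append(line.rstrip("\n"))
    return hits
```

One way to use it would be to save the output of `dmesg -T` (or the Cassandra system.log) to a file and pass the open file object to `classify()`, then print whichever causes come back with matches.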



On 04/08/2022 11:00, Marc Hoppins wrote:

Hulloa all,

Service on two nodes stopped yesterday and I can find nothing to indicate why.  
I have checked Cassandra system.logs, gc.logs and debug.logs as well as OS logs 
and all I can see is the following - which is far from helpful:

DAEMON.LOG
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Main process exited, 
code=exited, status=1/FAILURE
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Failed with result 
'exit-code'.

Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Main process exited, 
code=exited, status=1/FAILURE
Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Failed with result 
'exit-code'.

Initially I thought that the reason the second node went down was because it
had problems communicating with the other stopped node, but with a gap of 2
hours it seems unlikely.  If this occurs on either of these two nodes again I
will probably increase logging level, but to do so for every node in the hope
that I pick something up is impractical.

In the meantime, is there anything else I can look at which may deliver unto us 
more info?

Marc

Service shutdown

2022-08-04 Thread Marc Hoppins
Hulloa all,

Service on two nodes stopped yesterday and I can find nothing to indicate why.  
I have checked Cassandra system.logs, gc.logs and debug.logs as well as OS logs 
and all I can see is the following - which is far from helpful:

DAEMON.LOG
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Main process exited, 
code=exited, status=1/FAILURE
Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: Failed with result 
'exit-code'.

Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Main process exited, 
code=exited, status=1/FAILURE
Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: Failed with result 
'exit-code'.

Initially I thought that the reason the second node went down was because it
had problems communicating with the other stopped node, but with a gap of 2
hours it seems unlikely.  If this occurs on either of these two nodes again I
will probably increase logging level, but to do so for every node in the hope
that I pick something up is impractical.
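
For reference, the gap between the two failures can be worked out from the DAEMON.LOG timestamps quoted above. This is only a sketch: the helper name `parse_entry` is hypothetical, the wrapped log lines have been rejoined, and the year is an assumption taken from the date of this thread, since syslog-style timestamps do not carry one.

```python
from datetime import datetime

# First line of each failure entry quoted above (wrapped lines rejoined).
LOG_LINES = [
    "Aug  3 11:39:12 cassandra19 systemd[1]: cassandra.service: "
    "Main process exited, code=exited, status=1/FAILURE",
    "Aug  3 13:44:52 cassandra23 systemd[1]: cassandra.service: "
    "Main process exited, code=exited, status=1/FAILURE",
]


def parse_entry(line, year=2022):
    """Return (timestamp, host) from a syslog-style line; the year is assumed."""
    month, day, clock, host = line.split()[:4]
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, host


(first, host_a), (second, host_b) = (parse_entry(line) for line in LOG_LINES)
gap = second - first
print(f"{host_a} -> {host_b}: {gap}")  # cassandra19 -> cassandra23: 2:05:40
```

So the two services stopped a little over two hours apart (2 h 05 min 40 s).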

In the meantime, is there anything else I can look at which may deliver unto us 
more info?

Marc