Marcus Olsson created CASSANDRA-13886:
-----------------------------------------

             Summary: OOM put node in limbo
                 Key: CASSANDRA-13886
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13886
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra version 2.2.10
            Reporter: Marcus Olsson
            Priority: Minor


In one of our test clusters we have had some issues with OOM. While working on 
fixing this, we discovered that one of the nodes that hit an OOM was not shut 
down properly. Instead it went into a half-up state in which the affected node 
considered itself up while all other nodes considered it down.

The following stack trace was observed, which seems to be the cause:
{noformat}
java.lang.NoClassDefFoundError: Could not initialize class java.lang.UNIXProcess
        at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.8.0_131]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_131]
        at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_131]
        at java.lang.Runtime.exec(Runtime.java:485) ~[na:1.8.0_131]
        at org.apache.cassandra.utils.HeapUtils.generateHeapDump(HeapUtils.java:88) ~[apache-cassandra-2.2.10.jar:2.2.10]
        at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:56) ~[apache-cassandra-2.2.10.jar:2.2.10]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:168) ~[apache-cassandra-2.2.10.jar:2.2.10]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) ~[apache-cassandra-2.2.10.jar:2.2.10]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.2.10.jar:2.2.10]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{noformat}

It seems that if an unexpected exception or error is thrown inside 
JVMStabilityInspector.inspectThrowable, the JVM is not shut down but keeps 
running. In the trace above, the NoClassDefFoundError is raised from 
HeapUtils.generateHeapDump, which is called from inspectThrowable. My 
expectation is that the JVM should shut down when an OOM is thrown.
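
To illustrate the expected behaviour, here is a minimal sketch of an OOM 
handler that still shuts down when its diagnostics step fails. This is not 
the actual JVMStabilityInspector code: the class GuardedOomHandler and the 
diagnostics/killer callbacks are hypothetical names introduced only for this 
example. In a real node the killer would presumably be something like 
Runtime.getRuntime().halt(...); here it is injected so the example can run 
without terminating the JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; NOT the actual Cassandra JVMStabilityInspector.
public class GuardedOomHandler {
    private final Runnable diagnostics; // assumed: e.g. heap dump generation, may itself fail
    private final Runnable killer;      // assumed: e.g. () -> Runtime.getRuntime().halt(100)

    public GuardedOomHandler(Runnable diagnostics, Runnable killer) {
        this.diagnostics = diagnostics;
        this.killer = killer;
    }

    public void inspectThrowable(Throwable t) {
        if (!(t instanceof OutOfMemoryError)) {
            return; // only OOM triggers a shutdown in this sketch
        }
        try {
            diagnostics.run();
        } catch (Throwable secondary) {
            // NoClassDefFoundError is an Error, not an Exception,
            // so catching Exception alone would not be enough here.
            System.err.println("Diagnostics failed: " + secondary);
        } finally {
            // Runs whether or not the diagnostics step succeeded.
            killer.run();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean killed = new AtomicBoolean(false);
        GuardedOomHandler handler = new GuardedOomHandler(
            // Simulate the failure seen in the stack trace.
            () -> { throw new NoClassDefFoundError("Could not initialize class java.lang.UNIXProcess"); },
            () -> killed.set(true));
        handler.inspectThrowable(new OutOfMemoryError("simulated"));
        System.out.println("kill action ran: " + killed.get()); // prints "kill action ran: true"
    }
}
```

The point of the finally block is that a secondary failure in the best-effort 
diagnostics step cannot prevent the shutdown step from running.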

A potential workaround is to add:
{noformat}
JVM_OPTS="$JVM_OPTS -XX:+ExitOnOutOfMemoryError"
{noformat}
to cassandra-env.sh.


