[jira] [Comment Edited] (CASSANDRA-13931) Cassandra JVM stop itself randomly

2017-10-10 Thread Andrey Lataev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198812#comment-16198812
 ] 

Andrey Lataev edited comment on CASSANDRA-13931 at 10/10/17 3:17 PM:
-

I downgraded Cassandra to 3.10,
upgraded the JDK to 1.8.0_144,
and set
{code:java}
MAX_HEAP_SIZE="9G"
{code}
and did not change
{code:java}
JVM_OPTS="$JVM_OPTS -XX:MaxDirectMemorySize=24G"
{code}
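The combination above can be sanity-checked by adding the heap size to the direct-memory cap and comparing the sum with the host's RAM (32 GB, per the issue's Environment field, treated here as GiB). The Java sketch below does only that arithmetic. The class and method names are hypothetical and not part of Cassandra. It assumes that -XX:MaxDirectMemorySize bounds NIO direct buffers only, so memory-mapped SSTables, metaspace, thread stacks and other native allocations come on top of the figure it prints.

{code:java}
import java.util.Locale;

// Rough worst-case check: Java heap plus the direct-memory cap versus physical RAM.
// Values are taken from this thread; metaspace, thread stacks, memory-mapped
// SSTables and other native allocations are deliberately NOT included.
class MemoryBudget {

    // Parses a HotSpot-style size ("9G", "512m", "262144") into bytes.
    static long parseSize(String value) {
        String s = value.trim().toUpperCase(Locale.ROOT);
        char unit = s.charAt(s.length() - 1);
        long multiplier;
        switch (unit) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default:  multiplier = 1L; // plain byte count, no suffix
        }
        String digits = Character.isDigit(unit) ? s : s.substring(0, s.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    public static void main(String[] args) {
        long heap = parseSize("9G");    // MAX_HEAP_SIZE
        long direct = parseSize("24G"); // -XX:MaxDirectMemorySize
        long ram = parseSize("32G");    // host RAM from the issue (assumed to mean GiB)
        long worstCase = heap + direct;
        System.out.printf("heap + direct cap = %d GiB, RAM = %d GiB, over budget: %b%n",
                worstCase >> 30, ram >> 30, worstCase > ram);
    }
}
{code}

Run as is, it prints "heap + direct cap = 33 GiB, RAM = 32 GiB, over budget: true". In other words, with these two settings the direct-memory cap on its own does not keep the process within the machine's RAM.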
But I still periodically have a similar problem with off-heap memory:

{code:java}
# egrep "Dumping|YamlConfigurationLoader.java|ERR" /var/log/cassandra/system.log | egrep "2017-10-10 15"
ERROR [NonPeriodicTasks:1] 2017-10-10 15:59:31,155 Ref.java:233 - Error when closing class org.apache.cassandra.io.sstable.format.SSTableReader$GlobalTidy@954667024:/egov/data/cassandra/datafiles1/p00smevaudit/messagelog20171010-a50f6b00a1f511e78dc897891b876cc2/mc-4357-big
ERROR [NonPeriodicTasks:1] 2017-10-10 15:59:32,103 Ref.java:233 - Error when closing class org.apache.cassandra.io.sstable.format.SSTableReader$GlobalTidy@1640091777:/egov/data/cassandra/datafiles1/p00smevaudit/messagelog20171010-a50f6b00a1f511e78dc897891b876cc2/mc-4355-big

# egrep "Dumping|YamlConfigurationLoader.java|ERR" /var/log/cassandra/system.log | egrep "2017-10-10 16"
ERROR [MessagingService-Incoming-/172.20.4.125] 2017-10-10 16:00:17,421 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.125,5,main]
INFO  [MutationStage-128] 2017-10-10 16:00:17,690 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-196] 2017-10-10 16:00:17,721 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-18] 2017-10-10 16:00:17,754 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-184] 2017-10-10 16:00:17,757 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-235] 2017-10-10 16:00:17,768 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-197] 2017-10-10 16:00:17,769 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-28] 2017-10-10 16:00:17,780 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-2] 2017-10-10 16:00:17,846 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-152] 2017-10-10 16:00:17,873 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-241] 2017-10-10 16:00:17,876 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-223] 2017-10-10 16:00:21,540 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-16] 2017-10-10 16:00:21,540 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-189] 2017-10-10 16:00:21,540 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
ERROR [MessagingService-Incoming-/172.20.4.139] 2017-10-10 16:00:21,540 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.139,5,main]
ERROR [MessagingService-Incoming-/172.20.4.145] 2017-10-10 16:00:21,540 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.145,5,main]
INFO  [MutationStage-224] 2017-10-10 16:00:21,543 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-222] 2017-10-10 16:00:21,545 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-101] 2017-10-10 16:00:21,574 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
INFO  [MutationStage-40] 2017-10-10 16:00:25,095 HeapUtils.java:136 - Dumping heap to /egov/dumps/cassandra-1507584313-pid17345.hprof ...
ERROR [MessagingService-Incoming-/172.20.4.145] 2017-10-10 16:00:25,170 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.145,5,main]
ERROR [MessagingService-Incoming-/172.20.4.109] 2017-10-10 16:00:25,212 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.109,5,main]
ERROR [MessagingService-Incoming-/172.20.4.163] 2017-10-10 16:00:25,213 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.163,5,main]
ERROR [MessagingService-Incoming-/172.20.4.162] 2017-10-10 16:00:25,216 CassandraDaemon.java:229 - Exception in thread Thread[MessagingService-Incoming-/172.20.4.162,5,main]
ERROR [MutationStage-128] 2017-10-10 16:00:32,694 

[jira] [Comment Edited] (CASSANDRA-13931) Cassandra JVM stop itself randomly

2017-10-04 Thread Andrey Lataev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191450#comment-16191450
 ] 

Andrey Lataev edited comment on CASSANDRA-13931 at 10/4/17 3:41 PM:


As you can see in the attached cassandra-env.sh file, the line
{code:java}
JVM_OPTS="$JVM_OPTS -Djdk.nio.maxCachedBufferSize=262144"
{code}

already exists.
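Whether such an option actually reached the JVM, and how much direct memory is in use, can be checked with the standard java.lang.management API. The sketch below is a hypothetical helper (not part of Cassandra) and reports only on the JVM it runs in. For a running Cassandra node, the same buffer-pool numbers are exposed over JMX as the java.nio:type=BufferPool,name=direct MBean, which jconsole or a similar client can read. Assuming the JDK 8u102+ semantics of jdk.nio.maxCachedBufferSize, that property only limits the size of temporary direct buffers kept in NIO's per-thread cache; it does not cap total direct-memory use.

{code:java}
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Prints the NIO setting, memory flags and buffer-pool usage of the JVM this class runs in.
class DirectMemoryReport {

    // Bytes currently used by the named NIO buffer pool ("direct" or "mapped"), or -1 if absent.
    static long poolUsed(String poolName) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // null means the -D option from cassandra-env.sh did not reach this JVM.
        System.out.println("jdk.nio.maxCachedBufferSize = "
                + System.getProperty("jdk.nio.maxCachedBufferSize"));

        // Echo the memory-related flags that were actually on the command line.
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xmx") || arg.startsWith("-XX:MaxDirectMemorySize")) {
                System.out.println("flag: " + arg);
            }
        }

        // "direct" = ByteBuffer.allocateDirect; "mapped" = FileChannel.map.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s pool: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}
{code}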
I will try to add more RAM and increase the heap size to 16 GB.
For the heap dump, Eclipse Memory Analyzer shows these top 3 problem suspects:

*Problem Suspect 1*
{code:java}
The thread org.apache.cassandra.net.OutboundTcpConnection @ 0x6cd263100 
MessagingService-Outgoing-p00skimnosql10.00.egov.local/172.20.4.148-Large keeps 
local variables with total size 306 114 312 (13,97%) bytes.

The memory is accumulated in one instance of 
"org.apache.cassandra.net.OutboundTcpConnection" loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x6c000".
{code}

*Problem Suspect 2*

{code:java}
529 instances of "io.netty.util.concurrent.FastThreadLocalThread", loaded by 
"sun.misc.Launcher$AppClassLoader @ 0x6c000" occupy 776 362 840 (35,43%) 
bytes. 

Biggest instances:

•io.netty.util.concurrent.FastThreadLocalThread @ 0x6d719e1e0 
epollEventLoopGroup-2-7 - 156 689 680 (7,15%) bytes. 
•io.netty.util.concurrent.FastThreadLocalThread @ 0x6d719e7e0 
epollEventLoopGroup-2-3 - 125 567 112 (5,73%) bytes. 
•io.netty.util.concurrent.FastThreadLocalThread @ 0x6d719da60 
epollEventLoopGroup-2-12 - 119 599 160 (5,46%) bytes. 
•io.netty.util.concurrent.FastThreadLocalThread @ 0x6ceab17b0 
epollEventLoopGroup-2-1 - 118 469 632 (5,41%) bytes. 
•io.netty.util.concurrent.FastThreadLocalThread @ 0x6d7059b00 ReadStage-151 - 
66 494 040 (3,03%) bytes. 
{code}

*Problem Suspect 3*

{code:java}
126 instances of "byte[]", loaded by "" occupy 268 549 640 
(12,26%) bytes. These instances are referenced from one instance of 
"java.util.HashMap$Node[]", loaded by ""

Keywords
byte[]
java.util.HashMap$Node[]
{code}
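MAT reports each suspect as a retained size plus a share of the dumped heap, so dividing one by the other gives the total heap size the dump captured. A small, hypothetical Java helper for that arithmetic, assuming the percentages are relative to the whole heap in the dump (as in MAT's Leak Suspects report) and that the report uses spaces as thousands separators and a comma as the decimal mark, as above:

{code:java}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Back-computes the total heap size implied by a MAT "retained size (percent)" figure.
class MatShare {

    // "306 114 312 (13,97%)": space-grouped bytes, comma as the decimal separator.
    private static final Pattern SIZE_AND_SHARE =
            Pattern.compile("(\\d{1,3}(?: \\d{3})*) \\((\\d+),(\\d+)%\\)");

    static long impliedTotalBytes(String fragment) {
        Matcher m = SIZE_AND_SHARE.matcher(fragment);
        if (!m.find()) {
            throw new IllegalArgumentException("no size/percentage in: " + fragment);
        }
        long bytes = Long.parseLong(m.group(1).replace(" ", ""));
        double percent = Double.parseDouble(m.group(2) + "." + m.group(3));
        return Math.round(bytes * 100.0 / percent);
    }

    public static void main(String[] args) {
        String[] suspects = {
            "306 114 312 (13,97%)", // Problem Suspect 1
            "776 362 840 (35,43%)", // Problem Suspect 2
            "268 549 640 (12,26%)"  // Problem Suspect 3
        };
        for (String s : suspects) {
            System.out.printf("%s -> implied heap in dump ~ %.2f GB%n", s, impliedTotalBytes(s) / 1e9);
        }
    }
}
{code}

Each of the three suspects works out to roughly 2.19 GB (10^9 bytes) of heap in the dump. These figures cover on-heap objects only: the native memory behind direct buffers is not part of the retained sizes in a heap dump.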




> Cassandra JVM stop itself randomly
> --
>
> Key: CASSANDRA-13931
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13931
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: RHEL 7.3
> JDK HotSpot 1.8.0_121-b13
> cassandra-3.11 cluster with 43 nodes in 9 datacenters
> 8vCPU, 32 GB RAM
>Reporter: Andrey Lataev
> Attachments: cassandra-env.sh, cassandra.yaml, 
> system.log.2017-10-01.zip
>
>
> Before I set -XX:MaxDirectMemorySize, I received OS-level OOM kills like these:
> # grep "Out of" /var/log/messages-20170918
> Sep 16 06:54:07 p00skimnosql04 kernel: Out of memory: Kill process 26619 
> (java) score 287 or sacrifice child
> Sep 16 06:54:07 p00skimnosql04 kernel: Out of memory: Kill process 26640 
> (java) score 289 or sacrifice child
> After setting the -XX:MaxDirectMemorySize=5G limit, I periodically began to receive:
> HeapUtils.java:136 - Dumping heap to 
> /egov/dumps/cassandra-1506868110-pid11155.hprof
> It seems the JVM kills itself when off-heap memory leaks occur.
> Typical errors in system.log before the JVM begins dumping:
> ERROR 
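The kernel messages and the heap-dump file names quoted in this issue both carry a process ID, which helps tell which JVM instance was OOM-killed and which one dumped its heap. Below is a hypothetical Java helper for extracting them. It assumes the dump path was built the way the stock cassandra-env.sh appears to build -XX:HeapDumpPath when CASSANDRA_HEAPDUMP_DIR is set (from "date +%s" and the shell PID "$$" at node start). Under that assumption, the number in the file name is the JVM's launch time, not the time of the dump.

{code:java}
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Pulls PIDs (and the launch timestamp) out of the artifacts quoted in this issue.
class DumpCorrelation {

    // Assumed naming scheme: cassandra-<epoch seconds>-pid<pid>.hprof
    private static final Pattern DUMP_NAME = Pattern.compile("cassandra-(\\d+)-pid(\\d+)\\.hprof");
    // Kernel OOM-killer message, e.g. "Out of memory: Kill process 26619 (java) score 287 ..."
    private static final Pattern OOM_KILL = Pattern.compile("Kill process (\\d+) \\((\\S+)\\) score (\\d+)");

    private static Matcher matchOrThrow(Pattern p, String text) {
        Matcher m = p.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no match for " + p + " in: " + text);
        }
        return m;
    }

    static long dumpPid(String path) {
        return Long.parseLong(matchOrThrow(DUMP_NAME, path).group(2));
    }

    // The epoch-seconds field as an Instant (assumed JVM launch time, not dump time).
    static Instant dumpStamp(String path) {
        return Instant.ofEpochSecond(Long.parseLong(matchOrThrow(DUMP_NAME, path).group(1)));
    }

    static long oomKilledPid(String kernelLine) {
        return Long.parseLong(matchOrThrow(OOM_KILL, kernelLine).group(1));
    }

    public static void main(String[] args) {
        String dump = "/egov/dumps/cassandra-1506868110-pid11155.hprof";
        String kill = "Sep 16 06:54:07 p00skimnosql04 kernel: Out of memory: "
                + "Kill process 26619 (java) score 287 or sacrifice child";
        System.out.println("dump: pid " + dumpPid(dump) + ", launched " + dumpStamp(dump));
        System.out.println("OOM killer: pid " + oomKilledPid(kill));
    }
}
{code}

For example, cassandra-1506868110-pid11155.hprof decodes to PID 11155 and 2017-10-01T14:28:30Z (UTC).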

[jira] [Comment Edited] (CASSANDRA-13931) Cassandra JVM stop itself randomly

2017-10-04 Thread Andrey Lataev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191450#comment-16191450
 ] 

Andrey Lataev edited comment on CASSANDRA-13931 at 10/4/17 3:40 PM:

