[jira] [Comment Edited] (CASSANDRA-13663) Cassandra 3.10 crashes without dump

2017-11-03 Thread Ricardo Bartolome (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238184#comment-16238184
 ] 

Ricardo Bartolome edited comment on CASSANDRA-13663 at 11/3/17 7:24 PM:


UPDATE: in our case (not the case of the author of the ticket) the JVM was 
crashing. We realised this by enabling the Oracle JVM ErrorFile option and 
kernel core dumps.
{code}
-XX:ErrorFile=/var/lib/cassandra/heapdump/cassandra-jvm-file-error-1509734684-pid31745.log
{code}

We run Cassandra 3.9. It happens with OracleJDK 1.8.0_112 and 1.8.0_131, on 
kernels 4.9.43-17.38.amzn1.x86_64 and 3.14.35-28.38.amzn1.x86_64.

We'll try to share the crash log, but since we are not familiar with its 
contents, we are first checking that it does not contain any sensitive 
information.



was (Author: ricbartm):
UPDATE: in our case (not the case of the author of the ticket) the JVM was 
crashing. We realised this by enabling the Oracle JVM ErrorFile option and 
kernel core dumps.
{code}
-XX:ErrorFile=/var/lib/cassandra/heapdump/cassandra-jvm-file-error-1509734684-pid31745.log
{code}

It happens with OracleJDK 1.8.0_112 and 1.8.0_131. With kernel 
4.9.43-17.38.amzn1.x86_64 and 3.14.35-28.38.amzn1.x86_64

We'll try to share the crash log, but since we are not familiar with its 
contents, we are first checking that it does not contain any sensitive 
information.


> Cassandra 3.10 crashes without dump
> ---
>
> Key: CASSANDRA-13663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13663
> Project: Cassandra
> Issue Type: Bug
> Reporter: Matthias Otto
> Priority: Minor
> Attachments: 2017-07-04 10_48_34-CloudWatch Management Console.png, 
> RamUsageExamle1.png, RamUsageExample2.png, cassandra debug.log, cassandra 
> system.log
>
>
> Hello. My company runs a 5 node Cassandra cluster. For the last few weeks, we 
> have had a sporadic issue where one of the servers crashes without creating a 
> dump file and without any error messages in the logs. If one restarts the 
> service (which we have by now scripted to happen automatically), the server 
> resumes work with no complaint.
> Log files from the time of the last crash are attached, though again they do 
> not log any crash happening.
> Regarding our setup, we are running these servers on Amazon AWS, with 3 
> volumes per server, one for the system, one for data and one for the 
> commitlog. When a crash happens, we can observe a sudden spike of read 
> activity on the commitlog volume. All of these have ample free space. 
> Especially the system volume has more than enough free space so that a dump 
> could be written.
> The servers are Ubuntu 16.04 servers and Cassandra is installed from the 
> apt package for version 3.10.
> It is worth noting that these crashes happen more often when nodetool is 
> running either a repair job or a backup job, but this is by no means always 
> the case. As for frequency, we have had about 1-2 crashes per week for the 
> last month.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13663) Cassandra 3.10 crashes without dump

2017-09-19 Thread Ricardo Bartolome (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16171766#comment-16171766
 ] 

Ricardo Bartolome edited comment on CASSANDRA-13663 at 9/19/17 2:08 PM:


Does anybody have news about this issue?

We are experiencing a similar issue, although in our case we don't see any 
oom-killer errors. Our scenario is:
* 12 x i3.2xlarge instances (8 CPU, 64GB memory)
* Storage per node is ~400GB
* Cassandra 3.9 (we are considering an upgrade to 3.10, but nothing listed in 
the CHANGELOG appears related to this problem, and now we have found this 
ticket)
* Oracle JVM build 1.8.0_112-b15

We also have 1-2 dead nodes a week. We have been enabling heap dumps on several 
nodes to help identify the cause, but so far the crash has not reproduced on 
the nodes that have them enabled (nor do we know whether a heap dump would 
contain any useful information if the problem is off-heap).

Some off-heap memory usage statistics gathered through JMX exploring the 
following beans:
* org.apache.cassandra.metrics:name=BloomFilterOffHeapMemoryUsed,type=Table
* org.apache.cassandra.metrics:name=AllMemtablesOffHeapSize,type=Table
* org.apache.cassandra.metrics:name=CompressionMetadataOffHeapMemoryUsed,type=Table

h4. BloomFilterOffHeapMemoryUsed (~ 1.6GB)
{code}
Value = 1619193928;
Value = 1546767024;
Value = 1669879216;
Value = 1576772336;
Value = 1567804792;
Value = 1605097824;
Value = 1608551904;
Value = 1502500424;
Value = 1363705192;
Value = 1259389280;
Value = 1671282736;
{code}

h4. AllMemtablesOffHeapSize (~700MB)
{code}
Value = 692597111;
Value = 617691154;
Value = 693412363;
Value = 708732630;
Value = 664297343;
Value = 705367430;
Value = 626936323;
Value = 652724309;
Value = 700223457;
Value = 666516571;
Value = 682268720;
{code}

h4. CompressionMetadataOffHeapMemoryUsed (~110MB)
{code}
Value = 111307336;
Value = 105718312;
Value = 110638576;
Value = 111370032;
Value = 105979296;
Value = 108963456;
Value = 22216;
Value = 99788200;
Value = 95279232;
Value = 106130392;
Value = 113217400;
{code}
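
For a rough sense of scale, the samples above can be averaged per metric and added up. The following is only a sketch under stated assumptions: it assumes each `Value` line is one independent sample of the same metric (for example, one node) and that the three metrics can simply be summed. The names `SAMPLES` and `summarize` are hypothetical and exist only for this illustration.

```python
from statistics import mean, median

# Values copied from the three listings above, in bytes.
SAMPLES = {
    "BloomFilterOffHeapMemoryUsed": [
        1619193928, 1546767024, 1669879216, 1576772336, 1567804792,
        1605097824, 1608551904, 1502500424, 1363705192, 1259389280,
        1671282736,
    ],
    "AllMemtablesOffHeapSize": [
        692597111, 617691154, 693412363, 708732630, 664297343,
        705367430, 626936323, 652724309, 700223457, 666516571,
        682268720,
    ],
    "CompressionMetadataOffHeapMemoryUsed": [
        111307336, 105718312, 110638576, 111370032, 105979296,
        108963456, 22216, 99788200, 95279232, 106130392,
        113217400,
    ],
}


def summarize(samples):
    """Return {metric: (mean_bytes, median_bytes)} for each metric."""
    return {
        name: (mean(values), median(values))
        for name, values in samples.items()
    }


summary = summarize(SAMPLES)
for name, (avg, med) in summary.items():
    print(f"{name}: mean {avg / 1e9:.2f} GB, median {med / 1e9:.2f} GB")

# Assumption: the three metrics are additive for a rough off-heap estimate.
total = sum(avg for avg, _ in summary.values())
print(f"sum of means: {total / 1e9:.2f} GB")
```

The median is reported next to the mean because one CompressionMetadataOffHeapMemoryUsed sample (22216) is far below the others and pulls the mean down.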

Any idea what else I can look at?



was (Author: ricbartm):
Does anybody have news about this issue?

We are experiencing a similar issue, although in our case we don't see any 
oom-killer errors. Our scenario is:
* 12 x i3.2xlarge instances (8 CPU, 64GB memory)
* Storage per node is ~400GB
* Cassandra 3.9 (looking for an upgrade, but nothing appears listed in the 
CHANGELOG related to this issue)
* Oracle JVM build 1.8.0_112-b15

We also have 1-2 dead nodes a week. We have been enabling heap dumps on several 
nodes to help identify the cause, but so far the crash has not reproduced on 
the nodes that have them enabled (nor do we know whether a heap dump would 
contain any useful information if the problem is off-heap).

Some off-heap memory usage statistics gathered through JMX exploring the 
following beans:
* org.apache.cassandra.metrics:name=BloomFilterOffHeapMemoryUsed,type=Table
* org.apache.cassandra.metrics:name=AllMemtablesOffHeapSize,type=Table
* org.apache.cassandra.metrics:name=CompressionMetadataOffHeapMemoryUsed,type=Table

h4. BloomFilterOffHeapMemoryUsed (~ 1.6GB)
{code}
Value = 1619193928;
Value = 1546767024;
Value = 1669879216;
Value = 1576772336;
Value = 1567804792;
Value = 1605097824;
Value = 1608551904;
Value = 1502500424;
Value = 1363705192;
Value = 1259389280;
Value = 1671282736;
{code}

h4. AllMemtablesOffHeapSize (~700MB)
{code}
Value = 692597111;
Value = 617691154;
Value = 693412363;
Value = 708732630;
Value = 664297343;
Value = 705367430;
Value = 626936323;
Value = 652724309;
Value = 700223457;
Value = 666516571;
Value = 682268720;
{code}

h4. CompressionMetadataOffHeapMemoryUsed (~110MB)
{code}
Value = 111307336;
Value = 105718312;
Value = 110638576;
Value = 111370032;
Value = 105979296;
Value = 108963456;
Value = 22216;
Value = 99788200;
Value = 95279232;
Value = 106130392;
Value = 113217400;
{code}

Any idea what else I can look at?


[jira] [Comment Edited] (CASSANDRA-13663) Cassandra 3.10 crashes without dump

2017-08-05 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115628#comment-16115628
 ] 

Jeff Jirsa edited comment on CASSANDRA-13663 at 8/6/17 4:44 AM:


It's likely being killed by the system oom-killer - you'll probably see traces 
in dmesg.
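
One way to check for such traces is to scan the kernel log (for example, saved `dmesg` output) for OOM-killer messages. The sketch below is illustrative only: the message fragments are ones the Linux kernel commonly logs, but the exact wording varies between kernel versions, so the patterns are assumptions. The names `OOM_PATTERNS` and `find_oom_events` are hypothetical.

```python
import re

# Fragments commonly seen in kernel logs around an OOM kill.
# Assumption: wording differs between kernel versions; adjust as needed.
OOM_PATTERNS = [
    re.compile(r"invoked oom-killer"),
    re.compile(r"Out of memory: Kill(?:ed)? process \d+"),
    re.compile(r"Killed process \d+ \([^)]+\)"),
]


def find_oom_events(lines):
    """Return the log lines that look like OOM-killer activity."""
    hits = []
    for line in lines:
        if any(pattern.search(line) for pattern in OOM_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits
```

A possible use is to save the kernel log to a file and pass the open file object to `find_oom_events`; an empty result suggests the process was not killed by the oom-killer, at least not within the retained log window.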

Cassandra allocates a fixed amount of memory for the heap, and a variable 
amount of RAM off-heap for things like bloom filters and compression offsets.

Both bloom filters and compression offsets scale linearly with disk space (and 
can be much higher with various table-specific settings), and are allocated in 
chunks during compaction.
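
To illustrate the linear scaling, the textbook Bloom filter sizing formula gives the number of bits as -n * ln(p) / (ln 2)^2 for n keys and false-positive rate p. The sketch below uses that generic formula and is not Cassandra's actual allocation code; the key counts are hypothetical, and the false-positive rates are example values of the kind set per table through `bloom_filter_fp_chance`.

```python
import math


def bloom_filter_bytes(num_keys, fp_chance):
    """Textbook Bloom filter size in bytes for num_keys at fp_chance.

    Generic formula: bits = -n * ln(p) / (ln 2)^2.
    Assumption: this approximates, but is not, Cassandra's own sizing.
    """
    if num_keys < 0 or not 0.0 < fp_chance < 1.0:
        raise ValueError("num_keys must be >= 0 and 0 < fp_chance < 1")
    bits = -num_keys * math.log(fp_chance) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


# Hypothetical key counts: doubling the keys doubles the filter size,
# and a lower false-positive rate makes the filter larger.
for keys in (100_000_000, 200_000_000):
    for fp_chance in (0.1, 0.01):
        size_mb = bloom_filter_bytes(keys, fp_chance) / 1e6
        print(f"{keys} keys, fp_chance={fp_chance}: {size_mb:.0f} MB")
```

Under this formula the size grows in proportion to the number of keys, which is consistent with the statement that bloom filter memory scales with the amount of data on disk.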

It's very likely you're seeing a compaction start, and the bloom filter 
allocation triggers the oom-killer.

How much RAM is in the system? How big is the heap? How much data is on each 
node?


was (Author: jjirsa):
It's likely being killed by the system on-killer - you'll probably see traces 
in smear

Cassandra allocates a fixed amount of memory for the heap, and a variable 
amount of RAM offheap for things like bloom filters and compression offsets

Both bloom filters and compression offsets scale linearly with disk space (and 
can be much higher with various table specific settings), and are allocated in 
chunks during compaction.

It's very likely you're seeing a compaction start, and the bloom filter 
allocation triggers the Pom-killer

How much ram is in the system? How big is the heap? How much data is on each 
node?



