Just a thought along those lines: if the memtable flush isn't keeping up, you
might see it in the I/O queue length and dirty page stats in the period leading
up to the OOM event. If you do see that, you may need to do some I/O tuning as
well.

From: Jeff Jirsa <jji...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, January 24, 2020 at 12:09 PM
To: cassandra <user@cassandra.apache.org>
Subject: Re: Cassandra going OOM due to tombstones (heapdump screenshots provided)

Ah, I focused too much on the literal meaning of startup. If it's happening 
JUST AFTER startup, it's probably getting flooded with hints from the other 
hosts when it comes online.

If that's the case, it may simply be overrunning the memtable, or it may be a
deadlock like https://issues.apache.org/jira/browse/CASSANDRA-15367
(which Benedict just updated this morning, good timing).


If it's after the host comes online and it's hint replay from the other hosts, 
you probably want to throttle hint replay significantly on the rest of the 
cluster. Whatever your hinted handoff throttle is, consider dropping it by 
50-90% to work around whichever of those two problems it is.
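For concreteness, a rough sketch of that arithmetic. It assumes the 2.2
cassandra.yaml setting hinted_handoff_throttle_in_kb (KB per second, per
delivery thread, default 1024); the change would go on the nodes sending hints,
not on the node that just came back online.

```python
# Sketch: candidate values for hinted_handoff_throttle_in_kb after a 50-90% cut.
# Apply on the nodes *sending* hints (the rest of the cluster).

def reduced_throttles(current_kb, cuts=(0.5, 0.75, 0.9)):
    """Map each fractional cut to a new throttle in KB/s (minimum 1)."""
    return {cut: max(1, round(current_kb * (1 - cut))) for cut in cuts}


if __name__ == "__main__":
    for cut, kb in reduced_throttles(1024).items():  # 1024 = 2.2 default
        print(f"cut {cut:.0%}: hinted_handoff_throttle_in_kb: {kb}")
```

With the default of 1024, that works out to 512, 256 and 102 KB/s.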


On Fri, Jan 24, 2020 at 9:06 AM Jeff Jirsa <jji...@gmail.com> wrote:
That's 6 GB of mutations on heap.
Startup would replay the commitlog, which would re-materialize all of those
mutations and put them into the memtable. The memtable would then flush to disk
over time and clear the commitlog.

It looks like PERHAPS the commitlog replay is faster than the memtable flush, 
so you're blowing out the memtable while you're replaying the commitlog.
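A toy model of that race, with made-up rates: while replay outpaces flushing,
on-heap memtable usage grows at the difference of the two rates, so the time
until it fills a given budget is budget / (replay - flush). It ignores per-cell
object overhead and any flush backpressure, so treat it as illustration only.

```python
# Toy model, not Cassandra code: memtable growth during commitlog replay.

def seconds_until_full(replay_mb_s, flush_mb_s, budget_mb, start_mb=0.0):
    """Seconds until memtable usage reaches budget_mb, or None if flushing
    keeps up with replay (net growth <= 0)."""
    net_mb_s = replay_mb_s - flush_mb_s
    if net_mb_s <= 0:
        return None
    return max(0.0, (budget_mb - start_mb) / net_mb_s)


if __name__ == "__main__":
    # Hypothetical: replay at 200 MB/s, flush at 50 MB/s, ~6 GB of headroom.
    t = seconds_until_full(200, 50, 6 * 1024)
    print(f"memtable hits 6 GB after ~{t:.0f} s")
```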

How much memory does the machine have? How much of that is allocated to the 
heap? What are your memtable settings? Do you see log lines about flushing 
memtables to free room (probably something like the slab pool cleaner)?
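To sanity-check the memtable settings, here is the arithmetic as I understand
the 2.2 defaults from the cassandra.yaml comments: memtable_heap_space_in_mb
defaults to 1/4 of the heap, and memtable_cleanup_threshold defaults to
1 / (memtable_flush_writers + 1); once non-flushing memtables exceed that
fraction of the space, the largest one is flushed (the cleaner thread mentioned
above). The heap size and writer count below are hypothetical.

```python
# Sketch of Cassandra 2.2 memtable sizing defaults (from cassandra.yaml
# comments); heap size and flush writer count below are hypothetical.

def memtable_limits(heap_mb, flush_writers, heap_space_mb=None, cleanup_threshold=None):
    """Return (memtable heap space, usage in MB at which the largest memtable
    is flushed), applying the defaults when a setting is left unset."""
    space_mb = heap_space_mb if heap_space_mb is not None else heap_mb / 4
    threshold = (cleanup_threshold if cleanup_threshold is not None
                 else 1 / (flush_writers + 1))
    return space_mb, space_mb * threshold


if __name__ == "__main__":
    space, trigger = memtable_limits(heap_mb=8192, flush_writers=2)
    print(f"memtable_heap_space_in_mb ~ {space:.0f}, cleanup flush at ~{trigger:.0f} MB")
```

If the heap dump shows far more than that held in memtables, that would be
consistent with flushing falling behind replay.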



On Fri, Jan 24, 2020 at 3:16 AM Behroz Sikander <bsikan...@apache.org> wrote:
We recently had a lot of OOMs in C*, generally happening during startup.
We took some heap dumps but still cannot pinpoint the exact reason, so we need
some help from experts.

Our clients are not explicitly deleting data but they have TTL enabled.

C* details:
> show version
[cqlsh 5.0.1 | Cassandra 2.2.9 | CQL spec 3.3.1 | Native protocol v4]

Most of the heap was allocated to object[]:
- org.apache.cassandra.db.Cell

Heap dump images:
Heap usage by class: https://pasteboard.co/IRrfu70.png
Classes using most heap: https://pasteboard.co/IRrgszZ.png
Overall heap usage: https://pasteboard.co/IRrg7t1.png

What could be the reason for these OOMs? Is there something we can tune to
improve this?
Any help would be much appreciated.

