Re: About the heap

Wei Zhu Wed, 13 Mar 2013 11:35:58 -0700

Hi Dean,
index_interval controls the sampling of the SSTable index, which speeds up key 
lookups in the SSTable. Here is the code:


https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DataTracker.java#L478

Increasing the interval means taking fewer samples: less memory, but slower 
lookups on reads.
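To make that trade-off concrete, here is a simplified Java sketch. It is a hypothetical model, not Cassandra's actual index summary code (the class IndexSampleSketch and its methods are made up for illustration): every indexInterval-th key of a sorted index is kept in memory, and a lookup binary-searches those samples, then scans forward at most indexInterval entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical, simplified model of index sampling: keep every
 * indexInterval-th key of a sorted index in memory, binary-search the
 * samples, then scan forward at most indexInterval entries.
 */
public class IndexSampleSketch {
    private final List<String> index;               // full sorted index (on disk in reality)
    private final List<String> sampledKeys = new ArrayList<>();
    private final List<Integer> sampledPositions = new ArrayList<>();
    private final int indexInterval;

    public IndexSampleSketch(List<String> sortedKeys, int indexInterval) {
        this.index = sortedKeys;
        this.indexInterval = indexInterval;
        // Keep one in-memory sample every indexInterval keys.
        for (int i = 0; i < sortedKeys.size(); i += indexInterval) {
            sampledKeys.add(sortedKeys.get(i));
            sampledPositions.add(i);
        }
    }

    public int sampleCount() {
        return sampledKeys.size();
    }

    /** Returns the position of key in the full index, or -1 if it is absent. */
    public int lookup(String key) {
        int s = Collections.binarySearch(sampledKeys, key);
        if (s >= 0) {
            return sampledPositions.get(s);             // exact hit on a sample
        }
        int insertion = -s - 1;                          // index of first sample greater than key
        if (insertion == 0) {
            return -1;                                   // key sorts before every sample
        }
        int start = sampledPositions.get(insertion - 1);
        int end = Math.min(start + indexInterval, index.size());
        for (int i = start; i < end; i++) {              // scan at most indexInterval entries
            if (index.get(i).equals(key)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            keys.add(String.format("key%05d", i));       // zero-padded so string order = numeric order
        }
        for (int interval : new int[] {128, 512, 1024}) {
            IndexSampleSketch s = new IndexSampleSketch(keys, interval);
            System.out.println(interval + " -> " + s.sampleCount()
                    + " samples, key04242 at " + s.lookup("key04242"));
        }
    }
}
```

With 10,000 keys, an interval of 128 keeps 79 samples while 1024 keeps only 10, but each lookup may then scan up to 1024 entries instead of 128.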

I did take a heap dump on my production system, which paused the node for about 
10 seconds. I found something interesting: with LCS, a single compaction can 
involve thousands of SSTables, and their ancestors are recorded in case 
something goes wrong during the compaction. However, those records are never 
removed after the compaction is done. In our case, storing them takes about 1 GB 
of heap. I am going to submit a JIRA for that. 

Here is the culprit:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/io/sstable/SSTableMetadata.java#L58

Enjoy looking at the Cassandra code :)

-Wei
 

----- Original Message -----
From: "Dean Hiller" <dean.hil...@nrel.gov>
To: user@cassandra.apache.org
Sent: Wednesday, March 13, 2013 11:11:14 AM
Subject: Re: About the heap

Upgrading to 1.2.2 helped us quite a bit, as did switching from STCS to LCS, 
which gave us smaller bloom filters.

As for the key cache: there is an entry in cassandra.yaml called index_interval, 
set to 128. I am not sure whether it is related to the key cache, though I think 
it is. By raising it to 512 or even 1024, you will consume less RAM there as 
well. However, when I ran this test in QA my key cache size stayed the same, so 
I am really not sure (I am actually checking out the Cassandra code now to dig a 
little deeper into this property).

Dean

From: Alain RODRIGUEZ <arodr...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, March 13, 2013 10:11 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: About the heap

Hi,

I would like to know everything that is in the heap.

We are talking about C* 1.1.6.

Theory:

- Memtable (1024 MB)
- Key Cache (100 MB)
- Row Cache (disabled; it would be serialized with JNA activated anyway, so it 
should be off-heap)
- Bloom filters (about 1.03 GB: the sum of all the "Bloom Filter Space Used" 
values from cfstats, which are shown in bytes, is 1103765112)
- Anything else?

So my heap should fluctuate between 1.15 GB and 2.15 GB, growing slowly (from 
the new bloom filters for my new data).
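The same back-of-the-envelope arithmetic as a small Java sketch, using the figures above (100 MB key cache, 1024 MB memtable, 1103765112 bytes of bloom filters). HeapBudget and its methods are hypothetical names, and only the consumers listed above are counted; anything covered by "Anything else?" is not modelled.

```java
/**
 * Hypothetical back-of-the-envelope heap budget built from the figures in
 * the message. Only the listed consumers are counted.
 */
public class HeapBudget {
    static final double MIB = 1024.0 * 1024.0;

    /** Resident part: key cache plus bloom filters, in MiB. */
    static double baselineMiB(double keyCacheMiB, long bloomFilterBytes) {
        return keyCacheMiB + bloomFilterBytes / MIB;
    }

    /** Upper estimate: baseline plus completely full memtable space, in MiB. */
    static double peakMiB(double baselineMiB, double memtableMiB) {
        return baselineMiB + memtableMiB;
    }

    public static void main(String[] args) {
        long bloomFilterBytes = 1_103_765_112L;  // sum of "Bloom Filter Space Used" from cfstats
        double base = baselineMiB(100, bloomFilterBytes);
        double peak = peakMiB(base, 1024);
        System.out.printf("baseline %.0f MiB (%.2f GiB), peak %.0f MiB (%.2f GiB)%n",
                base, base / 1024, peak, peak / 1024);
        // Compare the estimate with the 6 GB the heap is observed to reach.
        System.out.printf("unexplained at 6 GiB: %.2f GiB%n", 6 - peak / 1024);
    }
}
```

With these figures it reports a baseline of roughly 1.13 GiB and a peak of roughly 2.13 GiB, close to the range above, which leaves almost 4 GiB of the observed 6 GB unaccounted for.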

In practice, my heap swings between 3-4 GB and 6 GB, and sometimes grows to the 
8 GB maximum (crashing the node).

Because of this my cluster is unstable, and I have no choice but to use Amazon 
EC2 XLarge instances when we would rather use twice as many EC2 Large nodes.

What am I missing?

Practice:

Is there an easy way to dump the heap without inducing any load, so I can 
analyse it with MAT (or anything else you would advise)?
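One way to trigger a dump remotely is the standard HotSpotDiagnostic MXBean over JMX; below is a minimal Java sketch. It assumes Cassandra's default JMX port 7199 with JMX authentication disabled, and HeapDumper is a hypothetical class name. Note that it does not avoid load: the JVM is paused while the dump is written (Wei's reply mentions a pause of about 10 seconds), and live=true forces a full GC first. The .hprof file is written on the Cassandra node's own filesystem and can then be opened in MAT.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/**
 * Hypothetical helper that asks a JVM to write a heap dump via the
 * HotSpotDiagnostic MXBean. The target JVM pauses while the dump is written.
 */
public class HeapDumper {
    static final String DIAGNOSTIC_MBEAN = "com.sun.management:type=HotSpotDiagnostic";

    /**
     * Writes an .hprof file on the target JVM's filesystem. The file must not
     * already exist. live=true keeps only reachable objects (runs a full GC first).
     */
    static void dumpHeap(MBeanServerConnection conn, String hprofPath, boolean live)
            throws IOException {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                conn, DIAGNOSTIC_MBEAN, HotSpotDiagnosticMXBean.class);
        diag.dumpHeap(hprofPath, live);
    }

    public static void main(String[] args) throws IOException {
        // Assumptions: JMX on port 7199 (Cassandra's default), no JMX authentication.
        String host = args.length > 0 ? args[0] : "localhost";
        String path = args.length > 1 ? args[1] : "/tmp/cassandra-heap.hprof";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            dumpHeap(connector.getMBeanServerConnection(), path, true);
        }
        System.out.println("Heap dump written on " + host + " to " + path);
    }
}
```

Usage would be something like `java HeapDumper cassandra-node-1 /tmp/heap.hprof` (the host name is a placeholder). Passing false for live skips the full GC but produces a larger file that also contains garbage.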

Alain
