Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled

J. Ryan Earl Thu, 14 Nov 2013 10:07:09 -0800

First off, I'm curious what hardware (system specs) you're running this on?

Secondly, here are some observations:
* You're not running the newest JDK7, I can tell by your stack-size.
 Consider getting the newest.

* Cassandra 2.0.2 has a lot of improvements, consider upgrading.  We
noticed improved heap usage after moving to 2.0.2.

* Have you simply tried decreasing the size of your row cache?  Tried 256MB?

* Do you have JNA installed?  Without it, these caches are not kept
off-heap, which seems likely to be what is happening here.  Check your
cassandra.log to verify JNA operation.

* Your NewGen is too small.  See your heap peaks?  This is because
short-lived memory is being put into OldGen, which only gets cleaned up
during fullGC.  You should set your NewGen to about 25-30% of your total
heapsize.  Many objects are short-lived, and CMS GC is significantly more
efficient if the shorter-lived objects never get promoted to OldGen; you'll
get more concurrent, non-blocking GC.  If you're not using JNA (per above)
row-cache and key-cache are still on-heap, so you want your NewGen to be >=
twice as large as the size of these combined caches.  You should never see
those crazy heap spikes; your caches are essentially overflowing into
OldGen (with JNA).
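
To make the sizing rule concrete, here is a rough sketch in Python that applies
the two rules of thumb above (NewGen at 25-30% of the heap, and at least twice
the combined caches when they live on-heap) to the settings quoted below.  The
function names are made up for illustration, and the numbers are only a
starting point for tuning, not a verified recommendation.

```python
# Rough sizing helper for the rules of thumb above. Function names are
# hypothetical; inputs are taken from the quoted configuration.

def newgen_range_mb(heap_mb):
    """Return (low, high) NewGen sizes at 25% and 30% of the total heap."""
    return heap_mb * 25 // 100, heap_mb * 30 // 100

def newgen_floor_for_onheap_caches_mb(key_cache_mb, row_cache_mb):
    """Lower bound when the caches are on-heap: twice their combined size."""
    return 2 * (key_cache_mb + row_cache_mb)

heap_mb = 10 * 1024   # -Xms10G -Xmx10G
key_cache_mb = 300    # key_cache_size_in_mb
row_cache_mb = 1024   # row_cache_size_in_mb

low, high = newgen_range_mb(heap_mb)
floor = newgen_floor_for_onheap_caches_mb(key_cache_mb, row_cache_mb)

print("25-30%% of heap: %d-%d MB" % (low, high))
print("2x combined caches: %d MB" % floor)
print("suggested starting point: -Xmn%dM" % max(low, floor))
```

With these inputs both rules land in the same range, roughly 2.5-3 GB, versus
the -Xmn1024M currently configured.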

On Tue, Nov 5, 2013 at 3:04 AM, Jiri Horky <ho...@avast.com> wrote:

> Hi there,
>
> we are seeing extensive memory allocation leading to quite long and
> frequent GC pauses when using row cache. This is on cassandra 2.0.0
> cluster with JNA 4.0 library with following settings:
>
> key_cache_size_in_mb: 300
> key_cache_save_period: 14400
> row_cache_size_in_mb: 1024
> row_cache_save_period: 14400
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 10000
> commitlog_segment_size_in_mb: 32
>
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
>
> -XX:HeapDumpPath=/data2/cassandra-work/instance-1/cassandra-1383566283-pid1893.hprof
> -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark
>
> We have disabled row cache on one node to see the difference. Please
> see attached plots from visual VM, I think that the effect is quite
> visible. I have also taken 10x "jmap -histo" after 5s on an affected
> server and plotted the result, attached as well.
>
> I have taken a dump of the application when the heap size was 10GB, most
> of the memory was unreachable, which was expected. The majority was used
> by 55-59M objects of HeapByteBuffer, byte[] and
> org.apache.cassandra.db.Column classes. I also include a list of inbound
> references to the HeapByteBuffer objects from which it should be visible
> where they are being allocated. This was acquired using Eclipse MAT.
>
> Here is the comparison of GC times when row cache enabled and disabled:
>
> prg01 - row cache enabled
>       - uptime 20h45m
>       - ConcurrentMarkSweep - 11494686ms
>       - ParNew - 14690885 ms
>       - time spent in GC: 35%
> prg02 - row cache disabled
>       - uptime 23h45m
>       - ConcurrentMarkSweep - 251ms
>       - ParNew - 230791 ms
>       - time spent in GC: 0.27%
>
> I would be grateful for any hints. Please let me know if you need any
> further information. For now, we are going to disable the row cache.
>
> Regards
> Jiri Horky
>
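
For reference, the "time spent in GC" percentages quoted above can be
reproduced from the raw counters.  The sketch below assumes the percentage is
simply (ConcurrentMarkSweep time + ParNew time) divided by uptime; that
formula is an assumption, since the original post does not say how the figure
was derived, and the helper name is made up.

```python
# Reproduce the quoted GC percentages, assuming
# percent = (CMS ms + ParNew ms) / uptime ms * 100.

def gc_percent(uptime_hours, uptime_minutes, cms_ms, parnew_ms):
    """Share of uptime accounted for by the two collectors, in percent."""
    uptime_ms = (uptime_hours * 60 + uptime_minutes) * 60 * 1000
    return (cms_ms + parnew_ms) / uptime_ms * 100

# prg01: row cache enabled, uptime 20h45m
prg01 = gc_percent(20, 45, 11494686, 14690885)
# prg02: row cache disabled, uptime 23h45m
prg02 = gc_percent(23, 45, 251, 230791)

print("prg01: %.2f%%" % prg01)
print("prg02: %.2f%%" % prg02)
```

Under that assumption the results agree with the 35% and 0.27% reported in
the original message.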
