[
https://issues.apache.org/jira/browse/CASSANDRA-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tomas Salfischberger updated CASSANDRA-6191:
Description:
Not sure "bug" is the right description, because I can't say for sure that the
large number of SSTables is the cause of the memory issues. I'll share my
research so far:
Under high read load with a very large number of compressed SSTables (caused by
the initial default 5MB sstable_size in LCS) memory seems to be exhausted, with
nothing left for GC to reclaim. The JVM keeps trying to GC but recovers very
little. The node first hits the emergency valves, flushing all memtables and
then reducing caches, and finally logs 0.99+ heap usage and either hangs with
GC failures or crashes with an OutOfMemoryError.
I've taken a heap dump and started analyzing it to find out what's wrong. The
memory seems to be used by the byte[] backing the HeapByteBuffer in the
compressed field of org.apache.cassandra.io.compress.CompressedRandomAccessReader.
These byte[] are generally 65536 bytes in size, matching the block size of the
compression.
Looking further in the heap dump I can see that these readers are part of the
pool in org.apache.cassandra.io.util.CompressedPoolingSegmentedFile, which is
linked to the dfile field of org.apache.cassandra.io.sstable.SSTableReader.
The dump file lists 45248 instances of CompressedRandomAccessReader.
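To put that number in perspective (my own back-of-envelope, not a measurement
from the dump): 45248 readers x 65536-byte compressed buffers is roughly
2.8 GiB of retained heap for those buffers alone, before counting any buffers
that were enlarged, on an 8GB heap.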
Is this working as intended? Is there a leak somewhere? Or should there be an
alternative strategy and/or a warning for cases where a node is trying to read
far too many SSTables?
EDIT:
Searching through the code I found that PoolingSegmentedFile keeps a pool of
RandomAccessReader instances for re-use, while CompressedRandomAccessReader
allocates a ByteBuffer in its constructor and (to make things worse) enlarges
it when reading a large chunk. This (sometimes enlarged) ByteBuffer is then
kept alive because it is part of the CompressedRandomAccessReader, which is in
turn kept alive as part of the pool in the PoolingSegmentedFile.
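To illustrate the retention pattern (a minimal sketch with simplified,
hypothetical names, not the actual Cassandra classes):
{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.ConcurrentLinkedQueue;

// Stand-in for CompressedRandomAccessReader: allocates its buffer up
// front, sized to the compression chunk length, and only ever grows it.
class PooledReader {
    private ByteBuffer compressed = ByteBuffer.allocate(65536);

    void read(int chunkLength) {
        // Enlarged on demand for oversized chunks, never shrunk back.
        if (chunkLength > compressed.capacity())
            compressed = ByteBuffer.allocate(chunkLength);
        // ... read and decompress the chunk into the buffer ...
    }
}

// Stand-in for PoolingSegmentedFile: recycled readers (and their
// possibly enlarged buffers) stay reachable as long as the pool does.
class ReaderPool {
    private final ConcurrentLinkedQueue<PooledReader> pool =
            new ConcurrentLinkedQueue<>();

    PooledReader borrow() {
        PooledReader r = pool.poll();
        return r != null ? r : new PooledReader();
    }

    void recycle(PooledReader r) {
        pool.add(r);
    }
}
{code}
With tens of thousands of SSTables each holding such a pool, none of these
buffers ever become eligible for GC.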
Memory exhaustion with large number of compressed SSTables
--
Key: CASSANDRA-6191
URL: https://issues.apache.org/jira/browse/CASSANDRA-6191
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: OS: Debian 7.1
Java: Oracle 1.7.0_25
Cassandra: 1.2.10
Memory: 24GB
Heap: 8GB
Reporter: Tomas Salfischberger