The live:serialized size ratio depends on what your data looks like (small
columns are less space-efficient in memory than large blobs), but using the
rule of thumb of 10x, your 128MB memtable works out to around 1GB live, times
(1 + memtable_flush_writers + memtable_flush_queue_size) for the memtables
that can be queued waiting to flush.
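To put numbers on that, a back-of-the-envelope sketch (assuming the 0.7
defaults of memtable_flush_writers: 1 and memtable_flush_queue_size: 4;
check your cassandra.yaml):

    128 MB serialized * 10 live:serialized ratio  =  ~1.28 GB live per memtable
    1.28 GB * (1 + 1 + 4) memtables in flight     =  ~7.7 GB of heap, worst case

That alone is enough to fill an 8GB heap before reads allocate anything.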
So the first thing I would do is drop writers and queue to 1 and 1. Then I
would drop the max heap to 1GB and the memtable size to 8MB, so the heap dump
is easier to analyze. Then let it OOM and look at the dump with
http://www.eclipse.org/mat/

On Sat, May 7, 2011 at 3:54 PM, Serediuk, Adam
<adam.sered...@serialssolutions.com> wrote:
> How much memory should a single hot CF with a 128MB memtable take, with row
> and key caching disabled, during reads?
>
> Because I'm seeing the heap go from 3.5GB, skyrocketing straight to the max
> (regardless of the size; 8GB and 24GB both do the same), at which point the
> JVM will do nothing but full GC and is unable to reclaim any meaningful
> amount of memory. Cassandra then becomes unusable.
>
> I see the same behavior with smaller memtables, e.g. 64MB.
>
> This happens well into the read operation and only on a small number of
> nodes in the cluster (1-4 out of a total of 60 nodes).
>
> Sent from my iPhone
>
> On May 6, 2011, at 22:45, "Jonathan Ellis" <jbel...@gmail.com> wrote:
>
>> You don't GC storm without legitimately having a too-full heap. It's
>> normal to see occasional full GCs from fragmentation, but those actually
>> compact the heap, and everything goes back to normal IF space was
>> actually freed up.
>>
>> You say you've played with memtable size, but that would still be my
>> bet. Most people severely underestimate how much space memtables take
>> (10x in memory over the serialized size), which will bite you when you
>> have lots of CFs defined.
>>
>> Otherwise, force a heap dump after a full GC and take a look to see
>> what's referencing all the memory.
>>
>> On Fri, May 6, 2011 at 12:25 PM, Serediuk, Adam
>> <adam.sered...@serialssolutions.com> wrote:
>>> We're troubleshooting a memory usage problem during batch reads. We've
>>> spent the last few days profiling and trying different GC settings. The
>>> symptom is that, after a certain amount of time during reads, one or
>>> more nodes in the cluster will exhibit extreme memory pressure followed
>>> by a GC storm. We've tried every possible JVM setting and different GC
>>> methods, and the issue persists. This points toward something
>>> instantiating a lot of objects and keeping references to them so they
>>> can't be collected.
>>>
>>> Typically nothing is logged other than the GC failures; however, just
>>> now one of the nodes emitted a log line we've never seen before:
>>>
>>> INFO [ScheduledTasks:1] 2011-05-06 15:04:55,085 StorageService.java
>>> (line 2218) Unable to reduce heap usage since there are no dirty
>>> column families
>>>
>>> We have tried increasing the heap on these nodes to large values, e.g.
>>> 24GB, and still run into the same issue. We normally run an 8GB heap,
>>> and only one or two nodes will ever exhibit this issue, at random. We
>>> don't use key/row caching, and our memtable sizing is 64MB/0.3; larger
>>> or smaller memtables make no difference in avoiding the issue. We're on
>>> 0.7.5 with mmap, JNA, and JDK 1.6.0_24.
>>>
>>> We've somewhat hit a wall in troubleshooting, and any advice is greatly
>>> appreciated.
>>>
>>> --
>>> Adam
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
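A footnote on the "force a heap dump after a full GC" step quoted above: from
the shell, jmap -dump:live,format=b,file=heap.hprof <pid> does it (the live
suboption triggers a full GC before writing the dump). If you'd rather script
it over JMX, here is a minimal sketch; the host, port (0.7 ships with JMX on
8080 by default), and output path are placeholder assumptions, so adjust them
to match your cassandra-env.sh:

    import java.lang.management.ManagementFactory;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class RemoteHeapDump {
        public static void main(String[] args) throws Exception {
            // Connect to the node's JMX port (placeholder host/port).
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                HotSpotDiagnosticMXBean hotspot =
                        ManagementFactory.newPlatformMXBeanProxy(
                                jmxc.getMBeanServerConnection(),
                                "com.sun.management:type=HotSpotDiagnostic",
                                HotSpotDiagnosticMXBean.class);
                // live=true forces a full GC first, so the dump contains only
                // objects that survived collection -- what's pinning the heap.
                // The path is resolved by the Cassandra JVM, so the .hprof
                // file is written on the node itself.
                hotspot.dumpHeap("/tmp/cassandra-heap.hprof", true);
            } finally {
                jmxc.close();
            }
        }
    }

Open the resulting .hprof in MAT and start from the dominator tree to see
what is holding the references.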