If you are using the default settings, I would try to correlate the GC activity with some application activity before tweaking.
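A minimal way to watch for that correlation from the shell, assuming the standard tools that ship with Cassandra 1.1 and that <pid> stands in for the Cassandra JVM's process id, is something along these lines:

# old gen occupancy (O) and collection counts (YGC/FGC) every 5 seconds
jstat -gcutil <pid> 5000

# in another window: are any compactions or repair streams running right now?
nodetool compactionstats
nodetool netstats

# active/pending operations per stage, to spot a throughput spike
nodetool tpstats

If the jump in FGC lines up with a compaction or a repair session rather than with client traffic, that points at the compaction-related settings below rather than at heap sizing.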
If this is happening on one machine out of 4, ensure that client load is distributed evenly. See if the rise in GC activity is related to compaction, repair or an increase in throughput. OpsCentre or some other monitoring can help with the last one. Your mention of TTL makes me think compaction may be doing a bit of work churning through rows.

Some things I've done in the past before looking at heap settings (see the yaml sketch further down):

* reduce compaction_throughput to reduce the memory churn
* reduce in_memory_compaction_limit
* if needed reduce concurrent_compactors

> Currently it seems like the memory used scales with the amount of bytes
> stored and not with how busy the server actually is. That's not such a good
> thing.

The memtable_total_space_in_mb setting in the yaml tells C* how much memory to devote to the memtables. That, together with the global row cache setting, determines how much memory will be used with regard to "storing" data, and it will not increase in line with the static data load.

Nowadays GC issues are typically due to more dynamic forces, like compaction, repair and throughput.
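To make those knobs concrete, here is a rough sketch of where they live in cassandra.yaml. The values are illustrative starting points only, not recommendations, and should be checked against the defaults shipped with your 1.1 install:

compaction_throughput_mb_per_sec: 8    # default is 16; lower throttles compaction and its memory churn
in_memory_compaction_limit_in_mb: 32   # default is 64; larger rows fall back to the slower on-disk path
concurrent_compactors: 1               # limit parallel compactions, only if compaction is clearly the culprit
memtable_total_space_in_mb: 1024       # memory devoted to memtables; fixed, does not grow with data on disk

Compaction throughput can also be changed on a live node with "nodetool setcompactionthroughput <mb_per_sec>" if you want to test the effect before editing the yaml.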
Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/10/2012, at 6:59 AM, Bryan Talbot <btal...@aeriagames.com> wrote:

> ok, let me try asking the question a different way ...
>
> How does cassandra use memory and how can I plan how much is needed? I
> have a 1 GB memtable and 5 GB total heap and that's still not enough even
> though the number of concurrent connections and garbage generation rate
> is fairly low.
>
> If I were using mysql or oracle, I could compute how much memory could be
> used by N concurrent connections, how much is allocated for caching, temp
> spaces, etc. How can I do this for cassandra? Currently it seems like the
> memory used scales with the amount of bytes stored and not with how busy
> the server actually is. That's not such a good thing.
>
> -Bryan
>
>
> On Thu, Oct 18, 2012 at 11:06 AM, Bryan Talbot <btal...@aeriagames.com> wrote:
> In a 4 node cluster running Cassandra 1.1.5 with sun jvm 1.6.0_29-b11
> (64-bit), the nodes are often getting "stuck" in a state where CMS
> collections of the old space are constantly running.
>
> The JVM configuration is using the standard settings in cassandra-env --
> relevant settings are included below. The max heap is currently set to
> 5 GB with 800 MB for new size. I don't believe that the cluster is overly
> busy and it seems to be performing well enough other than this issue.
> When nodes get into this state they never seem to leave it (by freeing up
> old space memory) without restarting cassandra. They typically enter this
> state while running "nodetool repair -pr", but once they start doing
> this, restarting them only "fixes" it for a couple of hours.
>
> Compactions are completing and are generally not queued up. All CF are
> using STCS. The busiest CF consumes about 100 GB of space on disk, is
> write heavy, and all columns have a TTL of 3 days. Overall, there are
> 41 CF including those used for the system keyspace and secondary indexes.
> The number of SSTables per node currently varies from 185-212.
>
> Other than frequent log warnings about "GCInspector - Heap is 0.xxx
> full..." and "StorageService - Flushing CFS(...) to relieve memory
> pressure" there are no other log entries to indicate there is a problem.
>
> Does the memory needed vary depending on the amount of data stored? If
> so, how can I predict how much jvm space is needed? I don't want to make
> the heap too large as that's bad too. Maybe there's a memory leak related
> to compaction that doesn't allow meta-data to be purged?
>
>
> -Bryan
>
>
> 12 GB of RAM in host with ~6 GB used by java and ~6 GB for OS and buffer
> cache.
> $> free -m
>              total       used       free     shared    buffers     cached
> Mem:         12001      11870        131          0          4       5778
> -/+ buffers/cache:       6087       5914
> Swap:            0          0          0
>
>
> jvm settings in cassandra-env
> MAX_HEAP_SIZE="5G"
> HEAP_NEWSIZE="800M"
>
> # GC tuning options
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
> JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"
>
>
> jstat shows about 12 full collections per minute with old heap usage
> constantly over 75%, so CMS is always over the
> CMSInitiatingOccupancyFraction threshold.
>
> $> jstat -gcutil -t 22917 5000 4
> Timestamp    S0     S1     E      O      P      YGC     YGCT    FGC     FGCT      GCT
> 132063.0   34.70   0.00  26.03  82.29  59.88  21580  506.887  17523  3078.941  3585.829
> 132068.0   34.70   0.00  50.02  81.23  59.88  21580  506.887  17524  3079.220  3586.107
> 132073.1    0.00  24.92  46.87  81.41  59.88  21581  506.932  17525  3079.583  3586.515
> 132078.1    0.00  24.92  64.71  81.40  59.88  21581  506.932  17527  3079.853  3586.785
>
>
> Other hosts not currently experiencing the high CPU load have a heap
> less than .75 full.
>
> $> jstat -gcutil -t 6063 5000 4
> Timestamp    S0     S1     E      O      P      YGC      YGCT    FGC     FGCT      GCT
> 520731.6    0.00  12.70  36.37  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
> 520736.5    0.00  12.70  53.25  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
> 520741.5    0.00  12.70  68.92  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
> 520746.5    0.00  12.70  83.11  71.33  59.26  46453  1688.809  14785  2130.779  3819.588
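[Editor's note: a rough reading of the two jstat -gcutil snapshots above, with column meanings taken from the standard jstat documentation:]

# Node 22917 (stuck): O (old gen occupancy) sits at ~81-82%, permanently above
# the CMSInitiatingOccupancyFraction=75 trigger, so CMS keeps re-running;
# FGC (cumulative old-gen collection events) climbs 17523 -> 17527 in ~15 s,
# i.e. on the order of a dozen or more per minute.
#
# Node 6063 (healthy): O holds at ~71%, below the 75% trigger, and FGC stays
# at 14785, so no old-gen collections occur at all during this window.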