I am currently running some tests on Cassandra (0.3.0-final) on two nodes, each with 8G of RAM and 8 CPU cores. Here are some settings from my storage-conf.xml:

  <ReplicationFactor>2</ReplicationFactor>
  <ColumnIndexSizeInKB>256</ColumnIndexSizeInKB>
  <MemtableSizeInMB>1024</MemtableSizeInMB>
  <MemtableObjectCountInMillions>2</MemtableObjectCountInMillions>

My test data has about 880K unique keys, and my test program simply inserts the same 5 columns into Table1.Standard1 using thrift.batch_insert. For each key, the record size ranges from 21 bytes to 5K, with an average of 40 bytes. The program calls batch_insert repeatedly, 4 million times in total, over 50 concurrent Thrift connections (about 220MB of data, excluding keys, is sent to Cassandra).

What I see is that the Java process's resident memory grows to the heap limit (6G) and everything just halts after that. If I restart Cassandra, the footprint is around 1.9G and I can insert again, but the memory keeps growing and the same thing happens.

Here are my JVM settings:

  -Xmx6000m -Xms6000m
  -XX:+HeapDumpOnOutOfMemoryError
  -XX:NewSize=1000m -XX:MaxNewSize=1000m
  -XX:SurvivorRatio=8
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
  -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
  -Xloggc:gc.log

Here is the jmap output (top 10 objects):

   num    #instances       #bytes  class name
  --------------------------------------------
     1:      1436005    658220304  [Ljava.lang.Object;
     2:     12100491    484019640  java.lang.String
     3:      9904511    437577600  [C
     4:      2709812    322398784  [I
     5:      5607988    224319520  java.util.concurrent.ConcurrentSkipListMap$Node
     6:      4469810    214550880  org.apache.cassandra.db.Column
     7:      3339219    213710016  org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT
     8:      3339230    191142200  [J
     9:      4506140    179147648  [B
    10:      2220024    106561152  java.util.concurrent.ConcurrentSkipListMap$HeadIndex

Does anyone have any idea why Cassandra uses so much memory? From gc.log I can see that GC has kicked in many times (though not a major compaction). I would expect that with such a small data set everything would just work within the available memory (I mean the test should be able to run for weeks).

Any suggestions?

Thanks,
Huming
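
As a rough sanity check on the jmap histogram above, the sketch below totals the byte counts of the ten listed classes and works out the average size per instance. The numbers are copied from the histogram; the names HISTOGRAM, total_bytes and avg_bytes_per_instance are only illustrative, and the ten classes are not the whole heap.

```python
# Rows copied from the jmap histogram in this post: (class name, #instances, #bytes).
HISTOGRAM = [
    ("[Ljava.lang.Object;", 1436005, 658220304),
    ("java.lang.String", 12100491, 484019640),
    ("[C", 9904511, 437577600),
    ("[I", 2709812, 322398784),
    ("java.util.concurrent.ConcurrentSkipListMap$Node", 5607988, 224319520),
    ("org.apache.cassandra.db.Column", 4469810, 214550880),
    ("org.cliffc.high_scale_lib.ConcurrentAutoTable$CAT", 3339219, 213710016),
    ("[J", 3339230, 191142200),
    ("[B", 4506140, 179147648),
    ("java.util.concurrent.ConcurrentSkipListMap$HeadIndex", 2220024, 106561152),
]


def total_bytes(rows):
    """Sum the #bytes column over all histogram rows."""
    return sum(nbytes for _name, _count, nbytes in rows)


def avg_bytes_per_instance(rows):
    """Map each class name to its average shallow size in bytes."""
    return {name: nbytes / count for name, count, nbytes in rows}


if __name__ == "__main__":
    total = total_bytes(HISTOGRAM)
    print(f"top-10 total: {total} bytes ({total / 1e9:.2f} GB)")
    for name, avg in avg_bytes_per_instance(HISTOGRAM).items():
        print(f"{avg:10.1f} bytes/instance  {name}")
```

By this arithmetic the ten classes add up to 3,031,647,744 bytes, so roughly 3.0 GB of the 6 GB heap is held by these object types alone.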