GCs never take that long unless you're swapping. On Sun, Aug 22, 2010 at 2:11 PM, Moleza Moleza <mole...@gmail.com> wrote: > Hi, > I am setting up a cluster on a linux box. > Everything seems to be working great and I am watching the ring with: > watch -d -n 2 nodetool -h localhost ring > Suddenly, I see that one of the nodes just went down (at 14:07): > Status changed from Up to Down. > 13 minutes later (without any intervention) the node comes back Up (by > itself). > I check the logs (see at end of text) on that node and see that there > is nothing in the log from 14:07 until 14:20 (13 minutes later). > I also notice the GC ConcurrentMarkSweep took 13 minutes. > Here are my questions: > [1] Is this behavior normal? > [2] Has it been observed by someone else before? > [3] The node being down means that nodetool, and any other client, > wont be able to connect to it (clients should use other nodes in > cluster to write data). Correct? > [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the > JVM cannot do anything else? Hence then node is technically Down? > Correct? > [5] Why is this GC taking such a long time? (see JMV ARGS posted bellow). > [6] Any JMV Args (switches) I can use to prevent this? > ---------------------- > JVM_OPTS=" \ > -Dprog=Cassandra \ > -ea \ > -Xms12G \ > -Xmx12G \ > -XX:+UseParNewGC \ > -XX:+UseConcMarkSweepGC \ > -XX:+CMSParallelRemarkEnabled \ > -XX:SurvivorRatio=8 \ > -XX:MaxTenuringThreshold=1 \ > -XX:+HeapDumpOnOutOfMemoryError \ > -Dcom.sun.management.jmxremote.port=8080 \ > -Dcom.sun.management.jmxremote.ssl=false \ > -Dcom.sun.management.jmxremote.authenticate=false" > > -------------------- > #### Log Extract ###### > INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line > 116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296 > used; max is 13005881344 > INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java > (line 357)HintsColumnFamily has reached its threshold; switching in a > fresh Memtable at > CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log', > position=55517352) > INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java > (line 609) Enqueuing flush of > memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations) > INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line > 148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433 > operations) > INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line > 116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208 > used; max is 13005881344 > INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line > 116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed > leaving 196692400 used; max is 13005881344 > -------------- >
-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com