Hi,
I am setting up a cluster on a linux box.
Everything seems to be working great and I am watching the ring with:
watch -d -n 2 nodetool -h localhost ring
Suddenly, I see that one of the nodes just went down (at 14:07):
Status changed from Up to Down.
13 minutes later (without any intervention) the node comes back Up (by itself).
I check the logs (see at end of text) on that node and see that there
is nothing in the log from 14:07 until 14:20 (13 minutes later).
I also notice the GC ConcurrentMarkSweep took 13 minutes.
Here are my questions:
[1] Is this behavior normal?
[2] Has it been observed by someone else before?
[3] The node being down means that nodetool, and any other client,
wont be able to connect to it (clients should use other nodes in
cluster to write data). Correct?
[4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the
JVM cannot do anything else? Hence then node is technically Down?
Correct?
[5] Why is this GC taking such a long time? (see JMV ARGS posted bellow).
[6] Any JMV Args (switches) I can use to prevent this?
----------------------
JVM_OPTS=" \
       -Dprog=Cassandra \
       -ea \
       -Xms12G \
       -Xmx12G \
       -XX:+UseParNewGC \
       -XX:+UseConcMarkSweepGC \
       -XX:+CMSParallelRemarkEnabled \
       -XX:SurvivorRatio=8 \
       -XX:MaxTenuringThreshold=1 \
       -XX:+HeapDumpOnOutOfMemoryError \
       -Dcom.sun.management.jmxremote.port=8080 \
       -Dcom.sun.management.jmxremote.ssl=false \
       -Dcom.sun.management.jmxremote.authenticate=false"

--------------------
#### Log Extract ######
INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line
116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296
used; max is 13005881344
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
(line 357)HintsColumnFamily has reached its threshold; switching in a
fresh Memtable at
CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log',
position=55517352)
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
(line 609) Enqueuing flush of
memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line
148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433
operations)
INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line
116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208
used; max is 13005881344
INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line
116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed
leaving 196692400 used; max is 13005881344
--------------

Reply via email to