GCs never take that long unless you're swapping.
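On Linux, one way to check whether a node is swapping is to sample the kernel's cumulative swap counters twice and compare them. This is a minimal sketch. It assumes a Linux host where `/proc/vmstat` exposes `pswpin` and `pswpout` (pages swapped in and out since boot), and the helper names `parse_vmstat` and `swap_delta` are hypothetical:

```python
import time
from pathlib import Path

# Assumption: Linux with procfs mounted; other platforms have no /proc/vmstat.
VMSTAT = Path("/proc/vmstat")


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of counters."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before: dict[str, int], after: dict[str, int]) -> tuple[int, int]:
    """Return (pages swapped in, pages swapped out) between two samples."""
    return (
        after.get("pswpin", 0) - before.get("pswpin", 0),
        after.get("pswpout", 0) - before.get("pswpout", 0),
    )


if __name__ == "__main__" and VMSTAT.exists():
    first = parse_vmstat(VMSTAT.read_text())
    time.sleep(2)  # sampling interval; an arbitrary choice for illustration
    second = parse_vmstat(VMSTAT.read_text())
    swapped_in, swapped_out = swap_delta(first, second)
    print(f"pages swapped in: {swapped_in}, out: {swapped_out} (last 2 s)")
```

Sustained non-zero deltas while Cassandra is running suggest the box is swapping. The `si` and `so` columns of the standard `vmstat` command report the same activity.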

On Sun, Aug 22, 2010 at 2:11 PM, Moleza Moleza <mole...@gmail.com> wrote:
> Hi,
> I am setting up a cluster on a Linux box.
> Everything seems to be working great and I am watching the ring with:
> watch -d -n 2 nodetool -h localhost ring
> Suddenly, I see that one of the nodes just went down (at 14:07):
> Status changed from Up to Down.
> 13 minutes later, without any intervention, the node comes back Up by
> itself.
> I check the logs on that node (see the end of this message) and there
> is nothing in the log from 14:07 until 14:20 (13 minutes later).
> I also notice that the ConcurrentMarkSweep GC took 13 minutes.
> Here are my questions:
> [1] Is this behavior normal?
> [2] Has it been observed by someone else before?
> [3] The node being down means that nodetool, and any other client,
> won't be able to connect to it (clients should use other nodes in the
> cluster to write data). Correct?
> [4] Is a ConcurrentMarkSweep GC a stop-the-world situation, where the
> JVM cannot do anything else, so that the node is technically Down?
> Correct?
> [5] Why is this GC taking such a long time? (See the JVM args posted below.)
> [6] Are there any JVM args (switches) I can use to prevent this?
> ----------------------
> JVM_OPTS=" \
>       -Dprog=Cassandra \
>       -ea \
>       -Xms12G \
>       -Xmx12G \
>       -XX:+UseParNewGC \
>       -XX:+UseConcMarkSweepGC \
>       -XX:+CMSParallelRemarkEnabled \
>       -XX:SurvivorRatio=8 \
>       -XX:MaxTenuringThreshold=1 \
>       -XX:+HeapDumpOnOutOfMemoryError \
>       -Dcom.sun.management.jmxremote.port=8080 \
>       -Dcom.sun.management.jmxremote.ssl=false \
>       -Dcom.sun.management.jmxremote.authenticate=false"
>
> --------------------
> #### Log Extract ######
> INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line 116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296 used; max is 13005881344
> INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java (line 357) HintsColumnFamily has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log', position=55517352)
> INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java (line 609) Enqueuing flush of memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
> INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line 148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
> INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line 116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208 used; max is 13005881344
> INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed leaving 196692400 used; max is 13005881344
> --------------
>
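The GCInspector entries in the log extract can be checked with a little arithmetic. 775679 ms is about 12.9 minutes, consistent with the 14:07 to 14:20 silence. The ParNew entry logged just before it leaves 12742982208 of 13005881344 bytes in use, so the heap was about 98% full. The following hypothetical helper (`parse_gc_line` and `GcEvent` are illustrative names, not Cassandra APIs) pulls these figures out of a single GCInspector line. It assumes the message format exactly as it appears in this extract:

```python
import re
from typing import NamedTuple, Optional

# Assumed format, taken from the GCInspector lines in the extract:
# "GC for <collector>: <ms> ms, <bytes> reclaimed leaving <bytes> used; max is <bytes>"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms, (?P<reclaimed>\d+) reclaimed "
    r"leaving (?P<used>\d+) used; max is (?P<max>\d+)"
)


class GcEvent(NamedTuple):
    collector: str
    pause_ms: int
    reclaimed: int
    used: int
    max_heap: int

    @property
    def minutes(self) -> float:
        """Collection time converted from milliseconds to minutes."""
        return self.pause_ms / 60_000

    @property
    def occupancy(self) -> float:
        """Fraction of the max heap still in use after this collection."""
        return self.used / self.max_heap


def parse_gc_line(line: str) -> Optional[GcEvent]:
    """Return a GcEvent for a GCInspector line, or None if it doesn't match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return GcEvent(m["collector"], int(m["ms"]), int(m["reclaimed"]),
                   int(m["used"]), int(m["max"]))


if __name__ == "__main__":
    cms = parse_gc_line(
        "GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed "
        "leaving 196692400 used; max is 13005881344"
    )
    print(f"{cms.collector}: {cms.minutes:.1f} min")
```

Because the pattern is applied with `re.search`, any prefix before "GC for" (timestamp, thread name, source line) is ignored.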



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
