subject:"Re\: Cassandra Nodes Freeze\/Down for ConcurrentMarkSweep GC\?"

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Peter Schuller

> [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the
> JVM cannot do anything else? Hence then node is technically Down?
> Correct?

No; the concurrent mark/sweep phase runs concurrently with your
application. CMS will cause a stop-the-world full pause it it fails to
complete a CMS sweep in time and you hit the maximum heap size, but
unless that happens, CMS will run concurrently (though there are
stop-the-world pauses involved, that are typically very short, the
mark/sweep phase is concurrent).

As jbellis pointed out, you're almost certainly swapping.


-- 
/ Peter Schuller

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Benjamin Black

http://riptano.blip.tv/file/4012133/

On Sun, Aug 22, 2010 at 12:11 PM, Moleza Moleza  wrote:
> Hi,
> I am setting up a cluster on a linux box.
> Everything seems to be working great and I am watching the ring with:
> watch -d -n 2 nodetool -h localhost ring
> Suddenly, I see that one of the nodes just went down (at 14:07):
> Status changed from Up to Down.
> 13 minutes later (without any intervention) the node comes back Up (by 
> itself).
> I check the logs (see at end of text) on that node and see that there
> is nothing in the log from 14:07 until 14:20 (13 minutes later).
> I also notice the GC ConcurrentMarkSweep took 13 minutes.
> Here are my questions:
> [1] Is this behavior normal?
> [2] Has it been observed by someone else before?
> [3] The node being down means that nodetool, and any other client,
> wont be able to connect to it (clients should use other nodes in
> cluster to write data). Correct?
> [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the
> JVM cannot do anything else? Hence then node is technically Down?
> Correct?
> [5] Why is this GC taking such a long time? (see JMV ARGS posted bellow).
> [6] Any JMV Args (switches) I can use to prevent this?
> --
> JVM_OPTS=" \
>       -Dprog=Cassandra \
>       -ea \
>       -Xms12G \
>       -Xmx12G \
>       -XX:+UseParNewGC \
>       -XX:+UseConcMarkSweepGC \
>       -XX:+CMSParallelRemarkEnabled \
>       -XX:SurvivorRatio=8 \
>       -XX:MaxTenuringThreshold=1 \
>       -XX:+HeapDumpOnOutOfMemoryError \
>       -Dcom.sun.management.jmxremote.port=8080 \
>       -Dcom.sun.management.jmxremote.ssl=false \
>       -Dcom.sun.management.jmxremote.authenticate=false"
>
> 
>  Log Extract ##
> INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line
> 116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296
> used; max is 13005881344
> INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
> (line 357)HintsColumnFamily has reached its threshold; switching in a
> fresh Memtable at
> CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log',
> position=55517352)
> INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
> (line 609) Enqueuing flush of
> memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
> INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line
> 148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433
> operations)
> INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line
> 116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208
> used; max is 13005881344
> INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line
> 116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed
> leaving 196692400 used; max is 13005881344
> --
>

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Jonathan Ellis

GCs never take that long unless you're swapping.

On Sun, Aug 22, 2010 at 2:11 PM, Moleza Moleza  wrote:
> Hi,
> I am setting up a cluster on a linux box.
> Everything seems to be working great and I am watching the ring with:
> watch -d -n 2 nodetool -h localhost ring
> Suddenly, I see that one of the nodes just went down (at 14:07):
> Status changed from Up to Down.
> 13 minutes later (without any intervention) the node comes back Up (by 
> itself).
> I check the logs (see at end of text) on that node and see that there
> is nothing in the log from 14:07 until 14:20 (13 minutes later).
> I also notice the GC ConcurrentMarkSweep took 13 minutes.
> Here are my questions:
> [1] Is this behavior normal?
> [2] Has it been observed by someone else before?
> [3] The node being down means that nodetool, and any other client,
> wont be able to connect to it (clients should use other nodes in
> cluster to write data). Correct?
> [4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the
> JVM cannot do anything else? Hence then node is technically Down?
> Correct?
> [5] Why is this GC taking such a long time? (see JMV ARGS posted bellow).
> [6] Any JMV Args (switches) I can use to prevent this?
> --
> JVM_OPTS=" \
>       -Dprog=Cassandra \
>       -ea \
>       -Xms12G \
>       -Xmx12G \
>       -XX:+UseParNewGC \
>       -XX:+UseConcMarkSweepGC \
>       -XX:+CMSParallelRemarkEnabled \
>       -XX:SurvivorRatio=8 \
>       -XX:MaxTenuringThreshold=1 \
>       -XX:+HeapDumpOnOutOfMemoryError \
>       -Dcom.sun.management.jmxremote.port=8080 \
>       -Dcom.sun.management.jmxremote.ssl=false \
>       -Dcom.sun.management.jmxremote.authenticate=false"
>
> 
>  Log Extract ##
> INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line
> 116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296
> used; max is 13005881344
> INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
> (line 357)HintsColumnFamily has reached its threshold; switching in a
> fresh Memtable at
> CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log',
> position=55517352)
> INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
> (line 609) Enqueuing flush of
> memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
> INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line
> 148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433
> operations)
> INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line
> 116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208
> used; max is 13005881344
> INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line
> 116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed
> leaving 196692400 used; max is 13005881344
> --
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

2010-08-22 Thread Moleza Moleza

Hi,
I am setting up a cluster on a linux box.
Everything seems to be working great and I am watching the ring with:
watch -d -n 2 nodetool -h localhost ring
Suddenly, I see that one of the nodes just went down (at 14:07):
Status changed from Up to Down.
13 minutes later (without any intervention) the node comes back Up (by itself).
I check the logs (see at end of text) on that node and see that there
is nothing in the log from 14:07 until 14:20 (13 minutes later).
I also notice the GC ConcurrentMarkSweep took 13 minutes.
Here are my questions:
[1] Is this behavior normal?
[2] Has it been observed by someone else before?
[3] The node being down means that nodetool, and any other client,
wont be able to connect to it (clients should use other nodes in
cluster to write data). Correct?
[4] Is GC ConcurrentMarkSweep a Stop-The-World situation? Where the
JVM cannot do anything else? Hence then node is technically Down?
Correct?
[5] Why is this GC taking such a long time? (see JMV ARGS posted bellow).
[6] Any JMV Args (switches) I can use to prevent this?
--
JVM_OPTS=" \
   -Dprog=Cassandra \
   -ea \
   -Xms12G \
   -Xmx12G \
   -XX:+UseParNewGC \
   -XX:+UseConcMarkSweepGC \
   -XX:+CMSParallelRemarkEnabled \
   -XX:SurvivorRatio=8 \
   -XX:MaxTenuringThreshold=1 \
   -XX:+HeapDumpOnOutOfMemoryError \
   -Dcom.sun.management.jmxremote.port=8080 \
   -Dcom.sun.management.jmxremote.ssl=false \
   -Dcom.sun.management.jmxremote.authenticate=false"


 Log Extract ##
INFO [GC inspection] 2010-08-22 14:06:48,622 GCInspector.java (line
116) GC for ParNew: 235 ms, 134504976 reclaimed leaving 12721498296
used; max is 13005881344
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
(line 357)HintsColumnFamily has reached its threshold; switching in a
fresh Memtable at
CommitLogContext(file='/var/nes/data1/cassandra_commitlog/CommitLog-1282500306160.log',
position=55517352)
INFO [FLUSH-TIMER] 2010-08-22 14:19:45,429 ColumnFamilyStore.java
(line 609) Enqueuing flush of
memtable-hintscolumnfam...@1935604258(3147 bytes, 433 operations)
INFO [FLUSH-WRITER-POOL:1] 2010-08-22 14:19:45,430 Memtable.java (line
148) Writing memtable-hintscolumnfam...@1935604258(3147 bytes, 433
operations)
INFO [GC inspection] 2010-08-22 14:19:45,917 GCInspector.java (line
116) GC for ParNew: 215 ms, 130254256 reclaimed leaving 12742982208
used; max is 13005881344
INFO [GC inspection] 2010-08-22 14:19:45,973 GCInspector.java (line
116)GC for ConcurrentMarkSweep: 775679 ms, 12685881488 reclaimed
leaving 196692400 used; max is 13005881344
--

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

Re: Cassandra Nodes Freeze/Down for ConcurrentMarkSweep GC?

4 matches

Site Navigation

Mail list logo

Footer information