Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
 INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
1046937600

This says a single GC of the New Generation (ParNew) took roughly 339 seconds, which is usually very unlikely.

The only situation I am aware of is when a fairly large object is created
and cannot be promoted to the Old Generation because it requires a large
*contiguous* block of memory that is not available at that point in time.
This is called a promotion failure. The collection then has to wait until
the concurrent collector frees a large enough contiguous block, so you
experience a stop-the-world pause. Strictly speaking, though, I think it is
not the whole world that stops, but only the new world (the young generation).

In the case of Cassandra, for example, a large value of
in_memory_compaction_limit_in_mb can cause this. It is the threshold up to
which a compaction merges all the fragments of a row (the columns of a key)
into the latest version in memory, so a single byte array as large as that
limit can be created.
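
For reference, here is a rough way to check the current value (a sketch
assuming the stock package layout, with cassandra.yaml under
/etc/cassandra/conf; adjust the path to your installation):

# Hypothetical path; the default value in the 1.x line is 64 (MB).
grep -n 'in_memory_compaction_limit_in_mb' /etc/cassandra/conf/cassandra.yaml
# Rows larger than this limit are compacted on disk in two passes instead of
# entirely in memory, which avoids the single large allocation.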

You can confirm this by enabling promotion-failure GC logging going forward,
and by checking which compactions were running at that point in time.
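
For the compaction side, a rough sketch (assuming the packaged log location,
/var/log/cassandra/system.log, and the usual CompactionTask log wording;
adjust both to your setup):

# Substitute the date/hour of the long pause reported by GCInspector.
grep 'Compact' /var/log/cassandra/system.log | grep '2013-04-15 1[34]:'
# CompactionTask logs a "Compacting ..." line when it starts and a
# "Compacted ..." line when it finishes, so anything spanning the pause is a
# candidate.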



On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, Jun 7, 2013 at 12:42 PM, Igor i...@4friends.od.ua wrote:
  If you are talking about 1.2.x, then I also have memory problems on an
  idle cluster: Java memory constantly grows slowly up to the limit, then
  spends a long time in GC. I never saw such behaviour with 1.0.x and 1.1.x,
  where on an idle cluster Java memory stays at the same value.

 If you are not aware of a pre-existing JIRA, I strongly encourage you to:

 1) Document your experience of this.
 2) Search issues.apache.org for anything that sounds similar.
 3) If you are unable to find a JIRA, file one.

 Thanks!

 =Rob



Re: Reduce Cassandra GC

2013-06-15 Thread Mohit Anchlia
Can you paste your GC config? Also, can you take a heap dump at 2 different
points so that we can compare them?

A quick thing to do would be to take a live class histogram at 2 points and compare them.
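
If it helps, this is roughly what I have in mind (a sketch assuming jmap comes
from the same JDK as the running node, and that pgrep can find the daemon by
its class name):

# Hypothetical way to find the pid; substitute your own.
PID=$(pgrep -f CassandraDaemon)

# Live class histogram (note that -histo:live forces a full GC), taken twice:
jmap -histo:live "$PID" > /tmp/histo-1.txt
sleep 600
jmap -histo:live "$PID" > /tmp/histo-2.txt
diff /tmp/histo-1.txt /tmp/histo-2.txt | head -40

# Full binary dump of live objects for offline comparison (MAT, jhat, etc.):
jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof "$PID"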

Sent from my iPhone

On Jun 15, 2013, at 6:57 AM, Takenori Sato ts...@cloudian.com wrote:


Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
Uncomment the following in cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
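
Once the node has run with these flags for a while, the failures can be found
in the resulting log (a sketch assuming the HotSpot ParNew wording, which
marks them as "promotion failed"):

# The file name carries the startup timestamp from the -Xloggc option above.
grep -n 'promotion failed' /var/log/cassandra/gc-*.log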

 Also can you take a heap dump at 2 different points so that we can compare them?

No, I'm afraid not. I ordinarily use profiling tools, but I am not aware of
any that could respond during such an event.



On Sun, Jun 16, 2013 at 4:44 AM, Mohit Anchlia mohitanch...@gmail.com wrote:


Re: Reduce Cassandra GC

2013-06-15 Thread Takenori Sato
 Also can you take a heap dump at 2 different points so that we can compare them?

Also note that a promotion failure is not caused by any particular object,
but by fragmentation of the Old Generation space. So I am not sure you could
tell much from a heap dump comparison.
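
If you want to watch the fragmentation itself rather than the objects, one
option is the CMS free-list statistics (a sketch; this flag only applies to
the concurrent collector we run by default):

# Added next to the other GC logging options in cassandra-env.sh.
# A "Max Chunk Size" that keeps shrinking while "Total Free Space" stays large
# is the usual signature of the fragmentation that leads to promotion failures.
JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"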


On Sun, Jun 16, 2013 at 4:44 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
