Re: cassandra full gc too often

2016-01-04 Thread Robert Coli
On Sun, Jan 3, 2016 at 5:54 PM, Shuo Chen  wrote:

> There are client operations these days. Besides, most column families in the
> cluster are super column families created by cassandra-cli. Most rows have on
> average 30 sub-rows, and each sub-row has 20 columns.
>

Supercolumns, especially in their pre-CQL implementation, have serious
performance and memory problems.

In general, one should not use them.
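If the data model allows it, the usual migration is to a regular CQL table where
the old supercolumn and sub-column names become clustering columns. A rough
sketch only; keyspace, table, and column names below are placeholders, not the
OP's actual schema:

    # illustrative remodel of a supercolumn family as a CQL table
    cat > remodel.cql <<'EOF'
    CREATE TABLE my_ks.my_cf (
        row_key   text,
        super_key text,   -- was: the supercolumn name
        col_name  text,   -- was: the sub-column name
        value     text,
        PRIMARY KEY (row_key, super_key, col_name)
    );
    EOF
    cqlsh -f remodel.cql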

=Rob


Re: cassandra full gc too often

2016-01-03 Thread Shuo Chen
The following is a snippet of the nodetool compaction history. It shows many
records relating to the system keyspace. (The commands used to collect it are
sketched after the listing.)

Compaction History:
id                                    keyspace_name              columnfamily_name        compacted_at   bytes_in  bytes_out  rows_merged
8e4f8830-b04f-11e5-a211-45b7aa88107c  system                     sstable_activity         1451629144115  3342      915        {4:23}
96a6fcb0-b04b-11e5-a211-45b7aa88107c  system                     hints                    1451627440123  18970740  18970740   {1:1}
7c42c940-adac-11e5-8bd4-45b7aa88107c  system                     hints                    1451339203540  56969835  56782732   {2:3}
585b97a0-ad98-11e5-8bd4-45b7aa88107c  system                     sstable_activity         1451330553370  3700      956        {4:24}
aefc3f10-b1b2-11e5-a211-45b7aa88107c  system                     sstable_activity         1451781670273  3201      906        {4:23}
1e76f1b0-b180-11e5-a211-45b7aa88107c  system                     sstable_activity         1451759952971  3303      700        {4:23}
e7b75b70-aec2-11e5-8bd4-45b7aa88107c  system                     hints                    1451458783911  57690316  57497847   {2:3}
ad102280-af6d-11e5-b1dc-45b7aa88107c  webtrn_study_log_formally  SCORM_STU_COURSE         1451532129448  45671877  41137664   {1:11, 3:1, 4:8}
60906970-aec7-11e5-8bd4-45b7aa88107c  system                     sstable_activity         1451460704647  3751      974        {4:25}
88aed310-ad91-11e5-8bd4-45b7aa88107c  system                     hints                    1451327627969  56984347  56765328   {2:3}
3ad14f00-af6d-11e5-b1dc-45b7aa88107c  webtrn_study_log_formally  SCORM_STU_COURSE         1451531937776  46696097  38827028   {1:8, 3:2, 4:9}
84df8fb0-b00f-11e5-a211-45b7aa88107c  system                     hints                    1451601640491  18970740  18970740   {1:1}
657482e0-ad33-11e5-8bd4-45b7aa88107c  system                     sstable_activity         1451287196174  3701      931        {4:24}
9cc8af70-b24a-11e5-a211-45b7aa88107c  system                     sstable_activity         1451846923239  3134      773        {4:23}
dcbe5e30-afd0-11e5-a211-45b7aa88107c  system                     sstable_activity         1451574729619  3357      790        {4:23}
b285ced0-afa0-11e5-84e3-45b7aa88107c  system                     hints                    1451554042941  43310718  42137761   {1:1, 2:2}
119770e0-ad4e-11e5-8bd4-45b7aa88107c  system                     hints                    1451298651886  57397441  57190519   {2:3}
f1bb37a0-b204-11e5-a211-45b7aa88107c  system                     hints                    1451817000986  17713746  17566349   {1:1}
dcd0c720-b0b4-11e5-a211-45b7aa88107c  system                     sstable_activity         1451672654993  3189      797        {4:23}
cadb2320-ae94-11e5-8bd4-45b7aa88107c  system                     sstable_activity         1451438978642  3642      959        {4:25}
321a84d0-af63-11e5-8bd4-45b7aa88107c  system                     hints                    1451527628189  57407115  57179555   {1:3}
5d760ca0-af20-11e5-8bd4-45b7aa88107c  system                     hints                    1451498924650  57405307  57234936   {1:3}
2e59a0b0-af6d-11e5-b1dc-45b7aa88107c  system                     compactions_in_progress  1451531916859  464190               {1:1, 2:1}
8723a290-adba-11e5-8bd4-45b7aa88107c  system                     hints                    1451345234745  57065795  56814910   {2:3}
add0a8d0-af6c-11e5-b1dc-45b7aa88107c  system                     compactions_in_progress  1451531701213  46442                {2:2}
04c0d2f0-af6d-11e5-b1dc-45b7aa88107c  system                     compactions_in_progress  1451531847071  4460                 {2:2}
315dd170-af98-11e5-b1dc-45b7aa88107c  system                     schema_columns           1451550390279  129142    45841      {2:1, 3:1, 4:9}
a17d8f80-b24b-11e5-a211-45b7aa88107c  system                     hints                    1451847360632  16746317  16600978   {1:1}
e22e3b90-ad39-11e5-8bd4-45b7aa88107c  system                     hints                    1451289982409  57240038  56984895   {2:3}
38591c10-b22b-11e5-a211-45b7aa88107c  system                     hints                    1451833440337  17351684  17002773   {1:1}
fc274930-af6c-11e5-b1dc-45b7aa88107c  system                     hints                    1451531832643  63540861  59716559   {2:3}
aba057f0-b1b4-11e5-a211-45b7aa88107c  system                     hints                    1451782523631  18743329  18577220   {1:1}
81d5e980-af5c-11e5-8bd4-45b7aa88107c  system                     hints                    1451524755480  57603966  57407115   {2:3}
a1801510-af6d-11e5-b1dc-45b7aa88107c
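For reference, a minimal pair of commands that produce the listing above and
show whether compactions are keeping up (stock nodetool on a standard install):

    nodetool compactionhistory   # finished compactions, as shown above
    nodetool compactionstats     # pending/active compaction tasks right now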

cassandra full gc too often

2015-12-31 Thread Shuo Chen
I have a Cassandra 2.0.6 cluster with four nodes used as a backup database. The
only operation is writing data into the database. Recently, full GC on the
nodes has increased noticeably and blocks cluster operation.

The load on each node is 10G. The heap is 8G on each node, with default JVM
memory settings. Each machine has 24 CPU cores. The cassandra.yaml settings are
almost all defaults.

For debugging, all the clients were disconnected; however, GC activity is still
high.

The GC runs so often that it occurs every minute and blocks for nearly 30s.

How can I debug this situation?

The following are snippets of the GC log:

before full gc histogram


Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree  Height: 0
1620.992: [GC 7377342K(8178944K), 0.1380260 secs]
Before GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree  Height: 0
Before GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree  Height: 0
--
   1:  61030245  1952967840  java.util.concurrent.FutureTask
   2:  61031398  1464753552  java.util.concurrent.Executors$RunnableAdapter
   3:  61030312  1464727488  java.util.concurrent.LinkedBlockingQueue$Node
   4:  61030244  1464725856  org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
   5:    214181    13386992  [B
   6:    202890     9738720  java.nio.HeapByteBuffer
   7:     37622     5994808  [C
   8:     42163     5722096
   9:     42163     5407152
  10:      4170     4648240
  11:    100060     4002400  org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
  12:      4170     2860816
  13:      3664     2702720
  14:      4817     2701576  [J
  15:      2575     1056600
  16:     37431      898344  java.lang.String
  17:      2585      627352  [Ljava.lang.Object;
  18:     17728      567296  java.util.concurrent.ConcurrentHashMap$HashEntry
  19:     22111      530664  javax.management.ObjectName$Property
  20:      7595      463824  [S
  21:      4534      433560  java.lang.Class
  22:     11581      370592  java.util.HashMap$Entry

...
Total 289942930 8399196504
1658.022: [Full GC 8178943K->6241829K(8178944K), 27.8562520 secs]
CMS: Large block 0x0007f7da9588

After GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 6335823
Max   Chunk Size: 6335823
Number of Blocks: 1
Av.  Block  Size: 6335823
Tree  Height: 1
After GC:
Statistics for BinaryTreeDictionary:

Total Free Space: 0
Max   Chunk Size: 0
Number of Blocks: 0
Tree  Height: 0


It seems related to FutureTask objects.
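One way to confirm that would be a quick look at the compaction backlog and a
fresh live histogram; a sketch only, where the pid lookup and the head count
are illustrative:

    # a large pending-task count here would match the huge FutureTask /
    # BackgroundCompactionTask counts in the histogram above
    nodetool compactionstats

    # take a live class histogram again, with clients still disconnected
    jmap -histo:live $(pgrep -f CassandraDaemon) | head -30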

-- 
陈硕 Shuo Chen
chenatu2...@gmail.com
chens...@whaty.com


Re: cassandra full gc too often

2015-12-31 Thread Ipremyadav
The simplest option is to use Java 8 with the G1 GC.
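A minimal sketch of what that could look like in cassandra-env.sh, assuming
Java 8 and assuming the stock CMS/ParNew flags (-XX:+UseParNewGC,
-XX:+UseConcMarkSweepGC and related options) are removed first; the pause
target is just an example value:

    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"   # example target, tune for your latency needs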

> On 31 Dec 2015, at 10:23 a.m., Shuo Chen  wrote:
> 
> I have a cassandra 2.0.6 cluster with four nodes as backup database. The only 
> operation is posting data into db. Recently, the full gc of the nodes 
> increases apparently and blocks cluster operation.
> 
> The load of each node is 10G. The heap is 8G each with default jvm memory 
> settings. The cpu is 24 cores. The settings of cassandra.yaml are almost 
> default.
> 
> For debug, all the clients are disconnected, however the gc is still high.
> 
> The GC is too often that it appears in every minute and blocks for nealy 30s.
> 
> How to debug this situation
> 
> The following is the snippets of gc log:
> 
> before full gc histogram
> 
> 
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree  Height: 0
> 1620.992: [GC 7377342K(8178944K), 0.1380260 secs]
> Before GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree  Height: 0
> Before GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree  Height: 0
> --
>1:  61030245 1952967840  java.util.concurrent.FutureTask
>2:  61031398 1464753552  
> java.util.concurrent.Executors$RunnableAdapter
>3:  61030312 1464727488  
> java.util.concurrent.LinkedBlockingQueue$Node
>4:  61030244 1464725856  
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
>5:214181   13386992  [B
>6:2028909738720  java.nio.HeapByteBuffer
>7: 376225994808  [C
>8: 421635722096  
>9: 421635407152  
>   10:  41704648240  
>   11:1000604002400  
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
>   12:  41702860816  
>   13:  36642702720  
>   14:  48172701576  [J
>   15:  25751056600  
>   16: 37431 898344  java.lang.String
>   17:  2585 627352  [Ljava.lang.Object;
>   18: 17728 567296  
> java.util.concurrent.ConcurrentHashMap$HashEntry
>   19: 22111 530664  javax.management.ObjectName$Property
>   20:  7595 463824  [S
>   21:  4534 433560  java.lang.Class
>   22: 11581 370592  java.util.HashMap$Entry
> 
> ...
> Total 289942930 8399196504
> 1658.022: [Full GC 8178943K->6241829K(8178944K), 27.8562520 secs]
> CMS: Large block 0x0007f7da9588
> 
> After GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 6335823
> Max   Chunk Size: 6335823
> Number of Blocks: 1
> Av.  Block  Size: 6335823
> Tree  Height: 1
> After GC:
> Statistics for BinaryTreeDictionary:
> 
> Total Free Space: 0
> Max   Chunk Size: 0
> Number of Blocks: 0
> Tree  Height: 0
> 
> 
> It seems related to objects of Futuretask.
> 
> -- 
> 陈硕 Shuo Chen
> chenatu2...@gmail.com
> chens...@whaty.com


Re: cassandra full gc too often

2015-12-31 Thread Graham Sanderson
If you are lucky that might mask the real issue, but I doubt it… that is an
insane number of compaction tasks and indicative of another problem. I would
check the release notes for 2.0.6+; if I recall correctly, that was not a
stable version and may have had leaks.

Aside from that, just FYI: if you use native_objects for memtables, you can run
CMS fine even up to largish (20-30G) heap sizes without long GC pauses (at
least over months). That said, look for promotion failures of 131074 DWORDs.
I'm not sure what your GC logging options are, but what you posted doesn't say
why a full GC happened, which would be totally tunable; as I say, though, I
don't think GC is actually your problem.
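For what it's worth, the yaml knob that "native_objects" most likely refers to
is memtable_allocation_type, which exists on Cassandra 2.1+ but not on 2.0.x
(an assumption about the intent above), and promotion failures are easy to spot
once full GC logging is enabled; the log path below is the packaged default:

    # cassandra.yaml, Cassandra 2.1+ only:
    #   memtable_allocation_type: offheap_objects

    # promotion failures appear in the GC log as "ParNew (promotion failed)"
    grep -i "promotion failed" /var/log/cassandra/gc.log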

> On Dec 31, 2015, at 10:16 AM, Ipremyadav  wrote:
> 
> Simplest option is to use java 8 with G1 gc. 
> 
> On 31 Dec 2015, at 10:23 a.m., Shuo Chen wrote:
> 
>> I have a cassandra 2.0.6 cluster with four nodes as backup database. The 
>> only operation is posting data into db. Recently, the full gc of the nodes 
>> increases apparently and blocks cluster operation.
>> 
>> The load of each node is 10G. The heap is 8G each with default jvm memory 
>> settings. The cpu is 24 cores. The settings of cassandra.yaml are almost 
>> default.
>> 
>> For debug, all the clients are disconnected, however the gc is still high.
>> 
>> The GC is too often that it appears in every minute and blocks for nealy 30s.
>> 
>> How to debug this situation
>> 
>> The following is the snippets of gc log:
>> 
>> before full gc histogram
>> 
>> 
>> Total Free Space: 0
>> Max   Chunk Size: 0
>> Number of Blocks: 0
>> Tree  Height: 0
>> 1620.992: [GC 7377342K(8178944K), 0.1380260 secs]
>> Before GC:
>> Statistics for BinaryTreeDictionary:
>> 
>> Total Free Space: 0
>> Max   Chunk Size: 0
>> Number of Blocks: 0
>> Tree  Height: 0
>> Before GC:
>> Statistics for BinaryTreeDictionary:
>> 
>> Total Free Space: 0
>> Max   Chunk Size: 0
>> Number of Blocks: 0
>> Tree  Height: 0
>> --
>>1:  61030245 1952967840  java.util.concurrent.FutureTask
>>2:  61031398 1464753552  
>> java.util.concurrent.Executors$RunnableAdapter
>>3:  61030312 1464727488  
>> java.util.concurrent.LinkedBlockingQueue$Node
>>4:  61030244 1464725856  
>> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask
>>5:214181   13386992  [B
>>6:2028909738720  java.nio.HeapByteBuffer
>>7: 376225994808  [C
>>8: 421635722096  
>>9: 421635407152  
>>   10:  41704648240  
>>   11:1000604002400  
>> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo
>>   12:  41702860816  
>>   13:  36642702720  
>>   14:  48172701576  [J
>>   15:  25751056600  
>>   16: 37431 898344  java.lang.String
>>   17:  2585 627352  [Ljava.lang.Object;
>>   18: 17728 567296  
>> java.util.concurrent.ConcurrentHashMap$HashEntry
>>   19: 22111 530664  javax.management.ObjectName$Property
>>   20:  7595 463824  [S
>>   21:  4534 433560  java.lang.Class
>>   22: 11581 370592  java.util.HashMap$Entry
>> 
>> ...
>> Total 289942930 8399196504
>> 1658.022: [Full GC 8178943K->6241829K(8178944K), 27.8562520 secs]
>> CMS: Large block 0x0007f7da9588
>> 
>> After GC:
>> Statistics for BinaryTreeDictionary:
>> 
>> Total Free Space: 6335823
>> Max   Chunk Size: 6335823
>> Number of Blocks: 1
>> Av.  Block  Size: 6335823
>> Tree  Height: 1
>> After GC:
>> Statistics for BinaryTreeDictionary:
>> 
>> Total Free Space: 0
>> Max   Chunk Size: 0
>> Number of Blocks: 0
>> Tree  Height: 0
>> 
>> 
>> It seems related to objects of Futuretask.
>> 
>> -- 
>> 陈硕 Shuo Chen
>> chenatu2...@gmail.com 
>> chens...@whaty.com 




Re: full gc too often

2014-12-07 Thread Philo Yang
2014-12-05 15:40 GMT+08:00 Jonathan Haddad j...@jonhaddad.com:

 I recommend reading through
 https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of
 how the JVM GC works and what you can do to tune it.  Also good is Blake
 Eggleston's writeup which can be found here:
 http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html

 I'd like to note that allocating 4GB heap to Cassandra under any serious
 workload is unlikely to be sufficient.


Thanks for your recommendation. After reading them, I tried allocating a larger
heap and it helped: a 4G heap indeed can't handle the workload in my use case.

So another question is: how much pressure can the default max heap (8G) handle?
The pressure is not simply QPS; a slice query over many columns in a row will
allocate more objects on the heap than a query for a single column. Are there
any test results on the relationship between the pressure and a safe heap size?
We know that querying a slice with many tombstones is not a good use case, but
querying a slice without tombstones should be a common use case, right?




 On Thu Dec 04 2014 at 8:43:38 PM Philo Yang ud1...@gmail.com wrote:

 I have two kinds of machine:
 16G RAM, with default heap size setting, about 4G.
 64G RAM, with default heap size setting, about 8G.

 These two kinds of nodes have same number of vnodes, and both of them
 have gc issue, although the node of 16G have a higher probability  of gc
 issue.

 Thanks,
 Philo Yang


 2014-12-05 12:34 GMT+08:00 Tim Heckman t...@pagerduty.com:

 On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote:
 
  Hi,all
 
  I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have a trouble with
 full gc that sometime there may be one or two nodes full gc more than one
 time per minute and over 10 seconds each time, then the node will be
 unreachable and the latency of cluster will be increased.
 
  I grep the GCInspector's log, I found when the node is running fine
 without gc trouble there are two kinds of gc:
  ParNew GC in less than 300ms which clear the Par Eden Space and
 enlarge CMS Old Gen/ Par Survivor Space little (because it only show gc in
 more than 200ms, there is only a small number of ParNew GC in log)
  ConcurrentMarkSweep in 4000~8000ms which reduce CMS Old Gen much and
 enlarge Par Eden Space little, each 1-2 hours it will be executed once.
 
  However, sometimes ConcurrentMarkSweep will be strange like it shows:
 
  INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -
 3579838464; Par Eden Space: 503316480 - 294794576; Par Survivor Space:
 62914528 - 0
  INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -
 3579836512; Par Eden Space: 503316480 - 310562032; Par Survivor Space:
 62872496 - 0
  INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -
 3579805792; Par Eden Space: 503316480 - 332391096; Par Survivor Space:
 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -
 3579829760; Par Eden Space: 503316480 - 351991456; Par Survivor Space:
 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -
 3579799752; Par Eden Space: 503316480 - 366222584; Par Survivor Space:
 62914560 - 0
  INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -
 3579817392; Par Eden Space: 503316480 - 388702928; Par Survivor Space:
 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -
 3579838424; Par Eden Space: 503316480 - 408992784; Par Survivor Space:
 62896720 - 0
  INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 - 3579816424;
 Par Eden Space: 503316480 - 438633608; Par Survivor Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -
 3579785496; Par Eden Space: 503316480 - 441354856; Par Survivor Space:
 62889528 - 0
  INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 -
 3579814464; Par Eden Space: 503316480 - 448782440; Par Survivor Space:
 62914560 - 0
 
  In each time Old Gen reduce only a little, Survivor Space will be
 clear but the heap is still full so there will be another full gc very soon
 then the node will down. If I restart the node, it will be fine without gc
 trouble.
 
  Can anyone help me to find out where is the problem that full gc can't
 reduce CMS Old Gen? Is 

Re: full gc too often

2014-12-07 Thread Jonathan Haddad
There are a lot of factors that go into tuning, and I don't know of any
reliable formula you can use to figure out what will work optimally for your
hardware.  Personally, I recommend:

1) find the bottleneck
2) play with a parameter (or two)
3) see what changed, performance-wise

If you've got a specific question, I think someone can find a way to help, but
asking "what can 8 GB of heap give me" is pretty abstract and unanswerable.
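As a rough sketch of that loop (stock nodetool commands; the saved output is
the "see what changed" step, and the file names are just examples):

    nodetool tpstats          > before-tpstats.txt     # dropped messages, blocked stages
    nodetool proxyhistograms  > before-latency.txt     # read/write latency distribution
    # ... change ONE parameter, let the node settle, then capture the same
    # outputs again and diff them against the baseline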

Jon

On Sun Dec 07 2014 at 8:03:53 AM Philo Yang ud1...@gmail.com wrote:

 2014-12-05 15:40 GMT+08:00 Jonathan Haddad j...@jonhaddad.com:

 I recommend reading through https://issues.apache.
 org/jira/browse/CASSANDRA-8150 to get an idea of how the JVM GC works
 and what you can do to tune it.  Also good is Blake Eggleston's writeup
 which can be found here: http://blakeeggleston.com/
 cassandra-tuning-the-jvm-for-read-heavy-workloads.html

 I'd like to note that allocating 4GB heap to Cassandra under any serious
 workload is unlikely to be sufficient.


 Thanks for your recommendation. After reading I try to allocate a larger
 heap and it is useful for me. 4G heap can't handle the workload in my use
 case indeed.

 So another question is, how much pressure dose default max heap (8G) can
 handle? The pressure may not be a simple qps, you know, slice query for
 many columns in a row will allocate more objects in heap than the query for
 a single column. Is there any testing result for the relationship between
 the pressure and the safety heap size? We know query a slice with many
 tombstones is not a good use case, but query a slice without tombstones may
 be a common use case, right?




 On Thu Dec 04 2014 at 8:43:38 PM Philo Yang ud1...@gmail.com wrote:

 I have two kinds of machine:
 16G RAM, with default heap size setting, about 4G.
 64G RAM, with default heap size setting, about 8G.

 These two kinds of nodes have same number of vnodes, and both of them
 have gc issue, although the node of 16G have a higher probability  of gc
 issue.

 Thanks,
 Philo Yang


 2014-12-05 12:34 GMT+08:00 Tim Heckman t...@pagerduty.com:

 On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote:
 
  Hi,all
 
  I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have a trouble with
 full gc that sometime there may be one or two nodes full gc more than one
 time per minute and over 10 seconds each time, then the node will be
 unreachable and the latency of cluster will be increased.
 
  I grep the GCInspector's log, I found when the node is running fine
 without gc trouble there are two kinds of gc:
  ParNew GC in less than 300ms which clear the Par Eden Space and
 enlarge CMS Old Gen/ Par Survivor Space little (because it only show gc in
 more than 200ms, there is only a small number of ParNew GC in log)
  ConcurrentMarkSweep in 4000~8000ms which reduce CMS Old Gen much and
 enlarge Par Eden Space little, each 1-2 hours it will be executed once.
 
  However, sometimes ConcurrentMarkSweep will be strange like it shows:
 
  INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -
 3579838464; Par Eden Space: 503316480 - 294794576; Par Survivor
 Space: 62914528 - 0
  INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -
 3579836512; Par Eden Space: 503316480 - 310562032; Par Survivor
 Space: 62872496 - 0
  INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -
 3579805792; Par Eden Space: 503316480 - 332391096; Par Survivor
 Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -
 3579829760; Par Eden Space: 503316480 - 351991456; Par Survivor
 Space: 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -
 3579799752; Par Eden Space: 503316480 - 366222584; Par Survivor
 Space: 62914560 - 0
  INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -
 3579817392; Par Eden Space: 503316480 - 388702928; Par Survivor
 Space: 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -
 3579838424; Par Eden Space: 503316480 - 408992784; Par Survivor
 Space: 62896720 - 0
  INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -
 3579816424; Par Eden Space: 503316480 - 438633608; Par Survivor
 Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -
 3579785496; Par Eden Space: 503316480 - 441354856; Par Survivor
 Space: 62889528 - 0
  INFO  [Service 

Re: full gc too often

2014-12-05 Thread Robert Coli
On Thu, Dec 4, 2014 at 8:13 PM, Philo Yang ud1...@gmail.com wrote:

 In each time Old Gen reduce only a little, Survivor Space will be clear
 but the heap is still full so there will be another full gc very soon then
 the node will down. If I restart the node, it will be fine without gc
 trouble.

 Can anyone help me to find out where is the problem that full gc can't
 reduce CMS Old Gen? Is it because there are too many objects in heap can't
 be recycled? I think review the table scheme designing and add new nodes
 into cluster is a good idea, but I still want to know if there is any other
 reason causing this trouble.


Yes. This state is what I call "GC pre-fail", because collections happen
constantly and reclaim not enough heap.

In general, what you have to do is look for sources of heap pressure that put
you into GC pre-fail, and either eliminate them or tune the generations so they
are acceptable for your workload. If this appears after extended runtime under
steady-state load, then growth in total stored data size, and the increased
heap consumption derived from it, is often to blame.
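The generation knobs mentioned above live in cassandra-env.sh; the values here
are purely illustrative, not a recommendation:

    MAX_HEAP_SIZE="8G"      # total heap
    HEAP_NEWSIZE="1600M"    # young generation; the stock comment suggests roughly 100 MB per core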

=Rob


full gc too often

2014-12-04 Thread Philo Yang
Hi, all

I have a cluster on C* 2.1.1 and JDK 1.7_u51. I am having trouble with full GC:
sometimes one or two nodes run a full GC more than once per minute, taking over
10 seconds each time; the node then becomes unreachable and the latency of the
cluster increases.

I grepped the GCInspector log. When a node is running fine, without GC trouble,
there are two kinds of GC:
ParNew GC in less than 300ms, which clears Par Eden Space and grows CMS Old
Gen / Par Survivor Space a little (because only GCs longer than 200ms are
logged, there is only a small number of ParNew GCs in the log);
ConcurrentMarkSweep in 4000~8000ms, which reduces CMS Old Gen a lot and grows
Par Eden Space a little; it is executed once every 1-2 hours.
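The observations above come from grepping roughly like this (the log path is
the packaged default and may differ on your install):

    grep "GCInspector" /var/log/cassandra/system.log | grep "ConcurrentMarkSweep" | tail -20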

However, sometimes ConcurrentMarkSweep behaves strangely, like this:

INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 -> 3579838464;
Par Eden Space: 503316480 -> 294794576; Par Survivor Space: 62914528 -> 0
INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 -> 3579836512;
Par Eden Space: 503316480 -> 310562032; Par Survivor Space: 62872496 -> 0
INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 -> 3579805792;
Par Eden Space: 503316480 -> 332391096; Par Survivor Space: 62914544 -> 0
INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 -> 3579829760;
Par Eden Space: 503316480 -> 351991456; Par Survivor Space: 62914552 -> 0
INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 -> 3579799752;
Par Eden Space: 503316480 -> 366222584; Par Survivor Space: 62914560 -> 0
INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 -> 3579817392;
Par Eden Space: 503316480 -> 388702928; Par Survivor Space: 62914552 -> 0
INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 -> 3579838424;
Par Eden Space: 503316480 -> 408992784; Par Survivor Space: 62896720 -> 0
INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -> 3579816424;
Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 -> 3579785496;
Par Eden Space: 503316480 -> 441354856; Par Survivor Space: 62889528 -> 0
INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 -> 3579814464;
Par Eden Space: 503316480 -> 448782440; Par Survivor Space: 62914560 -> 0

Each time, Old Gen shrinks only a little; Survivor Space is cleared, but the
heap is still full, so another full GC follows very soon and then the node goes
down. If I restart the node, it runs fine without GC trouble.

Can anyone help me find out why full GC can't reduce CMS Old Gen? Is it because
there are too many objects in the heap that can't be collected? I think
reviewing the table schema design and adding new nodes to the cluster is a good
idea, but I still want to know whether there is any other reason causing this
trouble.

Thanks,
Philo Yang


Re: full gc too often

2014-12-04 Thread Tim Heckman
On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote:

 Hi,all

 I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have a trouble with full
gc that sometime there may be one or two nodes full gc more than one time
per minute and over 10 seconds each time, then the node will be unreachable
and the latency of cluster will be increased.

 I grep the GCInspector's log, I found when the node is running fine
without gc trouble there are two kinds of gc:
 ParNew GC in less than 300ms which clear the Par Eden Space and
enlarge CMS Old Gen/ Par Survivor Space little (because it only show gc in
more than 200ms, there is only a small number of ParNew GC in log)
 ConcurrentMarkSweep in 4000~8000ms which reduce CMS Old Gen much and
enlarge Par Eden Space little, each 1-2 hours it will be executed once.

 However, sometimes ConcurrentMarkSweep will be strange like it shows:

 INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 - 3579838464;
Par Eden Space: 503316480 - 294794576; Par Survivor Space: 62914528 - 0
 INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 - 3579836512;
Par Eden Space: 503316480 - 310562032; Par Survivor Space: 62872496 - 0
 INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 - 3579805792;
Par Eden Space: 503316480 - 332391096; Par Survivor Space: 62914544 - 0
 INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 - 3579829760;
Par Eden Space: 503316480 - 351991456; Par Survivor Space: 62914552 - 0
 INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 - 3579799752;
Par Eden Space: 503316480 - 366222584; Par Survivor Space: 62914560 - 0
 INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 - 3579817392;
Par Eden Space: 503316480 - 388702928; Par Survivor Space: 62914552 - 0
 INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 - 3579838424;
Par Eden Space: 503316480 - 408992784; Par Survivor Space: 62896720 - 0
 INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 - 3579816424;
Par Eden Space: 503316480 - 438633608; Par Survivor Space: 62914544 - 0
 INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 - 3579785496;
Par Eden Space: 503316480 - 441354856; Par Survivor Space: 62889528 - 0
 INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 - 3579814464;
Par Eden Space: 503316480 - 448782440; Par Survivor Space: 62914560 - 0

 In each time Old Gen reduce only a little, Survivor Space will be clear
but the heap is still full so there will be another full gc very soon then
the node will down. If I restart the node, it will be fine without gc
trouble.

 Can anyone help me to find out where is the problem that full gc can't
reduce CMS Old Gen? Is it because there are too many objects in heap can't
be recycled? I think review the table scheme designing and add new nodes
into cluster is a good idea, but I still want to know if there is any other
reason causing this trouble.

How much total system memory do you have? How much is allocated for heap
usage? How big is your working data set?

The reason I ask is that I've seen problems with lots of GC with no room
gained, and it was memory pressure. Not enough for the heap. We decided
that just increasing the heap size was a bad idea, as we did rely on free
RAM being used for filesystem caching. So some vertical and horizontal
scaling allowed us to give Cass more heap space, as well as distribute the
workload to try and avoid further problems.
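A quick way to gather those three numbers on each node (the data path is the
packaged default; adjust for your install):

    free -m                           # total system memory
    nodetool info | grep -i heap      # configured / used heap on this node
    du -sh /var/lib/cassandra/data    # rough on-disk data size on this node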

 Thanks,
 Philo Yang

Cheers!
-Tim


Re: full gc too often

2014-12-04 Thread Philo Yang
I have two kinds of machines:
16G RAM, with the default heap size setting, about 4G.
64G RAM, with the default heap size setting, about 8G.

These two kinds of nodes have the same number of vnodes, and both of them have
the GC issue, although the 16G nodes hit it with higher probability.
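Those defaults line up with how cassandra-env.sh computes the heap; roughly (a
sketch from memory, not the exact script):

    # MAX_HEAP_SIZE defaults to max( min(RAM/2, 1G), min(RAM/4, 8G) )
    system_mb=$(free -m | awk '/^Mem:/ {print $2}')
    half=$((system_mb / 2)); quarter=$((system_mb / 4))
    cap_half=$(( half < 1024 ? half : 1024 ))
    cap_quarter=$(( quarter < 8192 ? quarter : 8192 ))
    echo "default heap ~ $(( cap_half > cap_quarter ? cap_half : cap_quarter )) MB"
    # 16G RAM -> max(1024, 4096) = 4096 MB (~4G)
    # 64G RAM -> max(1024, 8192) = 8192 MB (8G)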

Thanks,
Philo Yang


2014-12-05 12:34 GMT+08:00 Tim Heckman t...@pagerduty.com:

 On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote:
 
  Hi,all
 
  I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have a trouble with full
 gc that sometime there may be one or two nodes full gc more than one time
 per minute and over 10 seconds each time, then the node will be unreachable
 and the latency of cluster will be increased.
 
  I grep the GCInspector's log, I found when the node is running fine
 without gc trouble there are two kinds of gc:
  ParNew GC in less than 300ms which clear the Par Eden Space and
 enlarge CMS Old Gen/ Par Survivor Space little (because it only show gc in
 more than 200ms, there is only a small number of ParNew GC in log)
  ConcurrentMarkSweep in 4000~8000ms which reduce CMS Old Gen much and
 enlarge Par Eden Space little, each 1-2 hours it will be executed once.
 
  However, sometimes ConcurrentMarkSweep will be strange like it shows:
 
  INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 - 3579838464;
 Par Eden Space: 503316480 - 294794576; Par Survivor Space: 62914528 - 0
  INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 - 3579836512;
 Par Eden Space: 503316480 - 310562032; Par Survivor Space: 62872496 - 0
  INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 - 3579805792;
 Par Eden Space: 503316480 - 332391096; Par Survivor Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 - 3579829760;
 Par Eden Space: 503316480 - 351991456; Par Survivor Space: 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 - 3579799752;
 Par Eden Space: 503316480 - 366222584; Par Survivor Space: 62914560 - 0
  INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 - 3579817392;
 Par Eden Space: 503316480 - 388702928; Par Survivor Space: 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 - 3579838424;
 Par Eden Space: 503316480 - 408992784; Par Survivor Space: 62896720 - 0
  INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 - 3579816424;
 Par Eden Space: 503316480 - 438633608; Par Survivor Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 - 3579785496;
 Par Eden Space: 503316480 - 441354856; Par Survivor Space: 62889528 - 0
  INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 - 3579814464;
 Par Eden Space: 503316480 - 448782440; Par Survivor Space: 62914560 - 0
 
  In each time Old Gen reduce only a little, Survivor Space will be clear
 but the heap is still full so there will be another full gc very soon then
 the node will down. If I restart the node, it will be fine without gc
 trouble.
 
  Can anyone help me to find out where is the problem that full gc can't
 reduce CMS Old Gen? Is it because there are too many objects in heap can't
 be recycled? I think review the table scheme designing and add new nodes
 into cluster is a good idea, but I still want to know if there is any other
 reason causing this trouble.

 How much total system memory do you have? How much is allocated for heap
 usage? How big is your working data set?

 The reason I ask is that I've seen problems with lots of GC with no room
 gained, and it was memory pressure. Not enough for the heap. We decided
 that just increasing the heap size was a bad idea, as we did rely on free
 RAM being used for filesystem caching. So some vertical and horizontal
 scaling allowed us to give Cass more heap space, as well as distribute the
 workload to try and avoid further problems.

  Thanks,
  Philo Yang

 Cheers!
 -Tim



Re: full gc too often

2014-12-04 Thread Jonathan Haddad
I recommend reading through
https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of how
the JVM GC works and what you can do to tune it.  Also good is Blake
Eggleston's writeup which can be found here:
http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html

I'd like to note that allocating 4GB heap to Cassandra under any serious
workload is unlikely to be sufficient.


On Thu Dec 04 2014 at 8:43:38 PM Philo Yang ud1...@gmail.com wrote:

 I have two kinds of machine:
 16G RAM, with default heap size setting, about 4G.
 64G RAM, with default heap size setting, about 8G.

 These two kinds of nodes have same number of vnodes, and both of them have
 gc issue, although the node of 16G have a higher probability  of gc issue.

 Thanks,
 Philo Yang


 2014-12-05 12:34 GMT+08:00 Tim Heckman t...@pagerduty.com:

 On Dec 4, 2014 8:14 PM, Philo Yang ud1...@gmail.com wrote:
 
  Hi,all
 
  I have a cluster on C* 2.1.1 and jdk 1.7_u51. I have a trouble with
 full gc that sometime there may be one or two nodes full gc more than one
 time per minute and over 10 seconds each time, then the node will be
 unreachable and the latency of cluster will be increased.
 
  I grep the GCInspector's log, I found when the node is running fine
 without gc trouble there are two kinds of gc:
  ParNew GC in less than 300ms which clear the Par Eden Space and
 enlarge CMS Old Gen/ Par Survivor Space little (because it only show gc in
 more than 200ms, there is only a small number of ParNew GC in log)
  ConcurrentMarkSweep in 4000~8000ms which reduce CMS Old Gen much and
 enlarge Par Eden Space little, each 1-2 hours it will be executed once.
 
  However, sometimes ConcurrentMarkSweep will be strange like it shows:
 
  INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 - 3579838464;
 Par Eden Space: 503316480 - 294794576; Par Survivor Space: 62914528 - 0
  INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 - 3579836512;
 Par Eden Space: 503316480 - 310562032; Par Survivor Space: 62872496 - 0
  INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 - 3579805792;
 Par Eden Space: 503316480 - 332391096; Par Survivor Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 - 3579829760;
 Par Eden Space: 503316480 - 351991456; Par Survivor Space: 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 - 3579799752;
 Par Eden Space: 503316480 - 366222584; Par Survivor Space: 62914560 - 0
  INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 - 3579817392;
 Par Eden Space: 503316480 - 388702928; Par Survivor Space: 62914552 - 0
  INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 - 3579838424;
 Par Eden Space: 503316480 - 408992784; Par Survivor Space: 62896720 - 0
  INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 - 3579816424;
 Par Eden Space: 503316480 - 438633608; Par Survivor Space: 62914544 - 0
  INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 - 3579785496;
 Par Eden Space: 503316480 - 441354856; Par Survivor Space: 62889528 - 0
  INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
 ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 - 3579814464;
 Par Eden Space: 503316480 - 448782440; Par Survivor Space: 62914560 - 0
 
  In each time Old Gen reduce only a little, Survivor Space will be clear
 but the heap is still full so there will be another full gc very soon then
 the node will down. If I restart the node, it will be fine without gc
 trouble.
 
  Can anyone help me to find out where is the problem that full gc can't
 reduce CMS Old Gen? Is it because there are too many objects in heap can't
 be recycled? I think review the table scheme designing and add new nodes
 into cluster is a good idea, but I still want to know if there is any other
 reason causing this trouble.

 How much total system memory do you have? How much is allocated for heap
 usage? How big is your working data set?

 The reason I ask is that I've seen problems with lots of GC with no room
 gained, and it was memory pressure. Not enough for the heap. We decided
 that just increasing the heap size was a bad idea, as we did rely on free
 RAM being used for filesystem caching. So some vertical and horizontal
 scaling allowed us to give Cass more heap space, as well as distribute