2014-12-05 15:40 GMT+08:00 Jonathan Haddad <j...@jonhaddad.com>:

> I recommend reading through
> https://issues.apache.org/jira/browse/CASSANDRA-8150 to get an idea of
> how the JVM GC works and what you can do to tune it.  Also good is Blake
> Eggleston's writeup which can be found here:
> http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html
>
> I'd like to note that allocating 4GB heap to Cassandra under any serious
> workload is unlikely to be sufficient.
>
>
Thanks for your recommendation. After reading through those, I tried
allocating a larger heap and it helped. A 4G heap indeed can't handle the
workload in my use case.
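
In case it is useful for anyone else hitting the same limit: the heap is
normally raised through the standard variables in conf/cassandra-env.sh rather
than by editing the JVM flags directly. A minimal sketch (the 10G/2400M values
are only an illustration, not a recommendation for every workload):

    # conf/cassandra-env.sh -- override the auto-calculated sizes
    # (if you set one of these you must set both)
    MAX_HEAP_SIZE="10G"    # total heap, i.e. -Xms/-Xmx
    HEAP_NEWSIZE="2400M"   # young gen (-Xmn); usual guideline is ~100MB per core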

So another question is: how much pressure can the default max heap (8G)
handle? The "pressure" is not a simple qps number; a slice query over many
columns in a row allocates far more objects on the heap than a query for a
single column. Is there any published testing on the relationship between this
kind of pressure and a "safe" heap size? We know that querying a slice with
many tombstones is a bad use case, but querying a slice without tombstones
should be a common use case, right?
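
In the meantime I guess the only way to relate a particular workload to a
"safe" heap size is to measure the allocation pressure directly. A rough
sketch of how one might watch it on a live node with jstat (the pid lookup and
the 1-second interval are just an example):

    # sample JVM GC counters on the Cassandra process every second
    pid=$(pgrep -f CassandraDaemon)
    jstat -gcutil "$pid" 1000
    # E   = Eden utilisation %; how fast it refills shows the allocation rate
    # O   = Old Gen utilisation %; it should drop noticeably after a CMS cycle
    # FGC = full GC count; FGC climbing while O stays near 100% is the bad pattern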



>
> On Thu Dec 04 2014 at 8:43:38 PM Philo Yang <ud1...@gmail.com> wrote:
>
>> I have two kinds of machines:
>> 16G RAM, with the default heap size setting, about 4G.
>> 64G RAM, with the default heap size setting, about 8G.
>>
>> Both kinds of nodes have the same number of vnodes, and both of them have
>> the gc issue, although the 16G nodes hit it with higher probability.
>>
>> Thanks,
>> Philo Yang
>>
>>
>> 2014-12-05 12:34 GMT+08:00 Tim Heckman <t...@pagerduty.com>:
>>
>>> On Dec 4, 2014 8:14 PM, "Philo Yang" <ud1...@gmail.com> wrote:
>>> >
>>> > Hi, all
>>> >
>>> > I have a cluster on C* 2.1.1 and jdk 1.7_u51. I am having trouble with
>>> full gc: sometimes one or two nodes run a full gc more than once per
>>> minute, each taking over 10 seconds, after which the node becomes
>>> unreachable and the latency of the whole cluster goes up.
>>> >
>>> > Grepping GCInspector's log, I found that when a node is running fine,
>>> without gc trouble, there are two kinds of gc:
>>> > ParNew GC taking less than 300ms, which clears Par Eden Space and grows
>>> CMS Old Gen / Par Survivor Space only slightly (since it only logs gc taking
>>> more than 200ms, only a small number of ParNew GCs appear in the log);
>>> > ConcurrentMarkSweep taking 4000~8000ms, which shrinks CMS Old Gen a lot
>>> and grows Par Eden Space slightly; it runs about once every 1-2 hours.
>>> >
>>> > However, sometimes ConcurrentMarkSweep behaves strangely, as the log shows:
>>> >
>>> > INFO  [Service Thread] 2014-12-05 11:28:44,629 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 12648ms.  CMS Old Gen: 3579838424 ->
>>> 3579838464; Par Eden Space: 503316480 -> 294794576; Par Survivor Space:
>>> 62914528 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:28:59,581 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 12227ms.  CMS Old Gen: 3579838464 ->
>>> 3579836512; Par Eden Space: 503316480 -> 310562032; Par Survivor Space:
>>> 62872496 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:29:14,686 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 11538ms.  CMS Old Gen: 3579836688 ->
>>> 3579805792; Par Eden Space: 503316480 -> 332391096; Par Survivor Space:
>>> 62914544 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:29:29,371 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 12180ms.  CMS Old Gen: 3579835784 ->
>>> 3579829760; Par Eden Space: 503316480 -> 351991456; Par Survivor Space:
>>> 62914552 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:29:45,028 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 10574ms.  CMS Old Gen: 3579838112 ->
>>> 3579799752; Par Eden Space: 503316480 -> 366222584; Par Survivor Space:
>>> 62914560 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:29:59,546 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 11594ms.  CMS Old Gen: 3579831424 ->
>>> 3579817392; Par Eden Space: 503316480 -> 388702928; Par Survivor Space:
>>> 62914552 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:30:14,153 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 11463ms.  CMS Old Gen: 3579817392 ->
>>> 3579838424; Par Eden Space: 503316480 -> 408992784; Par Survivor Space:
>>> 62896720 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:30:25,009 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 9576ms.  CMS Old Gen: 3579838424 -> 3579816424;
>>> Par Eden Space: 503316480 -> 438633608; Par Survivor Space: 62914544 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:30:39,929 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 11556ms.  CMS Old Gen: 3579816424 ->
>>> 3579785496; Par Eden Space: 503316480 -> 441354856; Par Survivor Space:
>>> 62889528 -> 0
>>> > INFO  [Service Thread] 2014-12-05 11:30:54,085 GCInspector.java:142 -
>>> ConcurrentMarkSweep GC in 12082ms.  CMS Old Gen: 3579786592 ->
>>> 3579814464; Par Eden Space: 503316480 -> 448782440; Par Survivor Space:
>>> 62914560 -> 0
>>> >
>>> > Each time, Old Gen shrinks only a little and Survivor Space is cleared,
>>> but the heap stays nearly full, so another full gc follows very soon and the
>>> node eventually goes down. If I restart the node, it runs fine again without
>>> gc trouble.
>>> >
>>> > Can anyone help me figure out why full gc cannot reduce CMS Old Gen? Is it
>>> because there are too many objects in the heap that cannot be reclaimed? I
>>> think reviewing the table schema design and adding new nodes to the cluster
>>> is a good idea, but I still want to know whether anything else could be
>>> causing this trouble.
>>>
>>> How much total system memory do you have? How much of it is allocated to
>>> the heap? How big is your working data set?
>>>
>>> The reason I ask is that I've seen a lot of GC activity with no room gained,
>>> and it turned out to be memory pressure: there simply wasn't enough memory
>>> for the heap. We decided that just increasing the heap size was a bad idea,
>>> since we relied on the free RAM for filesystem caching. Some vertical and
>>> horizontal scaling let us give Cass more heap space while also distributing
>>> the workload to avoid further problems.
>>>
>>> > Thanks,
>>> > Philo Yang
>>>
>>> Cheers!
>>> -Tim
>>>
>>
>>
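
For reference, the CMS behaviour discussed in the quoted thread is governed by
the GC flags shipped in conf/cassandra-env.sh. A sketch of the stock 2.1
settings as I understand them, with the occupancy fraction called out because
lowering it is one of the usual knobs when Old Gen stays full between cycles;
treat the values as illustrative, not as a recommendation:

    # conf/cassandra-env.sh -- default CMS-related options (approximate)
    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
    # start the concurrent cycle earlier if Old Gen never gets reclaimed in time
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"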
