Those are the lines I added:

-XX:+CMSIncrementalMode \      <--------
-XX:+CMSIncrementalPacing \    <-------
-XX:-TraceClassUnloading       <------
-Jack (used CMS before)

On Mon, May 16, 2011 at 4:19 PM, Stack <[email protected]> wrote:
> So, the change is that you started using CMS? You were using the
> default GC previously?
>
> ParNew is much bigger now.
>
> St.Ack
>
> On Mon, May 16, 2011 at 4:11 PM, Jack Levin <[email protected]> wrote:
>> I think this will resolve my issue, here is the output:
>>
>> 14 2011-05-16T15:58
>> 13 2011-05-16T15:59
>> 12 2011-05-16T16:00
>> 14 2011-05-16T16:01
>> 14 2011-05-16T16:02
>> 13 2011-05-16T16:03
>> 11 2011-05-16T16:04
>> 12 2011-05-16T16:05
>> 11 2011-05-16T16:06
>> 16:06:55 10.103.7.27 root@mtag27:/usr/lib/hbase/logs $ tail -f gc-hbase.log
>> 2011-05-16T16:06:05.528-0700: 1725.183: [GC 1725.183: [ParNew:
>> 76665K->7644K(76672K), 0.0301560 secs] 3255607K->3194064K(12279488K)
>> icms_dc=0 , 0.0302940 secs] [Times: user=0.09 sys=0.01, real=0.03
>> secs]
>> 2011-05-16T16:06:11.457-0700: 1731.112: [GC 1731.112: [ParNew:
>> 75800K->7168K(76672K), 0.0311670 secs] 3262220K->3198429K(12279488K)
>> icms_dc=0 , 0.0312880 secs] [Times: user=0.09 sys=0.00, real=0.03
>> secs]
>> 2011-05-16T16:06:16.546-0700: 1736.202: [GC 1736.202: [ParNew:
>> 75328K->7674K(76672K), 0.0270760 secs] 3266589K->3203675K(12279488K)
>> icms_dc=0 , 0.0271940 secs] [Times: user=0.09 sys=0.01, real=0.03
>> secs]
>> 2011-05-16T16:06:21.407-0700: 1741.063: [GC 1741.063: [ParNew:
>> 75832K->8512K(76672K), 0.0285340 secs] 3271833K->3209994K(12279488K)
>> icms_dc=0 , 0.0286380 secs] [Times: user=0.07 sys=0.01, real=0.03
>> secs]
>> 2011-05-16T16:06:26.571-0700: 1746.226: [GC 1746.226: [ParNew:
>> 76627K->6049K(76672K), 0.0254590 secs] 3278109K->3213405K(12279488K)
>> icms_dc=0 , 0.0255890 secs] [Times: user=0.08 sys=0.00, real=0.03
>> secs]
>>
>>
>> Now running GC less often with larger cleans.
>> And here is another
>> server that runs under same load:
>>
>> 40 2011-05-16T15:53
>> 50 2011-05-16T15:54
>> 45 2011-05-16T15:55
>> 43 2011-05-16T15:56
>> 39 2011-05-16T15:57
>> 37 2011-05-16T15:58
>> 37 2011-05-16T15:59
>> 35 2011-05-16T16:00
>> 36 2011-05-16T16:01
>> 25 2011-05-16T16:02
>>
>>
>> 16:08:39 10.103.7.25 root@mtag25:/usr/lib/hbase/logs $ tail -f gc-hbase.log
>> 2011-05-16T16:08:29.154-0700: 171956.381: [GC 171956.382: [ParNew:
>> 19130K->2112K(19136K), 0.0403890 secs] 6649843K->6634911K(11885508K),
>> 0.0407000 secs] [Times: user=0.15 sys=0.01, real=0.04 secs]
>> 2011-05-16T16:08:30.237-0700: 171957.464: [GC 171957.465: [ParNew:
>> 19077K->2112K(19136K), 0.0367750 secs] 6651877K->6636722K(11885508K),
>> 0.0369370 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
>> 2011-05-16T16:08:31.864-0700: 171959.092: [GC 171959.092: [ParNew:
>> 19120K->1656K(19136K), 0.0396040 secs] 6653731K->6637891K(11885508K),
>> 0.0397500 secs] [Times: user=0.14 sys=0.01, real=0.04 secs]
>> 2011-05-16T16:08:33.551-0700: 171960.778: [GC 171960.779: [ParNew:
>> 18620K->2112K(19136K), 0.0398640 secs] 6654855K->6638900K(11885508K),
>> 0.0400910 secs] [Times: user=0.15 sys=0.00, real=0.04 secs]
>> 2011-05-16T16:08:35.116-0700: 171962.344: [GC 171962.344: [ParNew:
>> 19136K->2112K(19136K), 0.0375530 secs] 6655924K->6641070K(11885508K),
>> 0.0377090 secs] [Times: user=0.14 sys=0.01, real=0.04 secs]
>> 2011-05-16T16:08:36.562-0700: 171963.790: [GC 171963.790: [ParNew:
>> 19131K->1763K(19136K), 0.0453660 secs] 6658090K->6642325K(11885508K),
>> 0.0455130 secs] [Times: user=0.17 sys=0.00, real=0.04 secs]
>> 2011-05-16T16:08:37.675-0700: 171964.903: [GC 171964.903: [ParNew:
>> 18733K->1400K(19136K), 0.0457320 secs] 6659295K->6643257K(11885508K),
>> 0.0459240 secs] [Times: user=0.17 sys=0.00, real=0.05 secs]
>>
>>
>> Way more often, less to clean, and CPU usage is visibly higher.
>> And
>> GC config changes are:
>>
>>
>> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
>> -XX:MaxDirectMemorySize=2G"
>>
>> # Uncomment below to enable java garbage collection logging.
>> export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms12000m
>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError
>> -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
>> -XX:+CMSIncrementalMode \      <--------
>> -XX:+CMSIncrementalPacing \    <-------
>> -XX:-TraceClassUnloading       <---------------
>>
>> This way GC statistically adapts, and lessens the load on CPU.
>>
>> -Jack
>>
>>
>> On Mon, May 16, 2011 at 3:17 PM, Jack Levin <[email protected]> wrote:
>>> hlog rolled, and qps was at 1000, not much at all. I tend to believe
>>> that concurrent GC collection is taking way too many threads from the
>>> app layer and causing CPU to run up. Setting up incremental mode
>>> should help; I will do so, and report what I find.
>>>
>>> -Jack
>>>
>>> On Mon, May 16, 2011 at 1:48 PM, Stack <[email protected]> wrote:
>>>> On Mon, May 16, 2011 at 1:36 PM, Jack Levin <[email protected]> wrote:
>>>>> How do you tell? These are the log entries from when we had 100% cpu:
>>>>>
>>>>> 2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: [ParNew:
>>>>> 17723K->780K(19136K), 0.0199350 secs] 4309804K->4292973K(5777060K),
>>>>> 0.0200660 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
>>>>
>>>> This doesn't look bad. It's going from 17.8MB to 0.8MB in 0.02 of real
>>>> time. Looks like your new gen. is 19M at the moment, which seems fine.
>>>> You have set -Xmx == -Xms I'd guess, because it would seem to say
>>>> you are using 4.3G only.
>>>>
>>>> Looking at the rest of your log, we seem to be doing fine too; we're
>>>> just cleaning out loads of garbage and we're doing it in young gen --
>>>> which is usually what you want -- and we're doing it pretty quick.
>>>>
>>>> What's going on on your servers at the time?
>>>> You said you took off the
>>>> request load but maybe there is a bunch of work queued or we're
>>>> running lots of compactions at the time?
>>>>
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>> 2011-05-14T15:48:58.349-0700: 5128.515: [GC 5128.515: [ParNew:
>>>>> 17804K->886K(19136K), 0.0188930 secs] 4309997K->4293119K(5777060K),
>>>>> 0.0189990 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:58.456-0700: 5128.623: [GC 5128.623: [ParNew:
>>>>> 17910K->900K(19136K), 0.0144800 secs] 4310143K->4293237K(5777060K),
>>>>> 0.0145890 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:58.563-0700: 5128.729: [GC 5128.729: [ParNew:
>>>>> 17924K->945K(19136K), 0.0144940 secs] 4310261K->4293469K(5777060K),
>>>>> 0.0146210 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:58.668-0700: 5128.835: [GC 5128.835: [ParNew:
>>>>> 17969K->731K(19136K), 0.0149060 secs] 4310493K->4293412K(5777060K),
>>>>> 0.0150210 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
>>>>> 2011-05-14T15:48:58.773-0700: 5128.940: [GC 5128.940: [ParNew:
>>>>> 17755K->906K(19136K), 0.0248650 secs] 4310436K->4293724K(5777060K),
>>>>> 0.0249820 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
>>>>> 2011-05-14T15:48:58.891-0700: 5129.057: [GC 5129.057: [ParNew:
>>>>> 17930K->1031K(19136K), 0.0194370 secs] 4310748K->4293961K(5777060K),
>>>>> 0.0196490 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.004-0700: 5129.171: [GC 5129.171: [ParNew:
>>>>> 18055K->960K(19136K), 0.0145450 secs] 4310985K->4293993K(5777060K),
>>>>> 0.0146530 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.115-0700: 5129.281: [GC 5129.282: [ParNew:
>>>>> 17984K->1098K(19136K), 0.0143740 secs] 4311017K->4294296K(5777060K),
>>>>> 0.0144780 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.224-0700: 5129.391: [GC 5129.391: [ParNew:
>>>>> 18122K->867K(19136K), 0.0150360 secs] 4311320K->4294248K(5777060K),
>>>>> 0.0151500 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.331-0700: 5129.498: [GC 5129.498: [ParNew:
>>>>> 17891K->1011K(19136K), 0.0220280 secs] 4311272K->4294544K(5777060K),
>>>>> 0.0221320 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.441-0700: 5129.608: [GC 5129.608: [ParNew:
>>>>> 18035K->1064K(19136K), 0.0164780 secs] 4311568K->4294803K(5777060K),
>>>>> 0.0165820 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.548-0700: 5129.714: [GC 5129.715: [ParNew:
>>>>> 18088K->1193K(19136K), 0.0141640 secs] 4311827K->4295008K(5777060K),
>>>>> 0.0142730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
>>>>> 2011-05-14T15:48:59.653-0700: 5129.820: [GC 5129.820: [ParNew:
>>>>> 18217K->1078K(19136K), 0.0150800 secs] 4312032K->4295035K(5777060K),
>>>>> 0.0151780 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.754-0700: 5129.921: [GC 5129.921: [ParNew:
>>>>> 18102K->906K(19136K), 0.0149160 secs] 4312059K->4295145K(5777060K),
>>>>> 0.0150130 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
>>>>> 2011-05-14T15:48:59.858-0700: 5130.025: [GC 5130.025: [ParNew:
>>>>> 17930K->749K(19136K), 0.0177820 secs] 4312169K->4295108K(5777060K),
>>>>> 0.0178890 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
>>>>> 2011-05-14T15:48:59.962-0700: 5130.129: [GC 5130.129: [ParNew:
>>>>> 17773K->970K(19136K), 0.0145400 secs] 4312132K->4295460K(5777060K),
>>>>> 0.0146530 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
>>>>>
>>>>> -Jack
>>>>>
>>>>> On Mon, May 16, 2011 at 12:44 PM, Stack <[email protected]> wrote:
>>>>>> What is the size of your new gen? Is it growing? Does it level off?
>>>>>> St.Ack
>>>>>>
>>>>>> On Mon, May 16, 2011 at 12:39 PM, Jack Levin <[email protected]> wrote:
>>>>>>> The HEAP would be 8G used out of 12G total. The gc-log would be
>>>>>>> full of ParNew, no FULL GC at all; ParNew just starts at a high rate
>>>>>>> (10-15 lines per second).
>>>>>>> Even if you remove queries, the JVM will
>>>>>>> not recover, and user CPU will remain at 100%. The only thing left
>>>>>>> to do is to kill the process. We are not writing nearly fast enough
>>>>>>> to cause such an issue. This looks like a systemic problem: the QPS
>>>>>>> load is low enough, and yet the problem effectively kills HBASE.
>>>>>>> There should be a way to tune things so that ParNew does not jump
>>>>>>> to inappropriate levels.
>>>>>>>
>>>>>>> -Jack
>>>>>>>
>>>>>>> On Mon, May 16, 2011 at 12:02 PM, Stack <[email protected]> wrote:
>>>>>>>> On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[email protected]> wrote:
>>>>>>>>> I've added occupancy: export HBASE_OPTS="$HBASE_OPTS -verbose:gc
>>>>>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
>>>>>>>>> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError
>>>>>>>>> -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>>>>>>>>>
>>>>>>>>
>>>>>>>> Does the big CPU usage correlate to a Full GC? Do you notice that in
>>>>>>>> your logs, Jack?
>>>>>>>>
>>>>>>>> I took a look at your thread dumps. Nothing untoward (blocking on
>>>>>>>> hdfs access).
>>>>>>>>
>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
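The per-minute ParNew counts quoted in the thread (e.g. `14 2011-05-16T15:58`) can be reproduced from a gc-hbase.log with a short pipeline. This is a sketch, not the command actually used (which is not shown in the thread); it assumes the `-XX:+PrintGCDateStamps` log format visible in the excerpts above, where each collection line begins with an ISO timestamp:

```shell
# Count ParNew collections per minute from a CMS GC log written with
# -XX:+PrintGCDateStamps. A few sample lines (taken from the thread)
# are embedded here so the sketch is self-contained.
cat > /tmp/gc-sample.log <<'EOF'
2011-05-16T16:06:05.528-0700: 1725.183: [GC 1725.183: [ParNew: 76665K->7644K(76672K), 0.0301560 secs] 3255607K->3194064K(12279488K) icms_dc=0 , 0.0302940 secs]
2011-05-16T16:06:11.457-0700: 1731.112: [GC 1731.112: [ParNew: 75800K->7168K(76672K), 0.0311670 secs] 3262220K->3198429K(12279488K) icms_dc=0 , 0.0312880 secs]
2011-05-16T16:07:16.546-0700: 1796.202: [GC 1796.202: [ParNew: 75328K->7674K(76672K), 0.0270760 secs] 3266589K->3203675K(12279488K) icms_dc=0 , 0.0271940 secs]
EOF

# Characters 1-16 of each line are "YYYY-MM-DDTHH:MM"; group and count.
grep ParNew /tmp/gc-sample.log | cut -c1-16 | sort | uniq -c | sort -k2
```

Each output row is `<collections> <minute>`, the same shape as the listings in the thread, so a rise in collection frequency like the mtag27 vs. mtag25 difference above shows up directly.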

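For reference, the GC settings discussed across the thread can be collected into a single hbase-env.sh fragment. This is a sketch assembled from the flags quoted above (the annotation arrows and line-flattening in the mail obscure the shell quoting, and the closing quote is missing in the paste); the heap size and occupancy fraction are the thread's values, not general recommendations, and note that incremental CMS was deprecated in later JDK 8 releases and removed in JDK 9:

```shell
# hbase-env.sh -- CMS with incremental mode, as described in the thread.
# Sketch only: values copied from the thread; tune for your own cluster.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
  -XX:MaxDirectMemorySize=2G"

# GC logging, occupancy trigger, and the three flags added in the thread.
export HBASE_OPTS="$HBASE_OPTS -verbose:gc -Xms12000m \
  -XX:CMSInitiatingOccupancyFraction=70 \
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -XX:+HeapDumpOnOutOfMemoryError \
  -Xloggc:$HBASE_HOME/logs/gc-hbase.log \
  -XX:+CMSIncrementalMode \
  -XX:+CMSIncrementalPacing \
  -XX:-TraceClassUnloading"
```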