On Mon, May 16, 2011 at 1:36 PM, Jack Levin <[email protected]> wrote:
> How do you tell?  This is the log entries when we had 100% cpu:
>
> 2011-05-14T15:48:58.240-0700: 5128.407: [GC 5128.407: [ParNew:
> 17723K->780K(19136K), 0.0199350 secs] 4309804K->4292973K(5777060K),
> 0.0200660 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]

This doesn't look bad.  It's going from 17.8MB down to 0.8MB in 0.02s
of real time.  Looks like your new gen is 19M at the moment, which
seems fine.  Have you set -Xmx == -Xms on your heap?  From this line
it would seem you are using 4.3G only (of a ~5.8G total).
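
If you do want them pinned, you can set the heap and the new gen
explicitly in the same place you set the occupancy fraction.  The
sizes below are placeholders for illustration only, not a
recommendation for your boxes:

  # Placeholder sizes -- pin the heap (-Xms/-Xmx) and the new gen (-Xmn)
  # so neither resizes under load; tune the numbers for your workload.
  export HBASE_OPTS="$HBASE_OPTS -Xms12g -Xmx12g -Xmn256m"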

Looking at the rest of your log, we seem to be doing fine there too;
we're just cleaning out loads of garbage, and we're doing it in the
young gen -- which is usually what you want -- and we're doing it
pretty quickly (each ParNew is ~0.02s of real time, roughly one every
0.1s).
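
For what it's worth, here is a rough way to pull the same numbers out
of the whole log rather than eyeballing it -- a sketch that assumes
the gc-hbase.log file from your -Xloggc setting and the
one-record-per-line format shown above:

  # Count ParNew collections and sum their real (wall-clock) pause time
  grep ParNew gc-hbase.log | grep -o 'real=[0-9.]*' | cut -d= -f2 \
    | awk '{ n++; sum += $1 } END { printf "%d pauses, %.2f secs real\n", n, sum }'

  # Heap used/total (in K) after each collection, last few records
  grep ParNew gc-hbase.log \
    | sed -n 's/.*secs] \([0-9]*\)K->\([0-9]*\)K(\([0-9]*\)K).*/used=\2K total=\3K/p' \
    | tail -5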

What's going on on your servers at the time?  You said you took off
the request load, but maybe there is a bunch of work queued up, or
maybe we're running lots of compactions at the time?
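
If it isn't GC, one way to see where that 100% of user CPU is going
is to map the hottest threads to stacks -- a rough sketch, assuming
jstack is on the path and pgrep resolves to your regionserver pid:

  RS_PID=$(pgrep -f HRegionServer)          # assumes one regionserver per box
  top -b -H -n 1 -p "$RS_PID" | head -20    # hottest native threads first
  # Pick a hot thread id from the top output (12345 below is made up),
  # convert it to hex, and find it in a thread dump:
  TID=12345
  jstack "$RS_PID" | grep -A 20 "nid=$(printf '0x%x' "$TID")"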


St.Ack


> 2011-05-14T15:48:58.349-0700: 5128.515: [GC 5128.515: [ParNew:
> 17804K->886K(19136K), 0.0188930 secs] 4309997K->4293119K(5777060K),
> 0.0189990 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:58.456-0700: 5128.623: [GC 5128.623: [ParNew:
> 17910K->900K(19136K), 0.0144800 secs] 4310143K->4293237K(5777060K),
> 0.0145890 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:58.563-0700: 5128.729: [GC 5128.729: [ParNew:
> 17924K->945K(19136K), 0.0144940 secs] 4310261K->4293469K(5777060K),
> 0.0146210 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:58.668-0700: 5128.835: [GC 5128.835: [ParNew:
> 17969K->731K(19136K), 0.0149060 secs] 4310493K->4293412K(5777060K),
> 0.0150210 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
> 2011-05-14T15:48:58.773-0700: 5128.940: [GC 5128.940: [ParNew:
> 17755K->906K(19136K), 0.0248650 secs] 4310436K->4293724K(5777060K),
> 0.0249820 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
> 2011-05-14T15:48:58.891-0700: 5129.057: [GC 5129.057: [ParNew:
> 17930K->1031K(19136K), 0.0194370 secs] 4310748K->4293961K(5777060K),
> 0.0196490 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.004-0700: 5129.171: [GC 5129.171: [ParNew:
> 18055K->960K(19136K), 0.0145450 secs] 4310985K->4293993K(5777060K),
> 0.0146530 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.115-0700: 5129.281: [GC 5129.282: [ParNew:
> 17984K->1098K(19136K), 0.0143740 secs] 4311017K->4294296K(5777060K),
> 0.0144780 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.224-0700: 5129.391: [GC 5129.391: [ParNew:
> 18122K->867K(19136K), 0.0150360 secs] 4311320K->4294248K(5777060K),
> 0.0151500 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.331-0700: 5129.498: [GC 5129.498: [ParNew:
> 17891K->1011K(19136K), 0.0220280 secs] 4311272K->4294544K(5777060K),
> 0.0221320 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.441-0700: 5129.608: [GC 5129.608: [ParNew:
> 18035K->1064K(19136K), 0.0164780 secs] 4311568K->4294803K(5777060K),
> 0.0165820 secs] [Times: user=0.07 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.548-0700: 5129.714: [GC 5129.715: [ParNew:
> 18088K->1193K(19136K), 0.0141640 secs] 4311827K->4295008K(5777060K),
> 0.0142730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
> 2011-05-14T15:48:59.653-0700: 5129.820: [GC 5129.820: [ParNew:
> 18217K->1078K(19136K), 0.0150800 secs] 4312032K->4295035K(5777060K),
> 0.0151780 secs] [Times: user=0.06 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.754-0700: 5129.921: [GC 5129.921: [ParNew:
> 18102K->906K(19136K), 0.0149160 secs] 4312059K->4295145K(5777060K),
> 0.0150130 secs] [Times: user=0.05 sys=0.00, real=0.02 secs]
> 2011-05-14T15:48:59.858-0700: 5130.025: [GC 5130.025: [ParNew:
> 17930K->749K(19136K), 0.0177820 secs] 4312169K->4295108K(5777060K),
> 0.0178890 secs] [Times: user=0.07 sys=0.00, real=0.01 secs]
> 2011-05-14T15:48:59.962-0700: 5130.129: [GC 5130.129: [ParNew:
> 17773K->970K(19136K), 0.0145400 secs] 4312132K->4295460K(5777060K),
> 0.0146530 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
>
> -Jack
>
> On Mon, May 16, 2011 at 12:44 PM, Stack <[email protected]> wrote:
>> What is the size of your new gen?  Is it growing?  Does it level off?
>> St.Ack
>>
>> On Mon, May 16, 2011 at 12:39 PM, Jack Levin <[email protected]> wrote:
>>> The HEAP would be 8G used out of 12G total.  The gc-log would be
>>> full of ParNew, no FULL GC at all; ParNew just starts at a high
>>> rate (10-15 lines per second).  Even if you remove queries, the JVM
>>> will not recover, and user CPU will remain at 100%.  The only thing
>>> left to do is to kill the process.  We are not writing nearly fast
>>> enough to cause such an issue.  This looks like a systemic problem,
>>> i.e. even at a low enough load in terms of QPS, the problem
>>> effectively kills HBASE.  There should be a way to tune things so
>>> that ParNew does not jump to inappropriate levels.
>>>
>>> -Jack
>>>
>>> On Mon, May 16, 2011 at 12:02 PM, Stack <[email protected]> wrote:
>>>> On Sun, May 15, 2011 at 5:37 PM, Jack Levin <[email protected]> wrote:
>>>>> I've added occupancy:  export HBASE_OPTS="$HBASE_OPTS -verbose:gc
>>>>> -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails
>>>>> -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError
>>>>> -Xloggc:$HBASE_HOME/logs/gc-hbase.log"
>>>>>
>>>>
>>>> Does the big CPU usage correlate to a Full GC?  Do you notice that
>>>> in your logs, Jack?
>>>>
>>>> I took a look at your thread dumps.  Nothing untoward there (such
>>>> as blocking on hdfs access).
>>>>
>>>> St.Ack
>>>>
>>>
>>
>
