Hey guys,

I just want to get an idea about how everyone avoids these long GC pauses
that cause regionservers to die.

What kind of java heap and garbage collection settings do you use?

What do you do to make sure that the HBase vm never uses swap? I have heard
turning off swap altogether can be dangerous, so right now we have the
setting vm.swappiness=0. How do you tell if it's using swap? On Ganglia, we
see the "CPU wio" metric at around 4.5% before one of our crashes. Is that
high?
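For context, this is how we've been answering the "is it swapping?" question ourselves on Linux, straight from /proc (the numbers in the comments are how we read them, so treat this as a sketch):

```shell
# Is the box actually swapping? (Linux; reads /proc directly.)
# SwapTotal minus SwapFree > 0 means some pages have been swapped out.
grep -E 'SwapTotal|SwapFree' /proc/meminfo

# For a live view, "vmstat 5" shows si/so (pages swapped in/out per
# second); sustained non-zero values around a crash window point at swap.
```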

To try to avoid using too much memory, is reducing the memstore upper/lower
limits or the block cache size a good idea? Should we just tune down HBase's
total heap to try to avoid swap?
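To be concrete about which knobs I mean, these are the hbase-site.xml properties I'm looking at (the values here are just made-up examples to illustrate, not what we run or what I'm recommending):

```xml
<!-- hbase-site.xml: example values only, tune for your own heap -->
<property>
  <name>hbase.regionserver.global.memstore.upperLimit</name>
  <value>0.35</value> <!-- fraction of heap all memstores may use -->
</property>
<property>
  <name>hbase.regionserver.global.memstore.lowerLimit</name>
  <value>0.30</value> <!-- flushing drains memstores down to this -->
</property>
<property>
  <name>hfile.block.cache.size</name>
  <value>0.15</value> <!-- fraction of heap for the block cache -->
</property>
```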

In terms of our specific problem:

We seem to keep running into garbage collection pauses that cause the
regionservers to die. We have a mix of some random-read jobs, as well as a
few full-scan jobs (~1.5 billion rows, 800-900GB of data, 1500 regions), and
we are always inserting data. We would rather sacrifice a little speed for
stability, if that means anything. We have 7 nodes (RS + DN + TT) with 12GB
max heap given to HBase, and 24GB memory total.

We were using the following garbage collection options:
-XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
-XX:CMSInitiatingOccupancyFraction=75

After looking at http://wiki.apache.org/hadoop/PerformanceTuning, we are
trying to lower NewSize/MaxNewSize to 6m as well as reducing
CMSInitiatingOccupancyFraction to 50.
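Concretely, that amounts to something like this in hbase-env.sh (the GC-logging flags and log path are just what we use to capture logs like the one below, and the path is a placeholder):

```shell
# hbase-env.sh -- the flags we are currently experimenting with;
# values reflect our experiment, not a recommendation.
export HBASE_OPTS="-XX:+UseConcMarkSweepGC \
  -XX:NewSize=6m -XX:MaxNewSize=6m \
  -XX:CMSInitiatingOccupancyFraction=50 \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/path/to/gc-hbase.log"
```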

We see messages like this in our GC logs:

2010-11-23T14:56:01.383-0500: 61297.449: [GC 61297.449: [ParNew (promotion
failed): 57425K->57880K(59008K), 0.1880950 secs]61297.637:
[CMS2010-11-23T14:56:06.336-0500: 61302.402: [CMS-concurrent-mark:
8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs]
 (concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340 secs]
10181961K->5760080K(13305472K), [CMS Perm : 20252K->20241K(33868K)],
91.4413320 secs] [Times: user=24.47 sys=1.07, real=91.44 secs]
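In case it helps anyone triage their own logs, we find these events by grepping for the two fatal CMS patterns ("gc.log" is a stand-in for wherever your -Xloggc flag points):

```shell
# Flag CMS fallbacks to stop-the-world full GCs in a HotSpot GC log.
# "gc.log" is a placeholder for the file -Xloggc writes.
# (|| true so a clean log isn't treated as an error under set -e)
grep -nE 'promotion failed|concurrent mode failure' gc.log || true
```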

There are a lot of questions there, but I definitely appreciate any advice
or input anybody else has. Thanks so much!

-Sean
