Re: HBase 0.90.3 OOM at 1.5G heap

Henning Blohm Tue, 12 Jul 2011 01:02:08 -0700

Good morning St.Ack,

the schema consists of one table and one column family, holding five columns with one string (<20 chars) and four double numbers (rather minimal really).

The load test runs 24 concurrent mappers, each writing 500k rows, with 2000 runs in total.
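As a rough cross-check of these figures, the sketch below multiplies them out. The row and run counts come from this message; the per-row value size is an assumed upper bound (20 single-byte characters plus four 8-byte doubles) and ignores row keys, qualifiers, timestamps and any other per-cell storage overhead.

```python
# Figures taken from the message: 2000 runs of 500k rows each,
# with 24 mappers running concurrently.
ROWS_PER_RUN = 500_000
TOTAL_RUNS = 2000
CONCURRENT_MAPPERS = 24

# ASSUMPTION: upper bound on value bytes per row, i.e. one string of
# 20 single-byte characters plus four 8-byte doubles. Key and cell
# overhead is deliberately left out.
VALUE_BYTES_PER_ROW = 20 + 4 * 8


def total_rows(rows_per_run=ROWS_PER_RUN, runs=TOTAL_RUNS):
    """Total number of rows written by the whole load test."""
    return rows_per_run * runs


def waves(runs=TOTAL_RUNS, concurrent=CONCURRENT_MAPPERS):
    """Number of scheduling waves needed if `concurrent` runs execute at once."""
    return -(-runs // concurrent)  # ceiling division


def raw_value_gib(rows, bytes_per_row=VALUE_BYTES_PER_ROW):
    """Raw value payload in GiB, excluding all key and storage overhead."""
    return rows * bytes_per_row / 2**30


if __name__ == "__main__":
    rows = total_rows()
    print("rows:", rows)
    print("waves of mappers:", waves())
    print("raw value payload (GiB): %.1f" % raw_value_gib(rows))
```

The row total matches the one billion rows mentioned in the original report; the payload figure is only a lower bound on what the region servers actually have to hold and flush.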


WAL is turned on.

And yes, it took down two region servers and the processes were eventually gone. From the logs, however, it looked as if the region servers still tried to continue for a while after the first OOM.

They didn't get restarted and I had the impression the HMaster didn't respond to web requests either (but I shut it down quickly to restart the whole cluster - so not sure about that).

My hbase-env.sh is out-of-the-box except for the heap settings. So the GC config is

-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode


Is that not aggressive enough?

hbase-site.xml is also standard, except for the cluster config (i.e. the ZooKeeper quorum config etc.).



Just noticed that there is a gc log. I will look into that as well.

Currently retrying with 2G heap.
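To put the two heap sizes side by side, here is a minimal sketch of how the heap might be divided. The two fractions are assumptions: they are what I believe the stock defaults to be for hbase.regionserver.global.memstore.upperLimit (0.4) and hfile.block.cache.size (0.2) in this release line, and should be checked against the hbase-default.xml actually shipped with the installation.

```python
# ASSUMPTION: stock defaults for a 0.90-era region server. Verify both
# values against the hbase-default.xml of the installed release.
MEMSTORE_UPPER_LIMIT = 0.4   # hbase.regionserver.global.memstore.upperLimit
BLOCK_CACHE_FRACTION = 0.2   # hfile.block.cache.size


def heap_budget_mb(heap_mb,
                   memstore_fraction=MEMSTORE_UPPER_LIMIT,
                   block_cache_fraction=BLOCK_CACHE_FRACTION):
    """Split a heap size (in MB) into memstore, block cache and the rest."""
    memstore = heap_mb * memstore_fraction
    block_cache = heap_mb * block_cache_fraction
    return {
        "memstore": memstore,
        "block_cache": block_cache,
        # Everything else: RPC buffers, compactions, WAL replay, and so on.
        "other": heap_mb - memstore - block_cache,
    }


if __name__ == "__main__":
    for heap_mb in (1536, 2048):  # the 1.5G and 2G settings from this thread
        budget = heap_budget_mb(heap_mb)
        print(heap_mb, {name: round(mb, 1) for name, mb in budget.items()})
```

Under those assumed fractions, the part of the heap not reserved for memstores or block cache grows from roughly 614 MB to roughly 819 MB when going from 1.5G to 2G.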

Thanks,
  Henning

On 07/11/2011 06:24 PM, Stack wrote:

On Mon, Jul 11, 2011 at 1:04 AM, Henning Blohm<[email protected]>  wrote:

I am running HBASE 0.90.3 (just upgraded for testing). It is configured for
1.5G heap, which seemed to be a good setting for HBASE 0.20.6. When running
a stress test that would write into three HBASE data nodes from 24 processes
with the goal of inserting one billion simple rows, I get OOMs at two of
three region servers after about 75% of the work is done.

What's your schema?  What's the size of your cells?  0.90 is different
to 0.20.  1.5G is little memory but HBase should just work w/ 1G or
more of heap.

Here is the first OOM:

2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Applied 924, skipped 1105, firstSequenceidInLog=162957072,
maxSequenceidInLog=163841413

This looks like you are crashing regionservers.  Is that so?  What's
your current GC config?

Now:

1. Is there any way to configure some stable heap size? Where is the leak?
This is really frustrating (it took a while to figure out 1.5G was "somehow
good" for 0.20.6)

Start big.  Give it 8Gs?  See how it does then.

How many handlers are you running with?

2. Wouldn't it make sense to let the region server die at the first OOM and
have it restarted quickly rather than letting it go on in some likely broken
state after the OOM until it eventually dies anyway?

Don't we do this currently?  Only time this does not happen is when
the OOME happens out at extremities in RPC which we do not directly
control (We should fix that).  It catches OOME and then tries to keep
going.  Otherwise, if OOME, we'll release reservoir of memory that
we've been holding back so we can shut ourselves down.

St.Ack



--

*Henning Blohm*

*ZFabrik Software KG*

T:      +49/62278399955
F:      +49/62278399956
M:      +49/1781891820

Bunsenstrasse 1
69190 Walldorf

[email protected] <mailto:[email protected]>
Linkedin <http://de.linkedin.com/pub/henning-blohm/0/7b5/628>
www.zfabrik.de <http://www.zfabrik.de>
www.z2-environment.eu <http://www.z2-environment.eu>
