Re: One of the regionserver aborted, then the master shut down itself

Jean-Daniel Cryans Tue, 15 Mar 2011 10:41:56 -0700

Inline.

J-D

On Tue, Mar 15, 2011 at 8:32 AM, 茅旭峰 <[email protected]> wrote:
> Thanks J-D for your reply.
>
> It looks like HBASE-3617 will be included in 0.92, so when will 0.92 be
> released?

It should be included in the bug fix release 0.90.2, which isn't
scheduled at the moment. Historically, HBase has never had a tight
schedule: a release is made whenever a committer feels that enough
jiras have been fixed, and it goes out once it gathers enough votes.

>
> Yes, you're right: we launched tens of threads, endlessly putting values
> of 4MB on average.
> Is the region server meant to die because of OOM? I thought it was the
> region servers' responsibility to flush MemStores into HDFS, so the limit
> when inserting endlessly should be the size of HDFS rather than the Java
> heap (we set a 4GB Java heap for each region server).

Yes, the RS does control the MemStores. What it doesn't control very
well is the memory taken by all the queries in flight, the heap required
for compactions, the data copied when flushing, and all the other small
tidbits all over the place. Just as an example, every value that you
insert first has to be copied from the socket before it can be inserted
into the MemStore. If you are using a big write buffer, that means every
insert currently in flight in a region server takes up twice its size
on the heap.
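To put rough numbers on that double copy, here is a minimal back-of-envelope sketch. It assumes, hypothetically, that 50 puts (standing in for "tens of threads") are being handled by one region server at the same time, each carrying a 4MB value, on the 4GB heap mentioned above. The class and method names are made up for illustration and are not part of HBase. It only counts the two copies of each value, so MemStores, compactions, flushes and not-yet-collected garbage all come on top of it.

```java
// Hypothetical back-of-envelope helper, not part of HBase.
final class InFlightHeapEstimate {
    static final long MB = 1024L * 1024L;

    // Assumes each in-flight value is held twice on the region server heap:
    // once as the bytes read off the socket, once as the copy in the MemStore.
    static long inFlightBytes(int concurrentPuts, long avgValueBytes) {
        return 2L * concurrentPuts * avgValueBytes;
    }

    // Share of the heap used by in-flight values alone; ignores MemStores,
    // compactions, flushes, object overhead and garbage not yet collected.
    static double heapFraction(int concurrentPuts, long avgValueBytes, long heapBytes) {
        return (double) inFlightBytes(concurrentPuts, avgValueBytes) / heapBytes;
    }
}
```

With those assumed numbers, inFlightBytes(50, 4 * MB) comes to 400MB, roughly a tenth of a 4GB heap, on top of everything else the region server needs.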

Garbage collection also doesn't reclaim objects as soon as they stop
being used (that isn't how it works), so part of the heap is always
occupied by dead objects that haven't been collected yet.

The jira tracking the handling of OOMEs in HBase is
https://issues.apache.org/jira/browse/HBASE-2506

>
> Today we cleaned up HDFS and reran the stress tests, i.e. inserting
> endlessly. With Java memory monitoring tools like jconsole, we find that
> the master's Java heap also keeps increasing, so another OOM is expected,
> though it hasn't happened so far.
> Is the master meant to die in this case?

I think your monitoring is a bit naive: memory isn't reclaimed as soon
as it's unused, that's not how the garbage collector works. The OOME
in your master happens after a region server has died, because the
master then tries to load too much data into memory.

>
> Our keys are SHA1-hashed, so they should be spread uniformly. But on the
> web page (master:60010) we can see that most requests are handled by only
> one region server, and the master log shows lots of region splits;
> eventually the regions are spread uniformly among the region servers.
> Is this workflow correct?

That's how it works. A table always starts with a single region, which
is then split organically as data comes in. You can create your tables
pre-split with this HBaseAdmin method:
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[][])
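Since SHA-1 keys are uniform, the split points can be computed up front. A minimal sketch, assuming the row keys are the raw 20-byte SHA-1 digests (if they are hex strings instead, the split points would have to be hex strings too): it divides the 2^160 key space into equal ranges and returns the boundaries as fixed-width big-endian byte arrays, which sort in the same unsigned lexicographic order HBase uses for row keys. The SplitKeys class is hypothetical, not an HBase utility.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of HBase. Assumes row keys are raw
// 20-byte SHA-1 digests, uniformly distributed over [0, 2^160).
final class SplitKeys {
    static final int SHA1_BYTES = 20;

    // Returns numRegions - 1 split points that cut the key space into
    // numRegions ranges of equal size.
    static byte[][] sha1SplitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger space = BigInteger.ONE.shiftLeft(8 * SHA1_BYTES); // 2^160
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger boundary = space.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(numRegions));
            splits[i - 1] = toFixedWidth(boundary.toByteArray(), SHA1_BYTES);
        }
        return splits;
    }

    // BigInteger.toByteArray() is big-endian two's complement: it may carry
    // an extra leading 0x00 sign byte or be shorter than width. Keep the low
    // 'width' bytes and left-pad with zeros so every key has the same length.
    static byte[] toFixedWidth(byte[] raw, int width) {
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

The resulting array would be passed as the byte[][] argument of the createTable method linked above. With 4 regions, for example, the three split keys start with 0x40, 0x80 and 0xC0, followed by zero bytes.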

Or instead of trying to force your data into HBase, you could use the
bulk loader: http://hbase.apache.org/bulk-loads.html

>
> Thanks again for your time, J-D.
>
> Mao Xu-Feng
>
