On Sat, Aug 14, 2010 at 11:36 AM, Stack <[email protected]> wrote:

> On Sat, Aug 14, 2010 at 1:26 AM, Sean Bigdatafun
> <[email protected]> wrote:
> > On Tue, Aug 10, 2010 at 3:40 PM, Stack <[email protected]> wrote:
> >
> >> OOME may manifest in one place but be caused by some other behavior
> >> altogether.  It's an Error.  You can't tell for sure what damage it's
> >> done to the running process (though, in your stack trace, an OOME
> >> during the array copy is likely because of very large cells).
> >> Rather than let the damaged server continue, HBase is conservative and
> >> shuts itself down to minimize possible data loss whenever it gets an
> >> OOME (it keeps aside an emergency memory supply that it releases on
> >> OOME so the shutdown can 'complete' successfully).
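> >>
> >> (A minimal sketch of that reserve pattern -- illustrative only, not
> >> the actual HBase code; onOutOfMemoryError is a made-up hook:
> >>
> >>   // Heap we can let go of when OOME strikes, so the shutdown path
> >>   // still has memory to run in.
> >>   private static byte[] emergencyReserve = new byte[4 * 1024 * 1024];
> >>
> >>   void onOutOfMemoryError(OutOfMemoryError e) {
> >>     emergencyReserve = null; // release the lifeline
> >>     // ... log the error and begin a clean shutdown ...
> >>   }
> >> )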
> >>
> > I understand from the above that HBase shuts down the service to
> > protect data. But can't HBase avoid the OOME in the first place? Or
> > is the OOME situation a pending bug in HBase?
> >
> > It sounds like HBase can OOME whenever it is under heavy load -- I
> > recall several people reporting OOMEs for unknown reasons.
> >
>
> There is always a reason for an OOME.
>
> In our experience, the only remaining cause of OOME in hbase is
> clients trying to load many-megabyte cells concurrently, or using
> large client write buffers so big payloads are passed to the server
> in each RPC request.  Our RPC is not streaming; it passes byte
> arrays.  If there are lots of handlers in the server and all are
> being passed big payloads, then it's possible that at that moment the
> server heap is overwhelmed.
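>
> For example (hypothetical numbers): 30 handlers each holding a 64MB
> multiput payload at the same instant is ~1.9GB of transient heap
> before any of it reaches the memstores.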
>
> Is this your case?
>
> If you need help diagnosing, let us help.  When hbase OOME's, it dumps
> the heap.  Put it somewhere we can pull it.
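>
> (If your install isn't producing a dump, the usual way to get one --
> an assumption about your setup -- is the standard JVM flag in
> hbase-env.sh:
>
>   export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \
>       -XX:HeapDumpPath=/tmp"
> )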
>
> The server keeps account of heap used, except at this edge where RPC
> is taking in requests.
>
> The fix is a little awkward but we'll get to it.  Meantime, the
> workarounds are: up the server heap, cut the number of handlers, use
> a smaller client write buffer, or don't try loading cells > 10MB or
> so -- write those to HDFS directly and keep the location in hbase
> (hbase is not suited to carrying large payloads in cells).
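>
> A minimal client-side sketch of the smaller-write-buffer workaround
> (era API; the table name, sizes, and values here are made up):
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class SmallBufferClient {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       // Keep the client write buffer modest so each flush ships a
>       // small RPC payload instead of one huge byte array.
>       conf.setLong("hbase.client.write.buffer", 2 * 1024 * 1024);
>       HTable table = new HTable(conf, "mytable");
>       table.setAutoFlush(false); // batch puts client-side
>       Put put = new Put(Bytes.toBytes("row1"));
>       put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
>           Bytes.toBytes("keep cells small"));
>       table.put(put);
>       table.flushCommits(); // pushes the buffered puts in small RPCs
>     }
>   }
>
> On the server side, hbase.regionserver.handler.count in hbase-site.xml
> is the knob for cutting the number of handlers.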
>
> Who are the 'several people' reporting OOMEs?  I see Ted Yu talking
> of an OOME this week.  It looks like there is evidence of large cells
> in his case, so the hypothesis outlined above would seem to hold.
>
>
> >>
> >> Are you doing large multiputs?  Do you have lots of handlers running?
> >> If the multiputs are held up because things are running slow, memory
> >> tied up in the handlers could throw you over, especially if your
> >> heap is small.
> >>
> >> What size heap are you running with?
> >>
> >
>
> You didn't answer my questions above.
>
I run with 6GB now, but I really wonder why I should not run 12GB. (People
say that if my heap size is too large, GC will kill me.)


>
>
> > By the way, can someone talk about the optimal heap size? Say I have
> > 16GB in my box, and I use 2GB for my DataNode/TaskTracker etc.
> > Presumably, I'd like to set my RS heap size >= 12GB to cache as much
> > data in memory as possible. But I have heard people say that too
> > large a heap causes GC pause issues.
> >
>
> 4-8G is what fellas normally run with.
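>
> (If you do go bigger, the usual knobs of this era live in
> hbase-env.sh -- treat the numbers as a sketch, not gospel:
>
>   export HBASE_HEAPSIZE=8000   # MB
>   export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
>       -XX:CMSInitiatingOccupancyFraction=70"
>
> CMS shortens the common pauses, but full-GC pauses still grow roughly
> with heap size, which is why folks stop around 8G.)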
>
> > Can someone give a detailed analysis of what I should do?
> >
>
> What do you need beyond the above?
>
It would be really helpful if we could get a detailed suggestion on heap
size configuration. Theoretically, a) the more heap I configure, the more
data I can hold in memory; b) but does GC pause insanely long on a
large-heap JVM?
If the answer to b) is yes, then I see two factors dragging HBase users in
opposite directions, and we had better know the tradeoff -- a balance
point. If we understand the underlying mechanisms, we can configure our
systems judiciously, which is better than reusing someone else's
already-successful setup numbers.

Thanks,
Sean


> St.Ack
>
