On Sat, Aug 14, 2010 at 11:36 AM, Stack <[email protected]> wrote:
> On Sat, Aug 14, 2010 at 1:26 AM, Sean Bigdatafun
> <[email protected]> wrote:
> > On Tue, Aug 10, 2010 at 3:40 PM, Stack <[email protected]> wrote:
> >
> >> An OOME may manifest in one place but be caused by some other
> >> behavior altogether. It's an Error. You can't tell for sure what
> >> damage it has done to the running process (though, in your stack
> >> trace, an OOME during the array copy could well be because of very
> >> large cells). Rather than let the damaged server continue, HBase is
> >> conservative and shuts itself down to minimize possible data loss
> >> whenever it gets an OOME (it has kept aside an emergency memory
> >> supply that it releases on OOME so the shutdown can 'complete'
> >> successfully).
> >>
> > I understand the above to mean that HBase shuts down the service for
> > data protection. But can't HBase avoid the OOME in the first place?
> > Or is the OOME situation a pending bug in HBase?
> >
> > It sounds like HBase can OOME whenever it is under heavy load -- I
> > recall several people reporting OOMEs for unknown reasons.
> >
>
> There is always a reason for an OOME.
>
> In our experience, the only remaining cause of OOMEs in HBase is
> clients trying to load many-megabyte cells concurrently, or using
> large client write buffers, so that big payloads are passed to the
> server in each RPC request. Our RPC is not streaming; it passes byte
> arrays. If there are lots of handlers in the server and all of them
> are being handed big payloads, it is possible that at that moment the
> server heap is overwhelmed.
>
> Is this your case?
>
> If you need help diagnosing, let us help. When HBase OOMEs, it dumps
> the heap. Put it somewhere we can pull it.
>
> The server keeps account of heap used, except here at the edge where
> RPC is taking in requests.
>
> The fix is a little awkward, but we'll get to it. Meantime, the
> workarounds are: up the server heap, cut the number of handlers, use
> a smaller client write buffer, or don't try loading cells bigger than
> 10MB or so -- use HDFS directly and keep the location in HBase (HBase
> is not suited to carrying large stuff in cells).
>
> Who are the 'several people' reporting OOMEs? I see Ted Yu talking of
> an OOME this week. It looks like evidence of large cells in his case,
> so the hypothesis outlined above would seem to hold for him.
>
>
> >>
> >> Are you doing large multiputs? Do you have lots of handlers
> >> running? If the multiputs are held up because things are running
> >> slow, memory used out on the handlers could throw you over,
> >> especially if your heap is small.
> >>
> >> What size heap are you running with?
> >>
> >
> > You didn't answer my questions above.
>

I am running 6GB now, but I really wonder why I should not run 12GB.
(People say that if I have too much heap, GC will kill me.)
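
For what it's worth, a rough sketch of the "use HDFS directly and keep
the location in HBase" workaround Stack describes above, against the
HBase client API of that era. The table name, column family, qualifier,
and HDFS path layout are placeholders picked for illustration, not
anything HBase prescribes:

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BigCellWorkaround {
      // Write the multi-megabyte payload straight to HDFS and store
      // only its small HDFS location in an HBase cell.
      public static void storeLargeValue(byte[] row, byte[] payload)
          throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();

        FileSystem fs = FileSystem.get(conf);
        Path blobPath = new Path("/hbase-blobs/" + Bytes.toString(row));
        FSDataOutputStream out = fs.create(blobPath);
        out.write(payload);
        out.close();

        HTable table = new HTable(conf, "docs");    // hypothetical table
        table.setAutoFlush(false);
        table.setWriteBufferSize(2 * 1024 * 1024);  // modest write buffer
        Put p = new Put(row);
        p.add(Bytes.toBytes("meta"), Bytes.toBytes("location"),
              Bytes.toBytes(blobPath.toString()));
        table.put(p);
        table.flushCommits();
      }
    }

Reads would do the reverse: get the cell, then open the HDFS path it
points to. This keeps each RPC payload small no matter how big the
actual blob is.
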
> >
> > By the way, can someone talk about the optimal heap size? Say I have
> > 16GB in my box, and I use 2GB for my DataNode/TaskTracker etc.
> > Presumably I'd like to set my RS heapsize >= 12GB to cache as much
> > data in memory as possible. But I have heard people say that too
> > much heap will cause GC pause issues.
> >
>
> 4-8G is what fellas normally run with.
>
> > Can someone give a detailed analysis of what I should do?
> >
>
> What do you need beyond the above?

It would be really helpful if we could get a detailed suggestion on
heapsize configuration. Theoretically, (a) the more heap I configure,
the more data I can hold in memory; but (b) does GC pause insanely long
on a JVM with a large heap? If the answer to (b) is yes, then I see two
factors pulling HBase users in opposite directions, and we had better
know the tradeoff -- the balance point. If we understand the
under-the-hood mechanisms, we can configure our systems judiciously,
which is better than copying someone else's already-successful setup
numbers.

Thanks,
Sean

> St.Ack
>
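
For concreteness, the knobs discussed in this thread live in
conf/hbase-env.sh and conf/hbase-site.xml. A minimal sketch with
illustrative values rather than recommendations; the CMS flags were the
commonly used GC settings for HBase region servers at the time:

    # conf/hbase-env.sh -- region server heap, in MB (e.g. 6GB)
    export HBASE_HEAPSIZE=6000
    # Concurrent collector to keep GC pauses shorter on a bigger heap
    export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC \
        -XX:CMSInitiatingOccupancyFraction=70"

    <!-- conf/hbase-site.xml -->
    <property>
      <!-- fewer handlers => fewer concurrent big payloads in flight -->
      <name>hbase.regionserver.handler.count</name>
      <value>10</value>
    </property>
    <property>
      <!-- 2MB client write buffer keeps per-RPC payloads small -->
      <name>hbase.client.write.buffer</name>
      <value>2097152</value>
    </property>

The tradeoff Sean is asking about is roughly this: a bigger heap means
more room for the block cache and memstores, but with the collectors of
that era it also means longer stop-the-world pauses, which is why 4-8G
was the usual compromise.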
