OOME may manifest in one place but be caused by some other behavior altogether. It's an Error: you can't tell for sure what damage it's done to the running process (though, in your stack trace, an OOME during the array copy could well be caused by very large cells). Rather than let a damaged server continue, HBase is conservative and shuts itself down to minimize possible data loss whenever it gets an OOME (it keeps aside an emergency memory supply that it releases on OOME so the shutdown can 'complete' successfully).
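The "emergency memory supply" idea can be sketched like this. This is a minimal illustration of the pattern, not HBase's actual implementation: grab a reserve buffer at startup, and when an OutOfMemoryError is caught, drop the reserve so the abort/shutdown path has headroom to run. The class and method names are hypothetical.

```java
// Sketch of the "memory parachute" pattern (illustrative, not HBase code).
public class OomeParachute {
    // Reserve allocated up front; released only when an OOME is caught.
    private byte[] reserve = new byte[5 * 1024 * 1024];
    private boolean aborted = false;

    /** Runs a task; on OOME, frees the reserve and aborts the server. */
    public boolean run(Runnable task) {
        try {
            task.run();
            return true;
        } catch (OutOfMemoryError oome) {
            reserve = null;  // free the parachute so cleanup can still allocate
            abort();         // shutdown now has room to complete
            return false;
        }
    }

    private void abort() {
        aborted = true;      // stand-in for the real abort/shutdown sequence
    }

    public boolean isAborted() {
        return aborted;
    }

    public static void main(String[] args) {
        OomeParachute server = new OomeParachute();
        // Simulate a request that blows the heap: the requested array is far
        // larger than any default heap, so this throws OutOfMemoryError.
        boolean ok = server.run(() -> {
            long[] tooBig = new long[Integer.MAX_VALUE];
            tooBig[0] = 1;
        });
        System.out.println("ok=" + ok + " aborted=" + server.isAborted());
    }
}
```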
Are you doing large multiputs? Do you have lots of handlers running? If the multiputs are held up because things are running slow, the memory held out on the handlers could push you over, especially if your heap is small. What size heap are you running with?

St.Ack

On Tue, Aug 10, 2010 at 3:26 PM, Stuart Smith <[email protected]> wrote:
> Hello,
>
> I'm seeing errors like so:
>
> 2010-08-10 12:58:38,938 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: Disconnected, type: None, path: null
> 2010-08-10 12:58:38,939 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null
>
> 2010-08-10 12:58:38,941 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOf(Arrays.java:2786)
>     at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:942)
>
> Then I see:
>
> 2010-08-10 12:58:39,408 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 79 on 60020, call close(-2793534857581898004) from 192.168.195.88:41233: error: java.io.IOException: Server not running, aborting
> java.io.IOException: Server not running, aborting
>
> And finally:
>
> 2010-08-10 12:58:39,514 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stop requested, clearing toDo despite exception
> 2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
> 2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
>
> And the server begins to shut down.
>
> Now, it's very likely these are due to retrieving unusually large cells - in fact, that's my current assumption. I'm seeing M/R tasks fail intermittently with the same issue on the read of cell data.
>
> My question is: why does this bring the whole regionserver down? I would think the regionserver would just fail the Get() and move on...
>
> Am I misdiagnosing the error? Or is it the case that if I want different behavior, I should pony up with some code? :)
>
> Take care,
> -stu
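On the "large multiputs" question above: the client-side mitigation is to send smaller batches so each RPC (and the server-side handler buffering it) holds less data at once. A hedged sketch of that chunking, with a generic helper standing in for the real client calls (with HBase, each batch would go to something like HTable.put(List<Put>); the names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative batching helper: split one huge list of puts into
// consecutive smaller batches to bound per-RPC memory.
public class PutBatcher {
    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> chunk(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(new ArrayList<>(
                items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> puts = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            puts.add(i);
        }
        // In real code, each batch would be one table.put(batch) call.
        for (List<Integer> batch : chunk(puts, 3)) {
            System.out.println("would send batch of " + batch.size());
        }
    }
}
```

The trade-off is more round trips for less peak memory on both sides; a batch size tuned to cell size matters more than the count when cells are unusually large.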
