Hello,

   I'm seeing errors like so:

2010-08-10 12:58:38,938 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher: Got ZooKeeper event, state: Disconnected, type: None, path: null
2010-08-10 12:58:38,939 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state: Disconnected, type: None, path: null

2010-08-10 12:58:38,941 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:942)

Then I see:

2010-08-10 12:58:39,408 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 79 on 60020, call close(-2793534857581898004) from 192.168.195.88:41233: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting

And finally:

2010-08-10 12:58:39,514 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stop requested, clearing toDo despite exception
2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting

And the server begins to shut down.

Now, it's very likely these are due to retrieving unusually large cells - in 
fact, that's my current assumption. I'm also seeing M/R tasks intermittently 
fail with the same issue on the read of cell data.
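
For reference, the reads in those tasks are roughly of this shape - the table, 
family, and qualifier names below are just placeholders, not the real ones:

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // Rough shape of the per-row read in the map task; "mytable", "cf" and
    // "data" stand in for the real names, which aren't important here.
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    byte[] rowKey = Bytes.toBytes("some-row");   // placeholder row key
    Get get = new Get(rowKey);
    get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("data"));
    Result result = table.get(get);              // the read that fails intermittently on very large cells
    byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("data"));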

My question is: why does this bring the whole regionserver down? I would have 
thought the regionserver would just fail that one Get() and move on...
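
In other words, on the client side I'd expect to be able to catch that one 
failure and keep going, something like (continuing the sketch above):

    try {
        Result result = table.get(get);
        // ... use the value ...
    } catch (IOException e) {
        // one huge cell fails this call; the regionserver stays up and the task skips the row
        System.err.println("Skipping row " + Bytes.toString(rowKey) + ": " + e);
    }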

Am I misdiagnosing the error? Or is it the case that if I want different 
behavior, I should pony up some code? :)

Take care,
  -stu
