Hello,
I'm seeing errors like so:
2010-08-10 12:58:38,938 DEBUG
org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher: Got
ZooKeeper event, state: Disconnected, type: None, path: null
2010-08-10 12:58:38,939 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Got ZooKeeper event, state:
Disconnected, type: None, path: null
2010-08-10 12:58:38,941 FATAL
org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at
java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:133)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:942)
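If I'm reading that trace right, the Arrays.copyOf frame comes from ByteArrayOutputStream.toByteArray(), which always allocates a fresh full-size copy of the buffered bytes - so a handler serializing one big cell transiently needs roughly twice the cell's size on the heap. A tiny JDK-only sketch of that behavior (the 16 MB size is just illustrative, not anything from my cluster):

```java
import java.io.ByteArrayOutputStream;

public class ToByteArrayCopy {
    public static void main(String[] args) {
        // Stand-in for one unusually large cell value.
        byte[] cell = new byte[16 * 1024 * 1024];

        // The IPC handler buffers the serialized response here...
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        buf.write(cell, 0, cell.length);

        // ...and toByteArray() allocates a SECOND full-size array
        // (via Arrays.copyOf), distinct from the internal buffer.
        byte[] copy = buf.toByteArray();

        System.out.println(copy.length == cell.length); // same size
        System.out.println(copy != cell);               // but a new array
    }
}
```

So the allocation that blows the heap is the copy, on top of whatever the internal buffer already grew to.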
Then I see:
2010-08-10 12:58:39,408 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 79 on 60020, call close(-2793534857581898004) from
192.168.195.88:41233: error: java.io.IOException: Server not running, aborting
java.io.IOException: Server not running, aborting
And finally:
2010-08-10 12:58:39,514 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Stop requested, clearing
toDo despite exception
2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server
on 60020
2010-08-10 12:58:39,515 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
handler 1 on 60020: exiting
And the server begins to shut down.
Now, it's very likely these are due to retrieving unusually large cells - in
fact, that's my current assumption. I'm seeing M/R tasks fail intermittently
with the same issue on reads of cell data.
My question is: why does this bring the whole regionserver down? I would think
the regionserver would just fail the Get() and move on...
Am I misdiagnosing the error? Or is it the case that if I want different
behavior, I should pony up with some code? :)
Take care,
-stu