Hi!

This morning, on our production system, we observed some very bad behavior from
HBase 0.20.6.

1- one of our region servers crashed
2- we restarted it successfully (no errors on the master or on the region
servers)
3- but we discovered that our HBase clients were unable to recover from this
situation:

Each time a get() was performed, but ONLY ON THE BIGGEST TABLES, our HBase
clients received an exception (actually coming from the restarted region
server):
org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: appInfo.aki519368.prod.capptain.com,801765cd68dcbfc04690770622c2edaa,1307369888185
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2269)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1732)
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

Even stranger:

4- only clients READING from HBase got this exception: clients writing to HBase
recovered from this failure without any error (and the writes were actually
performed)

To fix this, we had to restart all our HBase clients reading from the BIGGEST
TABLES. So we suspect that the issue comes from the HBase client library or the
region server itself.

We can easily reproduce this bug on our development servers: we kill a region
server, restart it, and clients trying to "get" from regions served by the
killed/restarted region server get this exception until we restart them.

So my questions are:

Is this a known issue?
Has it been fixed in HBase 0.90?
Do we need to handle this exception in a special way on the client side (e.g.
close / reopen the table)?
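To make the last question concrete, the kind of client-side handling we mean is roughly this: when a read fails with NotServingRegionException, close the table handle, open a fresh one and retry once. TableHandle, the opener Supplier and the local NotServingRegionException class below are hypothetical stand-ins, not the HBase 0.20.6 client API, and we don't know whether this is needed or even sufficient:

```java
import java.util.function.Supplier;

class ReopenOnFailure {

    // Hypothetical stand-in for a table handle; NOT the real HBase HTable API.
    interface TableHandle extends AutoCloseable {
        String get(String row) throws Exception;

        @Override
        void close();
    }

    // Hypothetical stand-in for org.apache.hadoop.hbase.NotServingRegionException.
    static class NotServingRegionException extends Exception {
        NotServingRegionException(String msg) {
            super(msg);
        }
    }

    private final Supplier<TableHandle> opener;
    private TableHandle table;

    ReopenOnFailure(Supplier<TableHandle> opener) {
        this.opener = opener;
        this.table = opener.get();
    }

    // Try the read; if the region is reported as not served, drop the
    // (possibly stale) handle, open a new one and retry exactly once.
    // A second failure propagates to the caller.
    String get(String row) throws Exception {
        try {
            return table.get(row);
        } catch (NotServingRegionException e) {
            table.close();
            table = opener.get();
            return table.get(row);
        }
    }
}
```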

Thanks a lot
