On Sun, Aug 29, 2010 at 12:37 PM, Nathan Harkenrider
<[email protected]> wrote:
> The HBase master still thinks it is alive, and the node is
> registered in Zookeeper.

Can you get some logs from this RS?  (You can get to the RS logs from
the UI if you do not have access to the node.)  It would be interesting
to see how the HW issue manifests.  We should recognize it and abort (we
do this already for various scenarios -- OOME, unreachable HDFS).


> Clients hitting regions hosted on this particular
> region server are hanging/timing out, which is less than ideal. Any thoughts
> on how to configure HBase to be more sensitive to this type of
> error? Also, is there any way short of restarting HBase that I can force
> these regions to be reassigned to another regionserver if I don't have
> physical access (or remote console) to stop the regionserver process on the
> failing node?
>

Well, usually you'd just shut down that RS and its load would be
distributed across the remaining regionservers, but you need access to
kill the individual RS (in 0.90 you will be able to do it from HBaseAdmin).

> The master did not report any errors in its log related to the failing node.
> I'm currently waiting on operations to get me the regionserver logs if they
> can be recovered.
>

For sure we'd like to have a looksee.

Thanks,
St.Ack

> Nathan Harkenrider
>
