Hello,

I am running HBase 0.98.6 (CDH). I am facing a problem where a disk
controller fails on a host and all disk operations on that host hang.
However, the region server and data node processes do not die, and the
ZooKeeper session stays alive, so all requests to that region server fail.
Currently, I use the ZooKeeper client to manually delete the corresponding
znode to initiate the recovery process. It will take some time to figure out
the hardware issue and fix it. Meanwhile, I am looking for a way to automate
the recovery process.
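
To make the idea concrete, here is a rough sketch of the kind of disk probe I
have in mind. It is only an illustration under my own assumptions: the function
name disk_is_responsive, the 10-second timeout, and the "ERROR" output line are
placeholders I picked, not anything taken from HBase. The write is done in a
child process so that the checker itself does not get stuck if the disk hangs.

```python
import subprocess
import sys

# Small program run in a child process: create, sync, and remove a temp file.
_PROBE = (
    "import os, sys, tempfile\n"
    "fd, path = tempfile.mkstemp(dir=sys.argv[1])\n"
    "os.write(fd, b'probe')\n"
    "os.fsync(fd)\n"
    "os.close(fd)\n"
    "os.remove(path)\n"
)


def disk_is_responsive(directory, timeout_s=10.0):
    """Return True if a small synced write in `directory` finishes in time.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _PROBE, directory],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # A non-zero exit code (e.g. missing directory) also counts as failure.
        return child.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child blocked in disk I/O may not exit, so we
        # deliberately do not wait for it again after the kill.
        child.kill()
        return False


if __name__ == "__main__":
    # Usage (assumed): pass the data directories to check as arguments.
    for data_dir in sys.argv[1:]:
        if not disk_is_responsive(data_dir):
            print("ERROR disk probe failed or timed out for " + data_dir)
```

Something like this could be run from cron against each data directory, with a
failure triggering the same znode cleanup I currently do by hand. Whether it
can be plugged into HBASE-7351 directly is part of what I am asking.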

I came across HBASE-7351. I am wondering if anyone has used this feature, or
if any other option is available to kill a region server in similar partial
hardware failure cases. Any insight would be very helpful.

Thanks - Arun.
