Hello, I am using HBase 0.98.6 (CDH). I am facing a problem where a disk controller fails on a host and all disk operations on that host hang. However, the region server and data node processes do not die, and the ZooKeeper session stays alive, so all requests to that region server fail. Currently, I use the ZooKeeper client to delete the corresponding znode manually to initiate the recovery process. It will take some time to figure out the hardware issue and fix it. Meanwhile, I am looking for a way to automate the recovery process.
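To make the idea concrete, here is a rough sketch of the kind of watchdog I have in mind. It is not a tested solution, only an illustration under assumptions: the probe writes and fsyncs a small file in each data directory and treats a timeout as a hung disk. The names `disk_responds`, `check_and_recover` and the `on_failure` callback are hypothetical; the callback would be whatever performs the recovery step (for example deleting the znode or killing the region server process), which is not shown here.

```python
import os
import tempfile
import threading


def disk_responds(directory, timeout_s=5.0):
    """Return True if a small write + fsync in `directory` finishes in time."""
    done = threading.Event()
    ok = []

    def probe():
        try:
            fd, path = tempfile.mkstemp(dir=directory, prefix=".probe-")
            try:
                os.write(fd, b"ping")
                os.fsync(fd)  # force the write down to the device
            finally:
                os.close(fd)
                os.unlink(path)
            ok.append(True)
        except OSError:
            pass  # missing directory, I/O error, read-only mount, ...
        finally:
            done.set()

    # Daemon thread: a probe stuck on hung I/O must not block the watchdog.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout_s) and bool(ok)


def check_and_recover(directories, on_failure, timeout_s=5.0):
    """Probe each directory; call the (hypothetical) on_failure hook if any fail."""
    bad = [d for d in directories if not disk_responds(d, timeout_s)]
    if bad:
        on_failure(bad)
    return bad
```

Something like this could run from cron or a small loop on each host, with `on_failure` wired to the same manual step I perform today.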
I came across HBASE-7351. I am wondering if anyone has used this feature, or if any other option is available to kill a region server in partial hardware failure cases like this one. Any insight would be very helpful to me. Thanks - Arun.
