Hi everyone,

HBase version: 0.90.3
Hadoop version: cdh3u0
Setup: 2 region servers, ZooKeeper quorum managed by HBase.

I was running some tests, and it seems that regions are not reassigned by the master when a region server (RS) is brought down. Here are the steps:

0. Cluster is in a steady state. Pick a random key, k1, hosted on a region server, rs1, and perform a get from the shell. The result comes back fine.
1. Bring down rs1 using [/usr/lib/hbase-0.20/bin/hbase-daemon.sh --config /usr/lib/hbase-0.20/conf/ stop regionserver]
2. Wait a few seconds and do a get from the shell for k1 again. k1 is still located at rs1, and a RetriesExhaustedException occurs.
3. Wait a few minutes and do a get from the shell for k1 again. k1 is still located at rs1, and a RetriesExhaustedException occurs.
4. Bring up rs1 using [/usr/lib/hbase-0.20/bin/hbase-daemon.sh --config /usr/lib/hbase-0.20/conf/ start regionserver]
5. A get from the shell now brings back the result just fine.

My hope at step (3) was that the regions would have been reassigned and the get would succeed. 0.90.2 introduced a process to do things more gracefully, which is great, but that (graceful shutdown) is not always possible.

I have pastebin-ed the relevant logs. Can anyone help me understand this scenario?

HBase shell after RS brought down: http://pastebin.com/8bvk5RFV
RS log around the time it was brought down: http://pastebin.com/sgVRVCCj
zkdump after RS brought down: http://pastebin.com/meyqCVJ0
HMaster log around the time RS was brought down: http://pastebin.com/jBGKuy74
hbck after RS brought down: http://pastebin.com/bxvyTTF5
hbck after RS brought up: http://pastebin.com/FPxvT9qW
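
In case anyone wants to reproduce this, the steps above can be scripted roughly as follows. This is only a sketch: the table name t1 is a placeholder (I did not name the table above), the 10 second and 300 second waits are my guesses for "a few seconds" and "a few minutes", and the script has to run on the rs1 host because hbase-daemon.sh stops the local region server. By default it only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/sh
# Sketch of the reproduction steps; run on the rs1 host.
HBASE_BIN=/usr/lib/hbase-0.20/bin
HBASE_CONF=/usr/lib/hbase-0.20/conf/
TABLE=${TABLE:-t1}        # placeholder table name (assumption)
ROW=${ROW:-k1}            # the key used in the steps above
DRY_RUN=${DRY_RUN:-1}     # 1 = only print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Issue a get by piping the command into the HBase shell.
get_row() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ echo \"get '$TABLE', '$ROW'\" | hbase shell"
  else
    echo "get '$TABLE', '$ROW'" | hbase shell
  fi
}

# Start or stop the local region server ($1 is "start" or "stop").
rs() {
  run "$HBASE_BIN/hbase-daemon.sh" --config "$HBASE_CONF" "$1" regionserver
}

reproduce() {
  get_row                    # step 0: works in steady state
  rs stop                    # step 1: bring down rs1
  run sleep 10;  get_row     # step 2: RetriesExhaustedException
  run sleep 300; get_row     # step 3: still RetriesExhaustedException
  rs start                   # step 4: bring rs1 back up
  get_row                    # step 5: works again
}

reproduce
```

Comparing the output of the get at step 3 with the HMaster log for the same time window should show whether the master ever noticed that rs1 went away.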
