HBase consistency issues (holes) and long startup

Harry Waye Fri, 27 May 2016 09:40:43 -0700

We had a regionserver fall out of our cluster. I assume the process hit a
limit, as the region server's .out log file just contained "Killed", which
I've seen before when hitting open file descriptor limits.  After this,
hbck reported inconsistencies in tables:


ERROR: There is a hole in the region chain between
dce998f6f8c63d3515a3207330697ce4-ravi teja and e4.  You need to create a
new .regioninfo and region dir in hdfs to plug the hole.

`hdfs fsck` reports a healthy dfs.

I attempted to run `hbase hbck -repairHoles` which didn't resolve the
inconsistencies.

I then restarted the HBase cluster. From the master log files it now
appears that there are many tasks waiting to complete, and the web
interface times out:

master.SplitLogManager: total tasks = 299 unassigned = 285 tasks={ ... }

From looking at the logs on the regionservers I see messages such as:
"regionserver.SplitLogWorker: Current region server ... has 2 tasks in
progress and can't take more".

How can I speed up working through these tasks?  I suspect our nodes can
handle many more than 2 tasks at a time. I'll likely have followup
questions once these have been worked through but I think that's it for now.

Any other information you need?
