Re: Distributed log splitting failing after cluster outage.

Alok Singh Thu, 06 Mar 2014 11:33:42 -0800

We ran into this a few weeks ago while adding new nodes to an
existing cluster. Due to a misconfiguration, the new nodes were assigned the
wrong ZooKeeper quorum and ended up forming a new cluster.
We saw a similar error in our logs:


2014-01-30 16:47:19,196 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_META_SERVER_SHUTDOWN
java.io.IOException: failed log splitting for xxxxx.xxx.urbanairship.com,60020,1385165871751, will retry
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:182)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: error or interrupted while splitting logs in [maprfs:/......./xxxx.xxxx.urbanairship.com,60020,1385165871751-splitting]
Task = installed = 1 done = 0 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:272)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:284)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:252)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:175)

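The most telling part of this trace is the `Task = installed = 1 done = 0 error = 1` summary. As far as I understand SplitLogManager, it counts the split tasks installed for the batch, how many finished, and how many failed; the master gives up when not every installed task reports done. A minimal Python sketch for pulling those counters out of a master log line, assuming that exact wording (the names `SplitBatch`, `parse_split_batch` and `all_tasks_failed` are hypothetical helpers, not part of HBase):

```python
import re
from typing import NamedTuple, Optional


class SplitBatch(NamedTuple):
    installed: int  # split tasks the master handed out
    done: int       # tasks that completed successfully
    error: int      # tasks that failed


# Assumes the "installed = N done = N error = N" wording seen in the trace above.
_BATCH_RE = re.compile(r"installed\s*=\s*(\d+)\s+done\s*=\s*(\d+)\s+error\s*=\s*(\d+)")


def parse_split_batch(line: str) -> Optional[SplitBatch]:
    """Extract the split-task counters from a log line, or None if absent."""
    m = _BATCH_RE.search(line)
    if m is None:
        return None
    return SplitBatch(*(int(g) for g in m.groups()))


def all_tasks_failed(batch: SplitBatch) -> bool:
    """True when every installed task ended in error and none completed."""
    return batch.installed > 0 and batch.done == 0 and batch.error == batch.installed
```

When every task errors out (as in both traces in this thread), retrying the split is unlikely to help on its own; something about the logs or the cluster membership has to change first.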

We fixed it by shutting the new nodes down, moving aside the offending logs,
and restarting the master. Later, we fixed the ZooKeeper configuration and then
brought the new nodes back into the cluster.
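Moving the logs aside can be planned from the WAL directory names, which follow the `host,port,startcode` pattern seen in the traces (with a `-splitting` suffix while a split is in progress). A minimal Python sketch that only builds `hadoop fs -mv` commands for review and does not run them, assuming the master is stopped, the WAL root is `/hbase/.logs`, the `-splitting` directories named in the error are the offending ones, and a quarantine directory outside the WAL root already exists (`ServerName`, `parse_log_dir`, `plan_move_aside` and the quarantine path are hypothetical):

```python
import shlex
from typing import List, NamedTuple, Optional

SPLITTING_SUFFIX = "-splitting"


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int  # region server start time, epoch milliseconds


def parse_log_dir(name: str) -> Optional[ServerName]:
    """Parse a WAL directory name of the form host,port,startcode[-splitting]."""
    if name.endswith(SPLITTING_SUFFIX):
        name = name[: -len(SPLITTING_SUFFIX)]
    parts = name.split(",")
    if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
        return None
    return ServerName(parts[0], int(parts[1]), int(parts[2]))


def plan_move_aside(log_root: str, dir_names: List[str], quarantine: str) -> List[str]:
    """Build (but do not run) one `hadoop fs -mv` command per *-splitting dir."""
    root = log_root.rstrip("/")
    dest = quarantine.rstrip("/")
    cmds = []
    for name in sorted(dir_names):
        if name.endswith(SPLITTING_SUFFIX) and parse_log_dir(name) is not None:
            src = shlex.quote(f"{root}/{name}")
            dst = shlex.quote(f"{dest}/{name}")
            cmds.append(f"hadoop fs -mv {src} {dst}")
    return cmds
```

Keep in mind that edits in a moved-aside WAL that were never flushed are not replayed, so this is only safe when, as in the original question, no writes were happening before the crash.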

Alok


On Thu, Mar 6, 2014 at 11:13 AM, David Koch <[email protected]> wrote:

> Hello,
>
> Our HBase cluster had an unexpected shutdown, and while trying to bring it
> back up, the Master gets stuck with the following message:
>
> Failed splitting of [ list of <host_name>,<port>,<tmst> ]
> java.io.IOException: error or interrupted while splitting logs in [ list of <host_name>,<port>,<tmst> ]
> Task = installed = 10 done = 0 error = 10
> at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:282)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:300)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:242)
> at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:661)
> at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:580)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
> at java.lang.Thread.run(Thread.java:724)
>
> What can I do to get the cluster operational again? There had been no data
> ingestion for several hours before the crash, so maybe clearing out
> /hbase/.logs/ could be an option.
>
> Thanks,
>
> /David
>
