Re: Error "Primary master encountered unexpected exception while trying to recover from ZooKeeper session expiry"

Ted Yu Thu, 16 Jul 2015 07:48:20 -0700

Can you check the corresponding region server to see whether it was
operating correctly?


I went over some previous threads in which a region server was using the
wrong ZooKeeper quorum.
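
One way to rule out a quorum mismatch is to compare hbase.zookeeper.quorum
across the hbase-site.xml files of the master and each region server. The
sketch below is only an illustration, not anything shipped with HBase: the
class QuorumCheck and its helper names are made up for this example, it reads
the XML with the plain JDK parser rather than the HBase Configuration API, and
it reports a missing property as "(not set)" without applying any HBase
default.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: compares the ZooKeeper quorum named in several
// hbase-site.xml files (for example, one copied from each host).
class QuorumCheck {

    // Parses Hadoop-style <property><name/><value/></property> entries into a map.
    static Map<String, String> readProperties(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                result.put(names.item(0).getTextContent().trim(),
                           values.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    // Returns the configured quorum, or "(not set)" when the file does not define it.
    static String quorumOf(Map<String, String> props) {
        return props.getOrDefault("hbase.zookeeper.quorum", "(not set)");
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: QuorumCheck <hbase-site.xml> [more files...]");
            return;
        }
        Set<String> distinct = new LinkedHashSet<>();
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                String quorum = quorumOf(readProperties(in));
                distinct.add(quorum);
                System.out.println(path + " -> hbase.zookeeper.quorum = " + quorum);
            }
        }
        System.out.println(distinct.size() == 1
                ? "All files agree on the quorum."
                : "Quorum differs between files: " + distinct);
    }
}
```

Compile it with javac and pass it the hbase-site.xml copied from each host.
The master log in this thread shows quorum=localhost:2181, so any region
server file that reports a different value would be worth a closer look.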

Cheers

On Thu, Jul 16, 2015 at 7:35 AM, dgoldenberg123 <[email protected]> wrote:

> Could someone elaborate on what this error means?
>
> We run into a periodic shutdown of HBase (0.98.9 built for Hadoop 2) while
> inserting records into it under load and the stack trace below appears to
> be reflective of the cause.
>
> Looking at HMaster.java, what does this error imply and are there ways to
> fix it or work around it?
>
>   private boolean abortNow(final String msg, final Throwable t) {
>     if (!this.isActiveMaster) {
>       return true;
>     }
>     if (t != null && t instanceof KeeperException.SessionExpiredException) {
>       try {
>         LOG.info("Primary Master trying to recover from ZooKeeper session " +
>             "expiry.");
>         return !tryRecoveringExpiredZKSession();
>       } catch (Throwable newT) {
>         LOG.error("Primary master encountered unexpected exception while " +
>             "trying to recover from ZooKeeper session" +
>             " expiry. Proceeding with server abort.", newT);
>       }
>     }
>     return true;
>   }
>
>
> Is https://issues.apache.org/jira/browse/HBASE-4479 related at all (marked
> fixed as of 0.92.0)?
>
> Any insight would be greatly appreciated.
>
> ERROR main-EventThread master.HMaster: Primary master encountered unexpected exception while trying to recover from ZooKeeper session expiry. Proceeding with server abort.
> java.util.concurrent.ExecutionException: java.io.IOException: error or interrupted while splitting logs in hdfs://acme-server.com:9000/tmp/hbase-root/hbase/WALs/acme-server,60088,1436822380393-splitting Task = installed = 1 done = 0 error = 1
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at org.apache.hadoop.hbase.master.HMaster.tryRecoveringExpiredZKSession(HMaster.java:2498)
> at org.apache.hadoop.hbase.master.HMaster.abortNow(HMaster.java:2526)
> at org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2431)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:403)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:321)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> Caused by: java.io.IOException: error or interrupted while splitting logs in hdfs://acme-server.acme.com:9000/tmp/hbase-root/hbase/WALs/acme-server,60088,1436822380393-splitting Task = installed = 1 done = 0 error = 1
> at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:359)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:416)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:308)
> at org.apache.hadoop.hbase.master.MasterFileSystem.splitMetaLog(MasterFileSystem.java:299)
> at org.apache.hadoop.hbase.master.HMaster.splitMetaLogBeforeAssignment(HMaster.java:1178)
> at org.apache.hadoop.hbase.master.HMaster.assignMeta(HMaster.java:1113)
> at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:978)
> at org.apache.hadoop.hbase.master.HMaster.access$300(HMaster.java:286)
> at org.apache.hadoop.hbase.master.HMaster$3.call(HMaster.java:2482)
> at org.apache.hadoop.hbase.master.HMaster$3.call(HMaster.java:2470)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2015-07-14 17:51:54,433 FATAL main-EventThread master.HMaster: master:57118-0x14e89499bbd0000-0x14e89499bbd0000-0x14e89499bbd0000-0x14e89499bbd0000, quorum=localhost:2181, baseZNode=/hbase master:57118-0x14e89499bbd0000-0x14e89499bbd0000-0x14e89499bbd0000-0x14e89499bbd0000 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:403)
> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:321)
> at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
> at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2015-07-14 17:51:54,433 INFO main-EventThread master.HMaster: Aborting
> 2015-07-14 17:51:54,433 INFO main-EventThread zookeeper.ClientCnxn: EventThread shut down
> 2015-07-14 17:51:54,434 INFO acme-server,57118,1436822379834-BalancerChore balancer.BalancerChore: acme-server,57118,1436822379834-BalancerChore exiting
> 2015-07-14 17:51:54,435 INFO acme-server,57118,1436822379834-ClusterStatusChore balancer.ClusterStatusChore: acme-server,57118,1436822379834-ClusterStatusChore exiting
