We had a NameNode go down due to a timeout against the HDFS HA QJM (quorum
journal manager):


2015-12-09 04:10:42,723 WARN
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 19016
ms (timeout=20000 ms) for a response for sendEdits

2015-12-09 04:10:43,708 FATAL
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for
required journal (JournalAndStream(mgr=QJM to [10.42.28.221:8485,
10.42.28.222:8485, 10.42.28.223:8485], stream=QuorumOutputStream starting
at txid 8781293))

java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to
respond.

at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)

at
org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)

at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)

at
org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)

at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:490)

at
org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:350)

at
org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)

at
org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:486)

at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:581)

at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1695)

at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1669)

at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:409)

at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)

at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)

at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)

at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)

at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)
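
As an aside, the 20000 ms it timed out on looks like the QJM quorum write
timeout, dfs.qjournal.write-txns.timeout.ms. Below is a minimal sketch of
what we may try in hdfs-site.xml on the NameNodes, assuming raising the
timeout is even sensible here rather than fixing whatever made the
JournalNodes slow; the 60000 value is an illustrative guess, not something
we've validated:

  <!-- hdfs-site.xml on the NameNodes: allow more than the default
       20000 ms for a quorum of JournalNodes to acknowledge edits -->
  <property>
    <name>dfs.qjournal.write-txns.timeout.ms</name>
    <value>60000</value>
  </property>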


While this is disturbing in its own right, I'm further annoyed that HBase
shut down two region servers. Furthermore, we had to run hbck
-fixAssignments to repair HBase, and I'm not sure whether the data from the
shut-down regions was available in the meantime, or whether the HBase
service itself was available afterwards:


2015-12-09 04:10:44,320 ERROR org.apache.hadoop.hbase.master.HMaster:
Region server ^@^@hbase008r09.comp.prod.local,60020,1436412712133 reported
a fatal error:

ABORTING region server hbase008r09.comp.prod.local,60020,1436412712133: IOE
in log roller

Cause:

java.io.IOException: cannot get log writer

  at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)

  at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)

  at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)

  at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)

  at java.lang.Thread.run(Thread.java:722)

Caused by: java.io.IOException: java.io.IOException: Failed on local
exception: java.io.IOException: Response is null.; Host Details : local
host is: "hbase008r09.comp.prod.local/10.42.28.192"; destination host is:
"hbasenn001.comp.prod.local":8020;

  at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)

  at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)

  ... 4 more

Caused by: java.io.IOException: Failed on local exception:
java.io.IOException: Response is null.; Host Details : local host is:
"hbase008r09.comp.prod.local/10.42.28.192"; destination host is:
"hbasenn001.comp.prod.local":8020;

  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)

  at org.apache.hadoop.ipc.Client.call(Client.java:1228)

  at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)

  at com.sun.proxy.$Proxy14.create(Unknown Source)

  at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)

  at sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)

  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

  at java.lang.reflect.Method.invoke(Method.java:601)

  at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)

  at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)

  at com.sun.proxy.$Proxy15.create(Unknown Source)

  at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)

  at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)

  at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)

  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)

  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)

  at
org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)

  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)

  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)

  at
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)

  at org.apache.hadoop.fs.FileContext.create(FileContext.java:660)

  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)

  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)

  at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)

  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

  at java.lang.reflect.Method.invoke(Method.java:601)

  at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)

  ... 5 more

Caused by: java.io.IOException: Response is null.

  at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)

  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)


2015-12-09 04:10:44,387 ERROR org.apache.hadoop.hbase.master.HMaster:
Region server ^@^@hbase007r08.comp.prod.local,60020,1436412674179 reported
a fatal error:

ABORTING region server hbase007r08.comp.prod.local,60020,1436412674179: IOE
in log roller

Cause:

java.io.IOException: cannot get log writer

  at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)

  at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)

  at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)

  at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)

  at java.lang.Thread.run(Thread.java:722)

Caused by: java.io.IOException: java.io.IOException: Failed on local
exception: java.io.IOException: Response is null.; Host Details : local
host is: "hbase007r08.comp.prod.local/10.42.28.191"; destination host is:
"hbasenn001.comp.prod.local":8020;


  at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)

  at
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)

  ... 4 more

Caused by: java.io.IOException: Failed on local exception:
java.io.IOException: Response is null.; Host Details : local host is:
"hbase007r08.comp.prod.local/10.42.28.191"; destination host is:
"hbasenn001.comp.prod.local":8020;

  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)

  at org.apache.hadoop.ipc.Client.call(Client.java:1228)

  at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)

  at com.sun.proxy.$Proxy14.create(Unknown Source)

  at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)

  at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)

  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

  at java.lang.reflect.Method.invoke(Method.java:601)

  at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)

  at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)

  at com.sun.proxy.$Proxy15.create(Unknown Source)

  at
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)

  at
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)

  at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)

  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)

  at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)

  at
org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)

  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)

  at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)

  at
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)

  at org.apache.hadoop.fs.FileContext.create(FileContext.java:660)

  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)

  at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)

  at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)

  at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

  at java.lang.reflect.Method.invoke(Method.java:601)

  at
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)

  ... 5 more

Caused by: java.io.IOException: Response is null.

  at
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)

  at org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)


2015-12-09 04:11:01,444 INFO org.apache.zookeeper.ClientCnxn: Client
session timed out, have not heard from server in 26679ms for sessionid
0x44e6c2f20980003, closing socket connection and attempting reconnect

2015-12-09 04:11:34,636 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking
getListing of class ClientNamenodeProtocolTranslatorPB. Trying to fail over
immediately.

2015-12-09 04:11:34,687 WARN
org.apache.hadoop.io.retry.RetryInvocationHandler: Exception while invoking
getListing of class ClientNamenodeProtocolTranslatorPB after 1 fail over
attempts. Trying to fail over after sleeping for 791ms.

2015-12-09 04:11:35,334 WARN org.apache.hadoop.ipc.HBaseServer:
(responseTooSlow):
{"processingtimems":50237,"call":"reportRSFatalError([B@3c97e50c, ABORTING
region server hbase008r09.comp.prod.local,60020,1436412712133: IOE in log
roller\nCause:\njava.io.IOException: cannot get log writer\n\tat
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:716)\n\tat
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:663)\n\tat
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:595)\n\tat
org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)\n\tat
java.lang.Thread.run(Thread.java:722)\nCaused by: java.io.IOException:
java.io.IOException: Failed on local exception: java.io.IOException:
Response is null.; Host Details : local host is:
\"hbase008r09.comp.prod.local/10.42.28.192\"; destination host is:
\"hbasenn001.comp.prod.local\":8020; \n\tat
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)\n\tat
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:713)\n\t...
4 more\nCaused by: java.io.IOException: Failed on local exception:
java.io.IOException: Response is null.; Host Details : local host is:
\"hbase008r09.comp.prod.local/10.42.28.192\"; destination host is:
\"hbasenn001.comp.prod.local\":8020; \n\tat
org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:759)\n\tat
org.apache.hadoop.ipc.Client.call(Client.java:1228)\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)\n\tat
com.sun.proxy.$Proxy14.create(Unknown Source)\n\tat
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:192)\n\tat
sun.reflect.GeneratedMethodAccessor30.invoke(Unknown Source)\n\tat
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.lang.reflect.Method.invoke(Method.java:601)\n\tat
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)\n\tat
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)\n\tat
com.sun.proxy.$Proxy15.create(Unknown Source)\n\tat
org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1298)\n\tat
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1317)\n\tat
org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1264)\n\tat
org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:97)\n\tat
org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:53)\n\tat
org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:554)\n\tat
org.apache.hadoop.fs.FileContext$3.next(FileContext.java:663)\n\tat
org.apache.hadoop.fs.FileContext$3.next(FileContext.java:660)\n\tat
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2333)\n\tat
org.apache.hadoop.fs.FileContext.create(FileContext.java:660)\n\tat
org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:502)\n\tat
org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:469)\n\tat
sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)\n\tat
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat
java.lang.reflect.Method.invoke(Method.java:601)\n\tat
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)\n\t...
5 more\nCaused by: java.io.IOException: Response is null.\n\tat
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:940)\n\tat
org.apache.hadoop.ipc.Client$Connection.run(Client.java:835)\n), rpc
version=1, client version=29, methodsFingerPrint=-525182806","client":"
10.42.28.192:52162
","starttimems":1449659444320,"queuetimems":0,"class":"HMaster","responsesize":0,"method":"reportRSFatalError"}

2015-12-09 04:11:35,409 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server hbase004r08.comp.prod.local/10.42.28.188:2181.
Will not attempt to authenticate using SASL (Unable to locate a login
configuration)

2015-12-09 04:11:35,411 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to hbase004r08.comp.prod.local/10.42.28.188:2181,
initiating session

2015-12-09 04:11:35,413 INFO org.apache.zookeeper.ClientCnxn: Unable to
reconnect to ZooKeeper service, session 0x44e6c2f20980003 has expired,
closing socket connection

2015-12-09 04:11:35,413 FATAL org.apache.hadoop.hbase.master.HMaster:
Master server abort: loaded coprocessors are: []

2015-12-09 04:11:35,414 INFO org.apache.hadoop.hbase.master.HMaster:
Primary Master trying to recover from ZooKeeper session expiry.

2015-12-09 04:11:35,416 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=hbase004r08.comp.prod.local:2181,hbase003r07.comp.prod.local:2181,hbase005r09.comp.prod.local:2181
sessionTimeout=1200000 watcher=master:60000


...


and eventually:


2015-12-09 04:11:46,724 ERROR org.apache.zookeeper.ClientCnxn: Caught
unexpected throwable

2015-12-09 04:11:46,724 ERROR org.apache.zookeeper.ClientCnxn: Caught
unexpected throwable

java.lang.StackOverflowError

  at java.security.AccessController.doPrivileged(Native Method)

  at java.io.PrintWriter.<init>(PrintWriter.java:78)

  at java.io.PrintWriter.<init>(PrintWriter.java:62)

  at
org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)

  at
org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)

  at
org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)

  at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)

  at
org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276)

  at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)

  at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)

  at
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)

  at org.apache.log4j.Category.callAppenders(Category.java:206)

  at org.apache.log4j.Category.forcedLog(Category.java:391)

  at org.apache.log4j.Category.log(Category.java:856)

  at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)

  at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:623)

  at
org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)

  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)

  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)

  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)

  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)

  at
org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)

  at
org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)

  at
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)

  at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)

  at
org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)

  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)

  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)

  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)

  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)

  at
org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)

  at
org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)

  at
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)

  at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)

  at
org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)

  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)

  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)

  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)

  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)

  at
org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)

  at
org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)

  at
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)

  at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)

  at
org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)

  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)

  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)

  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)

  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)

  at
org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)

  at
org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)

  at
org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback.processResult(SplitLogManager.java:1106)

  at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:619)

  at
org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:477)

  at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:640)

  at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:658)

  at org.apache.zookeeper.ClientCnxn.queuePacket(ClientCnxn.java:1286)

  at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:975)

  at
org.apache.hadoop.hbase.master.SplitLogManager.deleteNode(SplitLogManager.java:627)

  at
org.apache.hadoop.hbase.master.SplitLogManager.access$1600(SplitLogManager.java:96)


...


Since the NameNode failover made the other NameNode active, why did my
region servers decide to shut down? The HDFS service itself seems to have
stayed up. How can I make the HBase service more resilient to NameNode
failovers?
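
For what it's worth, my reading is that the region servers aborted because
the create() call in the WAL log roller failed against the then-active
NameNode before the failover completed, rather than being retried against
the new active. Below is a sketch of the HA client knobs I'm considering
tuning in the hdfs-site.xml seen by the HBase processes; the values are
illustrative guesses, and I'm not certain they help for a non-idempotent
create(), so corrections are welcome:

  <!-- hdfs-site.xml on the HBase nodes: let the HA client proxy retry
       and fail over for longer before surfacing an IOException -->
  <property>
    <name>dfs.client.failover.max.attempts</name>
    <value>30</value>          <!-- default is 15 -->
  </property>
  <property>
    <name>dfs.client.failover.sleep.base.millis</name>
    <value>1000</value>        <!-- default is 500 -->
  </property>
  <property>
    <name>dfs.client.failover.sleep.max.millis</name>
    <value>30000</value>       <!-- default is 15000 -->
  </property>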


HBase: 0.92.1-cdh4.1.3

Hadoop: 2.0.0-cdh4.1.3
