In most cases you can try the following:
1. Stop HBase.
2. Clean the HBase data out of ZooKeeper (run hbase zkcli, then rmr /hbase at its prompt).
3. Start HBase.
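Step 2 relies on the fact that ZooKeeper refuses to delete a znode that still has children, so rmr has to remove the deepest znodes first and work back up to /hbase. The Python sketch below only models that child-first order on an in-memory set of paths. It does not connect to a real ensemble, and the function names (subtree, rmr_order, rmr) and the sample paths are hypothetical, chosen for illustration.

```python
# Minimal in-memory model of what "rmr /hbase" does in the ZooKeeper CLI.
# Assumption: the znode tree is a flat set of absolute paths (hypothetical
# sample data, not read from a real cluster). ZooKeeper will not delete a
# znode that still has children, so a recursive delete goes child-first.


def subtree(paths, root):
    """Return every path at or below `root`, preserving input order."""
    prefix = root.rstrip("/") + "/"
    return [p for p in paths if p == root or p.startswith(prefix)]


def rmr_order(paths, root):
    """Return the order in which znodes under `root` would be deleted."""
    nodes = subtree(paths, root)
    # Deeper paths contain more "/" separators, so they sort first.
    # Ties are broken alphabetically to keep the order deterministic.
    return sorted(nodes, key=lambda p: (-p.count("/"), p))


def rmr(paths, root):
    """Delete `root` and all of its descendants from the model."""
    remaining = set(paths)
    for node in rmr_order(paths, root):
        # Mirror ZooKeeper's rule: only a childless znode may be deleted.
        prefix = node + "/"
        if any(p.startswith(prefix) for p in remaining):
            raise RuntimeError(f"{node} still has children")
        remaining.remove(node)
    return remaining


if __name__ == "__main__":
    # Hypothetical tree loosely shaped like the znodes in the log below.
    tree = {
        "/hbase",
        "/hbase/rs",
        "/hbase/rs/host1,60020,1",
        "/hbase/region-in-transition",
        "/hbase/region-in-transition/a33adb83",
        "/zookeeper",
    }
    print(rmr_order(tree, "/hbase"))
    print(sorted(rmr(tree, "/hbase")))
```

Only /zookeeper survives the model run. Against a live cluster, do this with HBase stopped (step 1), so that no HMaster or RegionServer is watching or recreating the znodes while they are being removed.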
Regards
Samir

On Fri, May 13, 2016 at 9:17 AM, Gunnar Tapper wrote:
> Hi,
>
> I'm doing some development testing with Apache Trafodion running
> HBase Version 1.0.0-cdh5.4.5.
>
> All of a sudden, HBase has started to crash. First, it could not be
> recovered until I changed hbase_master_distributed_log_splitting to false.
> At that point, HBase restarted and sat happily idling for 1 hour. Then, I
> started Trafodion letting it sit idling for 1 hour.
>
> I then started a workload and all RegionServers came crashing down. Looking
> at the log files, I suspected ZooKeeper issues so I restarted ZooKeeper and
> then HBase. Now, the HMaster fails with:
>
> 2016-05-13 07:13:52,521 INFO org.apache.hadoop.hbase.master.RegionStates: Transition {a33adb83f77095913adb4701b01c09a0 state=PENDING_OPEN, ts=1463123333157, server=ip-172-31-50-109.ec2.internal,60020,1463122925684} to {a33adb83f77095913adb4701b01c09a0 state=OPENING, ts=1463123632517, server=ip-172-31-50-109.ec2.internal,60020,1463122925684}
> 2016-05-13 07:13:52,527 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x354a8eaea3e007d, quorum=ip-172-31-53-252.ec2.internal:2181,ip-172-31-54-241.ec2.internal:2181,ip-172-31-61-36.ec2.internal:2181, baseZNode=/hbase Unable to list children of znode /hbase/region-in-transition
> java.lang.InterruptedException
>     at java.lang.Object.wait(Native Method)
>     at java.lang.Object.wait(Object.java:503)
>     at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
>     at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1466)
>     at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:296)
>     at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenAndWatchForNewChildren(ZKUtil.java:518)
>     at org.apache.hadoop.hbase.master.AssignmentManager$5.run(AssignmentManager.java:1420)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.procedure.flush.MasterFlushTableProcedureManager: stop: server shutting down.
> 2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.ipc.RpcServer: Stopping server on 60000
> 2016-05-13 07:13:52,527 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.listener,port=60000: stopping
> 2016-05-13 07:13:52,528 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.responder: stopped
> 2016-05-13 07:13:52,528 INFO org.apache.hadoop.hbase.ipc.RpcServer: RpcServer.responder: stopping
> 2016-05-13 07:13:52,532 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
> java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@33d4a2bd rejected from java.util.concurrent.ThreadPoolExecutor@4d0840e0[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 38681]
>     at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
>     at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
>     at org.apache.hadoop.hbase.master.AssignmentManager.zkEventWorkersSubmit(AssignmentManager.java:1285)
>     at org.apache.hadoop.hbase.master.AssignmentManager.handleAssignmentEvent(AssignmentManager.java:1479)
>     at org.apache.hadoop.hbase.master.AssignmentManager.nodeDataChanged(AssignmentManager.java:1244)
>     at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:458)
>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522)
>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
> 2016-05-13 07:13:52,533 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Node /hbase/rs/ip-172-31-50-109.ec2.internal,60000,1463122925543 already deleted, retry=false
> 2016-05-13 07:13:52,534 INFO org.apache.zookeeper.ZooKeeper: Session: 0x354a8eaea3e007d closed
> 2016-05-13 07:13:52,534 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: stopping server ip-172-31-50-109.ec2.internal,60000,1463122925543; zookeeper connection closed.
> 2016-05-13 07:13:52,534 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: master/ip-172-31-50-109.ec2.internal/172.31.50.109:60000 exiting
> 2016-05-13 07:13:52,534 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>
> Suggestions on how to move forward so that I can recover this system?
>
> --
> Thanks,
>
> Gunnar
> *If you think you can you can, if you think you can't you're right.*
