I have 3 region servers with 8 GB of memory each, and I am running this query via sqlline.py:

select count(*),word from words group by word limit 10;

So far 3 region servers have died. The first one left no error in its log; the second one logged the following (possibly a race condition with another region server, since I had been restarting the first crashed server):

2015-09-30 13:26:45,429 INFO [RS_OPEN_REGION-d1:16020-1] coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => e211961cd190cf57f8c5a691bd3f265f, NAME => 'PERFORMANCE_1000,EUSalesforce\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00,1442843941238.e211961cd190cf57f8c5a691bd3f265f.', STARTKEY => 'EUSalesforce\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', ENDKEY => 'NAApple\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'} failed, transitioning from OFFLINE to FAILED_OPEN in ZK, expecting version 2
2015-09-30 13:26:47,786 INFO [regionserver/d1.lan/192.168.0.29:16020.logRoller] regionserver.LogRoller: LogRoller exiting.
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] regionserver.CompactSplitThread: Waiting for Split Thread to finish...
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] regionserver.CompactSplitThread: Waiting for Merge Thread to finish...
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] regionserver.CompactSplitThread: Waiting for Large Compaction Thread to finish...
2015-09-30 13:26:47,787 INFO [regionserver/d1.lan/192.168.0.29:16020] regionserver.CompactSplitThread: Waiting for Small Compaction Thread to finish...
2015-09-30 13:26:48,282 INFO [regionserver/d1.lan/192.168.0.29:16020] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1501e1145c90002
2015-09-30 13:26:48,299 INFO [regionserver/d1.lan/192.168.0.29:16020] zookeeper.ZooKeeper: Session: 0x1501e1145c90002 closed
2015-09-30 13:26:48,299 INFO [regionserver/d1.lan/192.168.0.29:16020-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-09-30 13:26:48,300 INFO [regionserver/d1.lan/192.168.0.29:16020] ipc.RpcServer: Stopping server on 16020
2015-09-30 13:26:48,300 INFO [RpcServer.listener,port=16020] ipc.RpcServer: RpcServer.listener,port=16020: stopping
2015-09-30 13:26:48,301 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2015-09-30 13:26:48,335 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2015-09-30 13:26:48,387 INFO [regionserver/d1.lan/192.168.0.29:16020] zookeeper.ZooKeeper: Session: 0x1501e1145c90000 closed
2015-09-30 13:26:48,387 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-09-30 13:26:48,387 INFO [regionserver/d1.lan/192.168.0.29:16020] regionserver.HRegionServer: stopping server d1.lan,16020,1443613463226; zookeeper connection closed.
2015-09-30 13:26:48,387 INFO [regionserver/d1.lan/192.168.0.29:16020] regionserver.HRegionServer: regionserver/d1.lan/192.168.0.29:16020 exiting
2015-09-30 13:26:48,388 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: HRegionServer Aborted
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:68)
    at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)
2015-09-30 13:26:48,390 INFO [Thread-6] regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@31dadd46
2015-09-30 13:26:48,390 INFO [Thread-6] regionserver.ShutdownHook: Starting fs shutdown hook thread.
2015-09-30 13:26:48,391 INFO [Thread-6] regionserver.ShutdownHook: Shutdown hook finished.

I am keeping an eye on the region servers via JMX, and they did not appear to be under any memory pressure.

sqlline exceptions:

15/09/30 12:38:56 ERROR zookeeper.ZooKeeperWatcher: hconnection-0x358c99f5-0x501df0e3cf000f, quorum=nn.lan:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:360)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:745)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionState(MetaTableLocator.java:482)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.getMetaRegionLocation(MetaTableLocator.java:168)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:600)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:580)
    at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:559)
    at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1185)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1152)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.client.StatsTrackingRpcRetryingCaller.callWithoutRetries(StatsTrackingRpcRetryingCaller.java:56)
    at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.loadCache(ClientSmallReversedScanner.java:211)
    at org.apache.hadoop.hbase.client.ClientSmallReversedScanner.next(ClientSmallReversedScanner.java:185)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1249)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1155)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:153)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
    at org.apache.hadoop.hbase.client.StatsTrackingRpcRetryingCaller.callWithoutRetries(StatsTrackingRpcRetryingCaller.java:56)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:809)
    at org.apache.phoenix.iterate.TableResultIterator.getDelegate(TableResultIterator.java:67)
    at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:88)
    at org.apache.phoenix.iterate.TableResultIterator.<init>(TableResultIterator.java:79)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:105)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:100)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
