I'm not exactly sure what happened during the night, but the problem resolved itself after a few hours. The regionservers kept spewing NotServingRegionExceptions for about 2h, and then all of a sudden a few of the following exceptions came by, after which the problems disappeared.

2015-10-12 18:16:56,616 ERROR [RS_OPEN_REGION-cn1:16020-2] handler.OpenRegionHandler: Failed open of region=QUERYLOGS,\x01,1440751079220.87e5c32c06d42524fd0876eb72b1472e., starting to roll back the global memstore size.
org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: Failed to write to multiple index tables
        at org.apache.phoenix.hbase.index.write.recovery.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:222)
        at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:179)
        at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:169)
        at org.apache.phoenix.hbase.index.Indexer.preWALRestore(Indexer.java:545)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$58.call(RegionCoprocessorHost.java:1432)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1748)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1705)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preWALRestore(RegionCoprocessorHost.java:1423)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4013)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3869)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionStores(HRegion.java:937)
        at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:807)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:782)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6227)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6188)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6159)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6115)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6066)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

I still don't know if it's Phoenix or HBase related, but for now I'm glad it all seems resolved. If somebody has any clue what happened or if something should be done to prevent a situation like this, please let me know.

Best regards,
Jurian


On 10/12/2015 05:26 PM, Ted Yu wrote:
Have you checked the master to see if region assignment went okay?

Cheers

On Oct 12, 2015, at 7:56 AM, Jurian Broertjes <[email protected]> wrote:

Hi all,

I'm using HBase (1.1.2) with Phoenix (4.5.2-HBase-1.1) and had some (minor)
HDFS issues. The HDFS issues are resolved, but when I try to bring HBase back
up, some regions won't come online.

Some RS log:
2015-10-12 14:08:54,681 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t1] client.AsyncProcess: #8, waiting for 1  actions to finish
2015-10-12 14:08:55,781 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t2] client.AsyncProcess: #13, waiting for 1  actions to finish
2015-10-12 14:08:55,819 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t3] client.AsyncProcess: #15, waiting for 2  actions to finish
2015-10-12 14:08:59,119 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t4] client.AsyncProcess: #24, waiting for 1  actions to finish
2015-10-12 14:08:59,138 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t5] client.AsyncProcess: #25, waiting for 2  actions to finish
2015-10-12 14:09:04,692 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t1] client.AsyncProcess: #8, waiting for 1  actions to finish
2015-10-12 14:09:05,793 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t2] client.AsyncProcess: #13, waiting for 1  actions to finish
2015-10-12 14:09:05,831 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t3] client.AsyncProcess: #15, waiting for 2  actions to finish
2015-10-12 14:09:07,214 INFO [regionserver/cn1.xxx.xx/89.188.14.2:16020-shortCompactions-1444658915963] client.AsyncProcess: #23, waiting for some tasks to finish. Expected max=0, tasksInProgress=9
2015-10-12 14:09:09,131 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t4] client.AsyncProcess: #24, waiting for 1  actions to finish
2015-10-12 14:09:09,150 INFO [cn1.xxx.xx,16020,1444658908121-recovery-writer--pool6-t5] client.AsyncProcess: #25, waiting for 2  actions to finish
2015-10-12 14:09:12,945 INFO  [htable-pool8-t1] client.AsyncProcess: #8, table=OUTLINKS_SSI_INDEX, attempt=10/350 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region OUTLINKS_SSI_INDEX,,1440761791894.c9cfcf16be9852553efe45e36387a4b1. is not online on cn1.xxx.xx,16020,1444658908121
  at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
  at java.lang.Thread.run(Thread.java:745)
on cn1.xxx.xx,16020,1444654232108, tracking started null, retrying after=10086ms, replay=1ops
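
For anyone reading a similar log, here is a minimal Python sketch (not part of the original message) that pulls out which index tables are being retried and which regions are reported as not online. The function name `summarize_retries` and both regular expressions are hypothetical helpers written against the line format shown above; other HBase versions may log these messages differently.

```python
import re

# Hypothetical patterns based only on the log excerpt above.
# Retry summary, e.g. "table=OUTLINKS_SSI_INDEX, attempt=10/350 failed=1ops"
RETRY_RE = re.compile(r"table=(?P<table>\S+?),\s*attempt=(?P<attempt>\d+)/(?P<max>\d+)")
# Region name from "Region <name> is not online"
REGION_RE = re.compile(r"Region\s+(?P<region>\S+)\s+is not\s+online")


def summarize_retries(log_text):
    """Return ({table: (highest attempt seen, max attempts)}, [offline regions])."""
    # Mail clients hard-wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())

    attempts = {}
    for m in RETRY_RE.finditer(flat):
        table = m.group("table")
        attempt, limit = int(m.group("attempt")), int(m.group("max"))
        # Keep the highest attempt number seen for each table.
        if table not in attempts or attempt > attempts[table][0]:
            attempts[table] = (attempt, limit)

    regions = sorted(set(REGION_RE.findall(flat)))
    return attempts, regions


if __name__ == "__main__":
    import sys

    # Usage: python summarize_retries.py < regionserver.log
    table_attempts, offline_regions = summarize_retries(sys.stdin.read())
    for table, (attempt, limit) in sorted(table_attempts.items()):
        print(f"{table}: attempt {attempt} of {limit}")
    for region in offline_regions:
        print(f"not online: {region}")
```

The sketch only copes with lines wrapped at whitespace; a wrap in the middle of a token, such as a split class name, would need the log rejoined first.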

The cluster consists of 2 masters, 3 region servers, and an external
ZooKeeper.

Does anyone know what's going on here?

Thanks in advance,

Jurian
