Hi Guys,

The issue is a deadlock, but it is not related to Phoenix; it can be resolved by increasing the number of threads responsible for opening regions:
<property>
  <name>hbase.regionserver.executor.openregion.threads</name>
  <value>100</value>
</property>

Got help from here: <https://community.hortonworks.com/questions/8757/phoenix-local-indexes.html>

Thanks
Cheers
Pedro

On Tue, Jan 5, 2016 at 10:18 PM, Pedro Gandola <pedro.gand...@gmail.com> wrote:
> Hi Guys,
>
> I have been testing out Phoenix Local Indexes and I'm facing an issue
> after restarting the entire HBase cluster.
>
> *Scenario:* I'm using Phoenix 4.4 and HBase 1.1.1. My test cluster
> contains 10 machines, and the main table has 300 pre-split regions,
> which implies 300 regions on the local index table as well. To configure
> Phoenix I followed this tutorial
> <http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_installing_manually_book/content/configuring-hbase-for-phoenix.html>.
>
> When I start a fresh cluster everything is just fine: the local index is
> created and I can insert data and query it using the proper indexes. The
> problem comes when I perform a full restart of the cluster to update some
> configuration; at that point I'm not able to bring the cluster back up anymore.
> I should do a proper rolling restart, but it looks like Ambari is not doing
> one in some situations.
>
> Most of the servers are throwing exceptions like:
>
>> INFO [htable-pool7-t1] client.AsyncProcess: #5,
>> table=_LOCAL_IDX_BIDDING_EVENTS, attempt=27/350 failed=1ops, last
>> exception: org.apache.hadoop.hbase.NotServingRegionException:
>> org.apache.hadoop.hbase.NotServingRegionException: Region
>> _LOCAL_IDX_BIDDING_EVENTS,57e4b17e4b17e4ac,1451943466164.253bdee3695b566545329fa3ac86d05e.
>> is not online on ip-10-5-4-24.ec2.internal,16020,1451996088952
>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2898)
>> at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:947)
>> at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1991)
>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>> at java.lang.Thread.run(Thread.java:745)
>> on ip-10-5-4-24.ec2.internal,16020,1451942002174, tracking started null,
>> retrying after=20001ms, replay=1ops
>> INFO [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t1]
>> client.AsyncProcess: #3, waiting for 2 actions to finish
>> INFO [ip-10-5-4-26.ec2.internal,16020,1451996087089-recovery-writer--pool5-t2]
>> client.AsyncProcess: #4, waiting for 2 actions to finish
>
> It looks like they get into a state where some region servers are waiting
> for regions that are not yet available on other servers.
>
> On the HBase UI I can see servers stuck on messages like this:
>
>> *Description:* Replaying edits from
>> hdfs://.../recovered.edits/0000000000000464197
>> *Status:* Running pre-WAL-restore hook in coprocessors (since 48mins,
>> 45sec ago)
>
> Another interesting thing that I noticed is the *empty coprocessor list* for
> the servers that are stuck with 0 regions assigned.
>
> The HBase master goes down after logging some of these messages:
>
>> GeneralBulkAssigner: Failed bulking assigning N regions
>
> I was able to perform full restarts before I started using local indexes, and
> everything worked fine. This may well be a misconfiguration on my side,
> but I have tried different properties and different approaches to restarting
> the cluster, and I'm still unable to do it.
>
> My understanding of local indexes in Phoenix (please correct me if I'm
> wrong) is that they are normal HBase tables and Phoenix places the regions
> to ensure proper data locality. Is the data locality fully maintained
> when we lose N region servers and/or the regions are moved?
>
> Any insights would be very helpful.
>
> Thank you
> Cheers
> Pedro
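
As a side note, while a cluster like this is coming back up it can help to watch how many regions are still in transition and whether the local index table is fully assigned. Below is only a rough sketch against the HBase 1.1 client API; the table name is taken from the logs above, everything else (class name, polling strategy) is illustrative and not part of the fix itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RestartWatch {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml / hbase-default.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Regions that have not finished opening (e.g. blocked in the
      // pre-WAL-restore coprocessor hook) stay in the regions-in-transition map.
      int inTransition = admin.getClusterStatus().getRegionsInTransition().size();
      System.out.println("Regions in transition: " + inTransition);

      // isTableAvailable only returns true once all regions of the table are assigned.
      TableName indexTable = TableName.valueOf("_LOCAL_IDX_BIDDING_EVENTS");
      System.out.println("Local index table available: "
          + admin.isTableAvailable(indexTable));
    }
  }
}

Running something like this periodically during the restart makes it easier to see whether assignment is actually progressing or stuck, which is the symptom the openregion-threads change above addresses.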