I've found that we still haven't configured this:

hbase.region.server.rpc.scheduler.factory.class = org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory
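For reference, that property normally goes into the server-side hbase-site.xml on every RegionServer. A sketch follows; the second property is not mentioned in this thread and is included as an assumption, since the Phoenix installation docs pair it with the scheduler factory so that index RPCs get their own priority queues — verify both against the docs for your Phoenix version:

```xml
<!-- hbase-site.xml on each RegionServer; a restart is required after changes -->
<property>
  <name>hbase.region.server.rpc.scheduler.factory.class</name>
  <value>org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory</value>
</property>
<!-- Assumption: companion setting from the Phoenix installation docs,
     not discussed in this thread -->
<property>
  <name>hbase.rpc.controllerfactory.class</name>
  <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
</property>
```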
Can this misconfiguration lead to our problems?

> On 15 Sep 2018, at 02:04, Sergey Soldatov <sergey.solda...@gmail.com> wrote:
>
> That was the real problem quite a long time ago (a couple of years?). I can't
> say for sure in which version it was fixed, but now indexes have priority over
> regular tables, and their regions open first. So by the moment we replay the
> WALs for tables, all index regions are supposed to be online. If you see the
> problem on recent versions, that usually means the cluster is not healthy and
> some of the index regions are stuck in the RIT state.
>
> Thanks,
> Sergey
>
> On Thu, Sep 13, 2018 at 8:12 PM Jonathan Leech <jonat...@gmail.com> wrote:
> This seems similar to a failure scenario I've seen a couple of times. I believe
> that after multiple restarts you got lucky and the tables were brought up by
> HBase in the correct order.
>
> What happens is some kind of semi-catastrophic failure where one or more
> region servers go down with edits that weren't flushed and are only in the
> WAL. These edits belong to regions whose tables have secondary indexes. HBase
> wants to replay the WAL before bringing up the region server. Phoenix wants to
> talk to the index region during this, but can't. It fails enough times, then
> stops.
>
> The more region servers / tables / indexes affected, the more likely that a
> full restart will get stuck in a classic deadlock. A good old-fashioned data
> center outage is a great way to get started with this kind of problem. You
> might make some progress and get stuck again, or restart number N might get
> those index regions initialized before the main table.
>
> The surefire way to recover a cluster in this condition is to strategically
> disable all the tables that are failing to come up. You can do this from the
> HBase shell as long as the master is running. If I remember right, it's a
> pain, since the disable command will hang. You might need to disable a table,
> kill the shell, disable the next table, and so on. Then restart. You'll
> eventually have a cluster with all the region servers finally started and a
> bunch of disabled tables. If you disabled index tables, enable one and wait
> for it to become available (i.e., its WAL edits will be replayed), then enable
> the associated main table and wait for it to come online. If HBase did its job
> without error, and your failure didn't include losing 4 disks at once, order
> will be restored. Lather, rinse, repeat until everything is enabled and
> online.
>
> <TLDR> A big enough failure, sprinkled with a little bit of bad luck and what
> seems to be a Phoenix flaw, equals a deadlock trying to get HBase to start up.
> Fix it by forcing the order in which HBase brings regions online. Finally,
> never go full restart. </TLDR>
>
> > On Sep 10, 2018, at 7:30 PM, Batyrshin Alexander <0x62...@gmail.com> wrote:
> >
> > After the update, the web interface at Master shows that every region
> > server is now on 1.4.7 and there are no RITs.
> >
> > The cluster recovered only when we restarted all region servers 4 times...
> >
> >> On 11 Sep 2018, at 04:08, Josh Elser <els...@apache.org> wrote:
> >>
> >> Did you update the HBase jars on all RegionServers?
> >>
> >> Make sure that you have all of the Regions assigned (no RITs). There could
> >> be a pretty simple explanation as to why the index can't be written to.
> >>
> >>> On 9/9/18 3:46 PM, Batyrshin Alexander wrote:
> >>> Correct me if I'm wrong, but it looks like if you have region servers A
> >>> and B that host both the index and the primary table, then a situation
> >>> like this is possible.
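Jonathan's disable/enable recovery sequence above can be sketched as hbase shell commands (a sketch only: the table and index names are illustrative, the real disable order depends on your schema, and which commands hang will vary). Piping each command into a fresh, non-interactive `hbase shell` makes it easier to kill a hung `disable` and continue with the next table:

```shell
# Sketch of the recovery sequence described above. Table/index names are
# illustrative; the master must be running for disable/enable to work.

# 1. Disable every table (and its index tables) that is failing to come up.
#    Each disable may hang; kill the hung shell and continue with the next.
echo "disable 'KM'"      | hbase shell
echo "disable 'KM_IDX1'" | hbase shell

# 2. Restart the region servers, then bring an index table back first so
#    its WAL edits are replayed...
echo "enable 'KM_IDX1'"  | hbase shell

# 3. ...and only then the associated main table.
echo "enable 'KM'"       | hbase shell

# 4. Check for regions stuck in transition before moving to the next pair.
echo "status 'detailed'" | hbase shell
```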
> >>> A and B are under writes on a table with indexes.
> >>> A crashes.
> >>> B fails on an index update because A is not operating, and then B starts
> >>> aborting.
> >>> A, after restart, tries to rebuild the index from the WAL, but B is
> >>> aborting at this time, so A starts aborting too.
> >>> From this moment nothing happens (0 requests to the region servers), and
> >>> A and B are not responding in the Master-status web interface.
> >>>> On 9 Sep 2018, at 04:38, Batyrshin Alexander <0x62...@gmail.com> wrote:
> >>>>
> >>>> After the update we still can't recover the HBase cluster. Our region
> >>>> servers keep ABORTING over and over:
> >>>>
> >>>> prod003:
> >>>> Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 FATAL [RpcServer.default.FPBQ.Fifo.handler=92,queue=2,port=60020] regionserver.HRegionServer: ABORTING region server prod003,60020,1536446665703: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 FATAL [RpcServer.default.FPBQ.Fifo.handler=77,queue=7,port=60020] regionserver.HRegionServer: ABORTING region server prod003,60020,1536446665703: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:52:19 prod003 hbase[1440]: 2018-09-09 02:52:19,224 FATAL [RpcServer.default.FPBQ.Fifo.handler=82,queue=2,port=60020] regionserver.HRegionServer: ABORTING region server prod003,60020,1536446665703: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:52:28 prod003 hbase[1440]: 2018-09-09 02:52:28,922 FATAL [RpcServer.default.FPBQ.Fifo.handler=94,queue=4,port=60020] regionserver.HRegionServer: ABORTING region server prod003,60020,1536446665703: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:55:02 prod003 hbase[957]: 2018-09-09 02:55:02,096 FATAL [RpcServer.default.FPBQ.Fifo.handler=95,queue=5,port=60020] regionserver.HRegionServer: ABORTING region server prod003,60020,1536450772841: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:55:18 prod003 hbase[957]: 2018-09-09 02:55:18,793 FATAL [RpcServer.default.FPBQ.Fifo.handler=97,queue=7,port=60020] regionserver.HRegionServer: ABORTING region server prod003,60020,1536450772841: Could not update the index table, killing server region because couldn't write to an index table
> >>>>
> >>>> prod004:
> >>>> Sep 09 02:52:13 prod004 hbase[4890]: 2018-09-09 02:52:13,541 FATAL [RpcServer.default.FPBQ.Fifo.handler=83,queue=3,port=60020] regionserver.HRegionServer: ABORTING region server prod004,60020,1536446387325: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:52:50 prod004 hbase[4890]: 2018-09-09 02:52:50,264 FATAL [RpcServer.default.FPBQ.Fifo.handler=75,queue=5,port=60020] regionserver.HRegionServer: ABORTING region server prod004,60020,1536446387325: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:53:40 prod004 hbase[4890]: 2018-09-09 02:53:40,709 FATAL [RpcServer.default.FPBQ.Fifo.handler=66,queue=6,port=60020] regionserver.HRegionServer: ABORTING region server prod004,60020,1536446387325: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:54:00 prod004 hbase[4890]: 2018-09-09 02:54:00,060 FATAL [RpcServer.default.FPBQ.Fifo.handler=89,queue=9,port=60020] regionserver.HRegionServer: ABORTING region server prod004,60020,1536446387325: Could not update the index table, killing server region because couldn't write to an index table
> >>>>
> >>>> prod005:
> >>>> Sep 09 02:52:50 prod005 hbase[3772]: 2018-09-09 02:52:50,661 FATAL [RpcServer.default.FPBQ.Fifo.handler=65,queue=5,port=60020] regionserver.HRegionServer: ABORTING region server prod005,60020,1536446400009: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:53:27 prod005 hbase[3772]: 2018-09-09 02:53:27,542 FATAL [RpcServer.default.FPBQ.Fifo.handler=90,queue=0,port=60020] regionserver.HRegionServer: ABORTING region server prod005,60020,1536446400009: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:54:00 prod005 hbase[3772]: 2018-09-09 02:53:59,915 FATAL [RpcServer.default.FPBQ.Fifo.handler=7,queue=7,port=60020] regionserver.HRegionServer: ABORTING region server prod005,60020,1536446400009: Could not update the index table, killing server region because couldn't write to an index table
> >>>> Sep 09 02:54:30 prod005 hbase[3772]: 2018-09-09 02:54:30,058 FATAL [RpcServer.default.FPBQ.Fifo.handler=16,queue=6,port=60020] regionserver.HRegionServer: ABORTING region server prod005,60020,1536446400009: Could not update the index table, killing server region because couldn't write to an index table
> >>>>
> >>>> And so on...
> >>>> > >>>> Trace is the same everywhere: > >>>> > >>>> Sep 09 02:54:30 prod005 hbase[3772]: > >>>> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: > >>>> disableIndexOnFailure=true, Failed to write to multiple index tables: > >>>> [KM_IDX1, KM_IDX2, KM_HISTORY_IDX1, KM_HISTORY_IDX2, KM_HISTORY_IDX3] > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745) > >>>> Sep 09 
02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatch(UngroupedAggregateRegionObserver.java:271) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatchWithRetries(UngroupedAggregateRegionObserver.java:241) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.rebuildIndices(UngroupedAggregateRegionObserver.java:1068) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:386) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:239) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:287) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2843) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3080) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > >>>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277) > >>>> > >>>>> On 9 Sep 2018, at 01:44, Batyrshin Alexander <0x62...@gmail.com > >>>>> <mailto:0x62...@gmail.com> <mailto:0x62...@gmail.com > >>>>> <mailto:0x62...@gmail.com>>> wrote: > >>>>> > >>>>> Thank you. > >>>>> We're updating our cluster right now... > >>>>> > >>>>> > >>>>>> On 9 Sep 2018, at 01:39, Ted Yu <yuzhih...@gmail.com > >>>>>> <mailto:yuzhih...@gmail.com> <mailto:yuzhih...@gmail.com > >>>>>> <mailto:yuzhih...@gmail.com>>> wrote: > >>>>>> > >>>>>> It seems you should deploy hbase with the following fix: > >>>>>> > >>>>>> HBASE-21069 NPE in StoreScanner.updateReaders causes RS to crash > >>>>>> > >>>>>> 1.4.7 was recently released. > >>>>>> > >>>>>> FYI > >>>>>> > >>>>>> On Sat, Sep 8, 2018 at 3:32 PM Batyrshin Alexander <0x62...@gmail.com > >>>>>> <mailto:0x62...@gmail.com> <mailto:0x62...@gmail.com > >>>>>> <mailto:0x62...@gmail.com>>> wrote: > >>>>>> > >>>>>> Hello, > >>>>>> > >>>>>> We got this exception from *prod006* server > >>>>>> > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: 2018-09-09 00:38:02,532 > >>>>>> FATAL [MemStoreFlusher.1] regionserver.HRegionServer: ABORTING > >>>>>> region server prod006,60020,1536235102833: Replay of > >>>>>> WAL required. 
Forcing server shutdown > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: > >>>>>> org.apache.hadoop.hbase.DroppedSnapshotException: > >>>>>> region: > >>>>>> KM,c\xEF\xBF\xBD\x16I7\xEF\xBF\xBD\x0A"A\xEF\xBF\xBDd\xEF\xBF\xBD\xEF\xBF\xBD\x19\x07t,1536178245576.60c121ba50e67f2429b9ca2ba2a11bad. > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2645) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> java.lang.Thread.run(Thread.java:748) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: Caused by: > >>>>>> java.lang.NullPointerException > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> java.util.ArrayList.<init>(ArrayList.java:178) > >>>>>> Sep 09 00:38:02 prod006 
hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600) > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: ... 9 more > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: 2018-09-09 00:38:02,532 > >>>>>> FATAL [MemStoreFlusher.1] regionserver.HRegionServer: > >>>>>> RegionServer abort: loaded coprocessors > >>>>>> are: > >>>>>> [org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReaderGenerator, > >>>>>> org.apache.phoenix.coprocessor.SequenceRegionObserver, > >>>>>> org.apache.phoenix.c > >>>>>> > >>>>>> After that we got ABORTING on almost every Region Servers in > >>>>>> cluster with different reasons: > >>>>>> > >>>>>> *prod003* > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: 2018-09-09 01:12:11,799 > >>>>>> FATAL [PostOpenDeployTasks:88bfac1dfd807c4cd1e9c1f31b4f053f] > >>>>>> regionserver.HRegionServer: ABORTING region > >>>>>> server prod003,60020,1536444066291: Exception running > >>>>>> postOpenDeployTasks; region=88bfac1dfd807c4cd1e9c1f31b4f053f > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: > >>>>>> java.io <http://java.io/>.InterruptedIOException: #139, interrupted. 
> >>>>>> currentNumberOfTask=8 > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1853) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1823) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1899) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:250) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:213) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1484) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> org.apache.hadoop.hbase.client.HTable.put(HTable.java:1031) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1033) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1023) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.MetaTableAccessor.updateLocation(MetaTableAccessor.java:1433) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.MetaTableAccessor.updateRegionLocation(MetaTableAccessor.java:1400) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2041) > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > >>>>>> > >>>>>> 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:329) > >>>>>> > >>>>>> *prod002* > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: 2018-09-09 01:12:30,144 > >>>>>> FATAL > >>>>>> [RpcServer.default.FPBQ.Fifo.handler=36,queue=6,port=60020] > >>>>>> regionserver.HRegionServer: ABORTING region > >>>>>> server prod002,60020,1536235138673: Could not update the index > >>>>>> table, killing server region because couldn't write to an index > >>>>>> table > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: > >>>>>> disableIndexOnFailure=true, Failed to write to multiple index > >>>>>> tables: [KM_IDX1, KM_IDX2, KM_HISTORY1, KM_HISTORY2, > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatch(UngroupedAggregateRegionObserver.java:271) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.access$000(UngroupedAggregateRegionObserver.java:164) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver$1.doMutation(UngroupedAggregateRegionObserver.java:246) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.index.PhoenixIndexFailurePolicy.doBatchWithRetries(PhoenixIndexFailurePolicy.java:455) > 
>>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.handleIndexWriteException(UngroupedAggregateRegionObserver.java:929) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatchWithRetries(UngroupedAggregateRegionObserver.java:243) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.rebuildIndices(UngroupedAggregateRegionObserver.java:1077) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:386) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:239) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:287) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2843) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3080) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > >>>>>> > >>>>>> 
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
> >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
> >>>>>>
> >>>>>> And so on...
> >>>>>>
> >>>>>> The Master-status web interface shows that contact was lost with these
> >>>>>> aborted servers.
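Stepping back, the mutual-abort condition described in this thread — A's WAL replay needs index regions hosted on B, while B is aborting because it needs regions on A — is a circular wait. A toy model (server names are illustrative, not from the cluster) shows why no amount of restarting resolves it and why forcing an order, as suggested above, does:

```python
# Toy wait-for graph of the restart deadlock discussed in this thread.
# An edge "X": "Y" means "server X cannot finish WAL replay until server Y's
# index regions are online". Names are illustrative only.

def has_deadlock(waits_on):
    """Return True if following wait-for edges from any server revisits one."""
    for start in waits_on:
        seen, node = set(), start
        while node in waits_on:
            if node in seen:       # came back to a server already on the path
                return True
            seen.add(node)
            node = waits_on[node]  # follow the wait-for edge
    return False

# A and B each hold the other's prerequisite: a classic circular wait.
assert has_deadlock({"A": "B", "B": "A"})

# Forcing an order (disable tables, bring index regions up first) removes
# one edge and breaks the cycle.
assert not has_deadlock({"A": "B"})
print("deadlock model ok")
```

The same check generalizes to more than two servers: any cycle in the wait-for graph, however long, leaves the whole chain stuck until an edge is removed by hand.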