By no means am I judging Phoenix based on this. This is simply a design trade-off (ScyllaDB goes the same route and builds global indexes). I appreciate all the effort that has gone into Phoenix, and it was indeed a life saver. But the technical point remains that single-node failures have the potential to cascade to the entire cluster. That's the nature of global indexes, not specific to Phoenix.
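For readers less familiar with Phoenix, the trade-off being discussed maps directly onto the DDL: a plain CREATE INDEX builds a global index, a separate physical table whose regions typically live on other region servers, so every data write fans out index RPCs across the cluster, while CREATE LOCAL INDEX keeps the index data co-located with the data region. A minimal sketch over Phoenix JDBC; the ZooKeeper quorum, table name, and column name are purely illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class IndexDdlSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical ZooKeeper quorum; the phoenix-client jar must be on the classpath.
            try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk1,zk2,zk3");
                 Statement stmt = conn.createStatement()) {
                // Global index (the default): maintained in a separate physical table, so a
                // write to MY_TABLE on one region server triggers RPCs to whichever servers
                // host the index regions. This is the cross-server dependency discussed here.
                stmt.execute("CREATE INDEX MY_GLOBAL_IDX ON MY_TABLE (SOME_COL)");
                // Local index: index data is stored alongside the data region, so index
                // maintenance stays on the same region server as the primary write.
                stmt.execute("CREATE LOCAL INDEX MY_LOCAL_IDX ON MY_TABLE (SOME_COL)");
            }
        }
    }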
I apologize if my response came off as dismissing Phoenix altogether. FWIW, I'm a big advocate of Phoenix at my org internally, albeit for the newer version.

On Fri, Nov 2, 2018, 4:09 PM Josh Elser <els...@apache.org> wrote:

> I would strongly disagree with the assertion that this is some unavoidable problem. Yes, an inverted index is a data structure which, by design, creates a hotspot (phrased another way, this is "data locality").
>
> Lots of extremely smart individuals have spent a significant amount of time and effort in stabilizing secondary indexes in the past 1-2 years, not to mention others spending time on a local index implementation. Judging Phoenix in its entirety based off of an arbitrarily old version of Phoenix is disingenuous.
>
> On 11/2/18 2:00 PM, Neelesh wrote:
> > I think this is an unavoidable problem in some sense, if global indexes are used. Essentially, global indexes create a graph of dependent region servers due to index RPC calls from one RS to another. Any single failure is bound to affect the entire graph, which under reasonable load becomes the entire HBase cluster. We had to drop global indexes just to keep the cluster running for more than a few days.
> >
> > I think Cassandra has local secondary indexes precisely because of this issue. Last I checked, there were significant pending improvements required for Phoenix local indexes, especially around read paths (not utilizing primary key prefixes in secondary index reads where possible, for example).
> >
> > On Thu, Sep 13, 2018, 8:12 PM Jonathan Leech <jonat...@gmail.com <mailto:jonat...@gmail.com>> wrote:
> >
> > This seems similar to a failure scenario I've seen a couple of times. I believe that after multiple restarts you got lucky and tables were brought up by HBase in the correct order.
> >
> > What happens is some kind of semi-catastrophic failure where 1 or more region servers go down with edits that weren't flushed and are only in the WAL. These edits belong to regions whose tables have secondary indexes. HBase wants to replay the WAL before bringing up the region server. Phoenix wants to talk to the index region during this, but can't. It fails enough times, then stops.
> >
> > The more region servers / tables / indexes affected, the more likely that a full restart will get stuck in a classic deadlock. A good old-fashioned data center outage is a great way to get started with this kind of problem. You might make some progress and get stuck again, or restart number N might get those index regions initialized before the main table.
> >
> > The surefire way to recover a cluster in this condition is to strategically disable all the tables that are failing to come up. You can do this from the HBase shell as long as the master is running. If I remember right, it's a pain since the disable command will hang. You might need to disable a table, kill the shell, disable the next table, etc. Then restart. You'll eventually have a cluster with all the region servers finally started, and a bunch of disabled regions. If you disabled index tables, enable one and wait for it to become available; i.e., its WAL edits will be replayed. Then enable the associated main table and wait for it to come online. If HBase did its job without error, and your failure didn't include losing 4 disks at once, order will be restored. Lather, rinse, repeat until everything is enabled and online.
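The disable-everything, enable-indexes-first ordering described above can also be driven through the HBase Admin API rather than a hanging shell. A rough sketch, assuming the HBase 1.x client API and purely illustrative table names; it captures only the ordering, not the error handling you would want in practice:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class IndexFirstRecoverySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                // Illustrative names: the index tables and the data table they belong to.
                TableName[] indexTables = {TableName.valueOf("MY_IDX1"), TableName.valueOf("MY_IDX2")};
                TableName[] dataTables = {TableName.valueOf("MY_TABLE")};

                // 1) Disable the tables that are failing to come up (this may block,
                //    just like the shell's disable command, while regions settle).
                for (TableName t : dataTables) {
                    if (admin.isTableEnabled(t)) admin.disableTable(t);
                }
                for (TableName t : indexTables) {
                    if (admin.isTableEnabled(t)) admin.disableTable(t);
                }

                // ... restart the region servers, then run the enable phase ...

                // 2) Enable the index tables first and wait until they are fully
                //    available, i.e. their regions are open and WAL edits replayed.
                for (TableName t : indexTables) {
                    admin.enableTable(t);
                    while (!admin.isTableAvailable(t)) Thread.sleep(1000);
                }
                // 3) Only then enable the data tables, so the index regions already
                //    exist when Phoenix tries to write index updates during WAL replay.
                for (TableName t : dataTables) {
                    admin.enableTable(t);
                    while (!admin.isTableAvailable(t)) Thread.sleep(1000);
                }
            }
        }
    }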
> > <TLDR> A big enough failure sprinkled with a little bit of bad luck and what seems to be a Phoenix flaw == deadlock trying to get HBase to start up. Fix by forcing the order in which HBase brings regions online. Finally, never go full restart. </TLDR>
> >
> > > On Sep 10, 2018, at 7:30 PM, Batyrshin Alexander <0x62...@gmail.com <mailto:0x62...@gmail.com>> wrote:
> > >
> > > After the update, the Master web interface shows that every region server is now on 1.4.7 and there are no RITs.
> > >
> > > The cluster recovered only after we restarted all region servers 4 times...
> > >
> > >> On 11 Sep 2018, at 04:08, Josh Elser <els...@apache.org <mailto:els...@apache.org>> wrote:
> > >>
> > >> Did you update the HBase jars on all RegionServers?
> > >>
> > >> Make sure that you have all of the Regions assigned (no RITs). There could be a pretty simple explanation as to why the index can't be written to.
> > >>
> > >>> On 9/9/18 3:46 PM, Batyrshin Alexander wrote:
> > >>> Correct me if I'm wrong, but it looks like if you have region servers A and B that host index and primary table regions, then a situation like this is possible:
> > >>> A and B are under writes on a table with indexes
> > >>> A - crashes
> > >>> B fails on an index update because A is not operating, so B starts aborting
> > >>> A, after restart, tries to rebuild the index from the WAL, but B is aborting at this time, so A starts aborting too
> > >>> From this moment nothing happens (0 requests to the region servers), and A and B are not responding according to the Master status web interface
> > >>>> On 9 Sep 2018, at 04:38, Batyrshin Alexander <0x62...@gmail.com <mailto:0x62...@gmail.com>> wrote:
> > >>>>
> > >>>> After the update we still can't recover the HBase cluster.
Our region > > servers ABORTING over and over: > > >>>> > > >>>> prod003: > > >>>> Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=92,queue=2,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod003,60020,1536446665703: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:51:27 prod003 hbase[1440]: 2018-09-09 02:51:27,395 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=77,queue=7,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod003,60020,1536446665703: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:52:19 prod003 hbase[1440]: 2018-09-09 02:52:19,224 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=82,queue=2,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod003,60020,1536446665703: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:52:28 prod003 hbase[1440]: 2018-09-09 02:52:28,922 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=94,queue=4,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod003,60020,1536446665703: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:55:02 prod003 hbase[957]: 2018-09-09 02:55:02,096 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=95,queue=5,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod003,60020,1536450772841: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:55:18 prod003 hbase[957]: 2018-09-09 02:55:18,793 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=97,queue=7,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod003,60020,1536450772841: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> > > >>>> prod004: > > >>>> Sep 09 02:52:13 prod004 hbase[4890]: 2018-09-09 02:52:13,541 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=83,queue=3,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod004,60020,1536446387325: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:52:50 prod004 hbase[4890]: 2018-09-09 02:52:50,264 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=75,queue=5,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod004,60020,1536446387325: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:53:40 prod004 hbase[4890]: 2018-09-09 02:53:40,709 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=66,queue=6,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod004,60020,1536446387325: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:54:00 prod004 hbase[4890]: 2018-09-09 02:54:00,060 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=89,queue=9,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod004,60020,1536446387325: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> > > >>>> prod005: > > >>>> Sep 09 02:52:50 prod005 hbase[3772]: 2018-09-09 02:52:50,661 > > FATAL 
[RpcServer.default.FPBQ.Fifo.handler=65,queue=5,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod005,60020,1536446400009: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:53:27 prod005 hbase[3772]: 2018-09-09 02:53:27,542 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=90,queue=0,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod005,60020,1536446400009: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:54:00 prod005 hbase[3772]: 2018-09-09 02:53:59,915 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=7,queue=7,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod005,60020,1536446400009: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: 2018-09-09 02:54:30,058 > > FATAL [RpcServer.default.FPBQ.Fifo.handler=16,queue=6,port=60020] > > regionserver.HRegionServer: ABORTING region server > > prod005,60020,1536446400009: Could not update the index table, > > killing server region because couldn't write to an index table > > >>>> > > >>>> And so on... > > >>>> > > >>>> Trace is the same everywhere: > > >>>> > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: > > > org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: > > disableIndexOnFailure=true, Failed to write to multiple index > > tables: [KM_IDX1, KM_IDX2, KM_HISTORY_IDX1, KM_HISTORY_IDX2, > > KM_HISTORY_IDX3] > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatch(UngroupedAggregateRegionObserver.java:271) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatchWithRetries(UngroupedAggregateRegionObserver.java:241) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.rebuildIndices(UngroupedAggregateRegionObserver.java:1068) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:386) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:239) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:287) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2843) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3080) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > > >>>> Sep 09 02:54:30 prod005 hbase[3772]: at > > > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277) > > >>>> > > >>>>> On 9 Sep 2018, at 01:44, Batyrshin Alexander > > <0x62...@gmail.com <mailto:0x62...@gmail.com> > > <mailto:0x62...@gmail.com <mailto:0x62...@gmail.com>>> wrote: > > >>>>> > > >>>>> Thank you. > > >>>>> We're updating our cluster right now... > > >>>>> > > >>>>> > > >>>>>> On 9 Sep 2018, at 01:39, Ted Yu <yuzhih...@gmail.com > > <mailto:yuzhih...@gmail.com> <mailto:yuzhih...@gmail.com > > <mailto:yuzhih...@gmail.com>>> wrote: > > >>>>>> > > >>>>>> It seems you should deploy hbase with the following fix: > > >>>>>> > > >>>>>> HBASE-21069 NPE in StoreScanner.updateReaders causes RS to > crash > > >>>>>> > > >>>>>> 1.4.7 was recently released. 
> > >>>>>> > > >>>>>> FYI > > >>>>>> > > >>>>>> On Sat, Sep 8, 2018 at 3:32 PM Batyrshin Alexander > > <0x62...@gmail.com <mailto:0x62...@gmail.com> > > <mailto:0x62...@gmail.com <mailto:0x62...@gmail.com>>> wrote: > > >>>>>> > > >>>>>> Hello, > > >>>>>> > > >>>>>> We got this exception from *prod006* server > > >>>>>> > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: 2018-09-09 > 00:38:02,532 > > >>>>>> FATAL [MemStoreFlusher.1] regionserver.HRegionServer: > ABORTING > > >>>>>> region server prod006,60020,1536235102833: Replay of > > >>>>>> WAL required. Forcing server shutdown > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: > > >>>>>> org.apache.hadoop.hbase.DroppedSnapshotException: > > >>>>>> region: > > > > KM,c\xEF\xBF\xBD\x16I7\xEF\xBF\xBD\x0A"A\xEF\xBF\xBDd\xEF\xBF\xBD\xEF\xBF\xBD\x19\x07t,1536178245576.60c121ba50e67f2429b9ca2ba2a11bad. > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2645) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2322) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2284) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2170) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:2095) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:508) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:478) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:76) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:264) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> java.lang.Thread.run(Thread.java:748) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: Caused by: > > >>>>>> java.lang.NullPointerException > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> java.util.ArrayList.<init>(ArrayList.java:178) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.StoreScanner.updateReaders(StoreScanner.java:863) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HStore.notifyChangedReadersObservers(HStore.java:1172) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HStore.updateStorefiles(HStore.java:1145) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > org.apache.hadoop.hbase.regionserver.HStore.access$900(HStore.java:122) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.commit(HStore.java:2505) > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2600) > > >>>>>> Sep 09 00:38:02 prod006 
hbase[18907]: ... 9 more > > >>>>>> Sep 09 00:38:02 prod006 hbase[18907]: 2018-09-09 > 00:38:02,532 > > >>>>>> FATAL [MemStoreFlusher.1] regionserver.HRegionServer: > > >>>>>> RegionServer abort: loaded coprocessors > > >>>>>> are: > > > [org.apache.hadoop.hbase.regionserver.IndexHalfStoreFileReaderGenerator, > > >>>>>> org.apache.phoenix.coprocessor.SequenceRegionObserver, > > >>>>>> org.apache.phoenix.c > > >>>>>> > > >>>>>> After that we got ABORTING on almost every Region Servers > in > > >>>>>> cluster with different reasons: > > >>>>>> > > >>>>>> *prod003* > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: 2018-09-09 > 01:12:11,799 > > >>>>>> FATAL > [PostOpenDeployTasks:88bfac1dfd807c4cd1e9c1f31b4f053f] > > >>>>>> regionserver.HRegionServer: ABORTING region > > >>>>>> server prod003,60020,1536444066291: Exception running > > >>>>>> postOpenDeployTasks; > region=88bfac1dfd807c4cd1e9c1f31b4f053f > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: > > >>>>>> java.io <http://java.io>.InterruptedIOException: #139, > > interrupted. > > >>>>>> currentNumberOfTask=8 > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1853) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.client.AsyncProcess.waitForMaximumCurrentTasks(AsyncProcess.java:1823) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1899) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:250) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:213) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1484) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> org.apache.hadoop.hbase.client.HTable.put(HTable.java:1031) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > org.apache.hadoop.hbase.MetaTableAccessor.put(MetaTableAccessor.java:1033) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.MetaTableAccessor.putToMetaTable(MetaTableAccessor.java:1023) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.MetaTableAccessor.updateLocation(MetaTableAccessor.java:1433) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.MetaTableAccessor.updateRegionLocation(MetaTableAccessor.java:1400) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.postOpenDeployTasks(HRegionServer.java:2041) > > >>>>>> Sep 09 01:12:11 prod003 hbase[11552]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler$PostOpenDeployTasksThread.run(OpenRegionHandler.java:329) > > >>>>>> > > >>>>>> *prod002* > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: 2018-09-09 > 01:12:30,144 > > >>>>>> FATAL > > >>>>>> [RpcServer.default.FPBQ.Fifo.handler=36,queue=6,port=60020] > > >>>>>> regionserver.HRegionServer: ABORTING region > > >>>>>> server prod002,60020,1536235138673: Could not update the > index > > >>>>>> table, killing server region because couldn't write to an > > 
index > > >>>>>> table > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: > > >>>>>> > > > org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: > > >>>>>> disableIndexOnFailure=true, Failed to write to multiple > index > > >>>>>> tables: [KM_IDX1, KM_IDX2, KM_HISTORY1, KM_HISTORY2, > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatch(UngroupedAggregateRegionObserver.java:271) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.access$000(UngroupedAggregateRegionObserver.java:164) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver$1.doMutation(UngroupedAggregateRegionObserver.java:246) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.index.PhoenixIndexFailurePolicy.doBatchWithRetries(PhoenixIndexFailurePolicy.java:455) > > >>>>>> Sep 09 01:12:30 prod002 
hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.handleIndexWriteException(UngroupedAggregateRegionObserver.java:929) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.commitBatchWithRetries(UngroupedAggregateRegionObserver.java:243) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.rebuildIndices(UngroupedAggregateRegionObserver.java:1077) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver.doPostScannerOpen(UngroupedAggregateRegionObserver.java:386) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:239) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:287) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2843) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3080) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36613) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297) > > >>>>>> Sep 09 01:12:30 prod002 hbase[29056]: at > > >>>>>> > > > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277) > > >>>>>> > > >>>>>> > > >>>>>> And etc... > > >>>>>> > > >>>>>> Master-status web interface shows that contact lost from > this > > >>>>>> aborted servers. > > >>>>> > > >>>> > > > > > >