I've created a bug with reproduction steps: https://issues.apache.org/jira/browse/PHOENIX-4960
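For readers hitting the same wall: below is a minimal sketch of the client-side retry the message below says is missing, i.e. retrying commit() while Phoenix keeps raising ERROR 1120 (XCL20). The JDBC URL, the TBL_MARK columns, the attempt count and the backoff are illustrative assumptions, not details from this thread; whether uncommitted mutations survive a failed commit() can depend on where the commit fails, so treat this as a sketch rather than a verified workaround.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CommitRetry {
        // Phoenix error code for "Writes to table blocked until index can be updated"
        private static final int XCL20 = 1120;

        // Retry commit() with linear backoff while Phoenix keeps reporting XCL20.
        static void commitWithRetry(Connection conn, int maxAttempts)
                throws SQLException, InterruptedException {
            for (int attempt = 1; ; attempt++) {
                try {
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    if (e.getErrorCode() != XCL20 || attempt >= maxAttempts) {
                        throw e; // different failure, or out of retries
                    }
                    Thread.sleep(1000L * attempt);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical connection string and columns, for illustration only.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:phoenix:10.0.0.1,10.0.0.2,10.0.0.3:2181")) {
                conn.setAutoCommit(false);
                try (Statement st = conn.createStatement()) {
                    st.executeUpdate("UPSERT INTO TBL_MARK (ID, VAL) VALUES ('k1', 'v1')");
                }
                commitWithRetry(conn, 5);
            }
        }
    }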
> On 3 Oct 2018, at 21:06, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
> But we see that Phoenix commit() in our case fails with "ERROR 1120 (XCL20):
> Writes to table blocked until index can be updated" because of
> org.apache.hadoop.hbase.NotServingRegionException.
> We expected commit() to retry and eventually succeed.
>
>> On 2 Oct 2018, at 22:02, Josh Elser <els...@apache.org> wrote:
>>
>> HBase will invalidate the location of a Region on seeing certain exceptions
>> (including NotServingRegionException). After it sees the exception you have
>> copied below, it should re-fetch the location of the Region.
>>
>> If HBase keeps trying to access a Region on a RS that isn't hosting it,
>> either hbase:meta is wrong or the HBase client has a bug.
>>
>> However, to the point here, if that region was split successfully, clients
>> should not be reading from that region anymore -- they would read from the
>> daughters of that split region.
>>
>> On 10/2/18 2:34 PM, Batyrshin Alexander wrote:
>>> We tried branch 4.14-HBase-1.4 at commit
>>> https://github.com/apache/phoenix/commit/52893c240e4f24e2bfac0834d35205f866c16ed8
>>>
>>> Is there any way to invalidate the meta-cache when an index region splits?
>>> Maybe there is some option to set a max time-to-live for the cache?
>>> (A client-side refresh sketch follows after the logs below.)
>>>
>>> We are seeing this on the region servers.
>>>
>>> At 09:34 region *96c3ede1c40c98959e60bd6fc0e07269* split on prod019:
>>>
>>> Oct 02 09:34:39 prod019 hbase[152127]: 2018-10-02 09:34:39,719 INFO [regionserver/prod019/10.0.0.19:60020-splits-1538462079117] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1537400033958.*96c3ede1c40c98959e60bd6fc0e07269*., new regions: IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1538462079161.80fc2516619d8665789b0c5a2bca8a8b., IDX_MARK_O,\x0BON_SCHFDOPPR_2AL-56022B7D-2F90-4AA5-8125-4F4001B5BE0D-00000_2AL-C0D76C01-EE7E-496B-BCD6-F6488956F75A-00000_20180228_7E372181-F23D-4EBE-9CAD-5F5218C9798I\x0000000046186195_5.UHQ=\x00\x02\x80\x00\x01a\xD3\xEA@\x80\x00\x00\x00\x00,1538462079161.24b6675d9e51067a21e58f294a9f816b.. Split took 0sec
>>>
>>> Failure at 11:51 on prod018:
>>>
>>> Oct 02 11:51:13 prod018 hbase[108476]: 2018-10-02 11:51:13,752 WARN [hconnection-0x4131af19-shared--pool24-t26652] client.AsyncProcess: #164, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1537400033958.*96c3ede1c40c98959e60bd6fc0e07269*. is not online on prod019,60020,1538417663874
>>>
>>> Failure at 13:38 on prod005:
>>>
>>> Oct 02 13:38:06 prod005 hbase[197079]: 2018-10-02 13:38:06,040 WARN [hconnection-0x5e744e65-shared--pool8-t31214] client.AsyncProcess: #53, table=IDX_MARK_O, attempt=1/1 failed=11ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1537400033958.*96c3ede1c40c98959e60bd6fc0e07269*. is not online on prod019,60020,1538417663874
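On the meta-cache question above: the thread does not quote a Phoenix-level switch for invalidating the client's region cache on an index split, but the plain HBase 1.x client API can force a re-read of hbase:meta for a given row, which is the same refresh that should happen automatically after a NotServingRegionException. A sketch, with a made-up row key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RefreshRegionLocation {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 RegionLocator locator =
                     conn.getRegionLocator(TableName.valueOf("IDX_MARK_O"))) {
                byte[] row = Bytes.toBytes("some-index-row-key"); // hypothetical key
                // reload=true bypasses the cached location and re-reads hbase:meta,
                // so a just-split parent resolves to the daughter hosting this row.
                HRegionLocation fresh = locator.getRegionLocation(row, true);
                System.out.println("Region " + fresh.getRegionInfo().getEncodedName()
                    + " is on " + fresh.getServerName());
            }
        }
    }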
>>>> On 27 Sep 2018, at 01:04, Ankit Singhal <ankitsingha...@gmail.com> wrote:
>>>>
>>>> You might be hitting PHOENIX-4785
>>>> <https://jira.apache.org/jira/browse/PHOENIX-4785>; you can apply the
>>>> patch on top of 4.14 and see if it fixes your problem.
>>>>
>>>> Regards,
>>>> Ankit Singhal
>>>>
>>>> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>
>>>> Any advice? Any help?
>>>> I can reproduce the problem and capture more logs if needed.
>>>>
>>>>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>
>>>>> It looks like the lock goes away 30 minutes after the index region split,
>>>>> so I assume this issue comes from the cache configured by the option
>>>>> *phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs*.
>>>>> (A config sketch follows after the master logs below.)
>>>>>
>>>>>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>
>>>>>> And here is how this split looks in the Master logs:
>>>>>>
>>>>>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition {3e44b85ddf407da831dbb9a871496986 state=OPEN, ts=1537304859509, server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888, server=prod
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340, server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 state=SPLIT, ts=1537461905340, server=pro
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340, server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb state=OPEN, ts=1537461905341, server=
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340, server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144 state=OPEN, ts=1537461905341, server=
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT event; parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., daughter a=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1
>>>>>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO [prod001,60000,1537304851459_ChoreService_2] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost which need balance is 0.05
>>>>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO [prod001,60000,1537304851459_ChoreService_1] hbase.MetaTableAccessor: Deleted IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>>>>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO [prod001,60000,1537304851459_ChoreService_1] master.CatalogJanitor: Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
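For context on the option named above: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs is a server-side coprocessor setting, so it belongs in hbase-site.xml on the region servers; its default is commonly cited as 30 minutes, which would line up with the lock disappearing 30 minutes after the split. A sketch with an arbitrary lower value (an illustration, not a recommendation from the thread):

    <!-- hbase-site.xml on each region server; a rolling restart is needed.
         600000 ms (10 minutes) is an arbitrary illustrative value. -->
    <property>
      <name>phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs</name>
      <value>600000</value>
    </property>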
>>>>>>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>>
>>>>>>> It looks like the problem was caused by an index region split.
>>>>>>>
>>>>>>> Index region split at prod013:
>>>>>>>
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., new regions: IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb., IDX_MARK_O,\x107834005168\x0000000046200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144.. Split took 0sec
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Split transaction journal:
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED at 1537461904853
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         PREPARED at 1537461904877
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_PRE_SPLIT_HOOK at 1537461904877
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_PRE_SPLIT_HOOK at 1537461904877
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         SET_SPLITTING at 1537461904880
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         CREATE_SPLIT_DIR at 1537461904987
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         CLOSED_PARENT_REGION at 1537461905002
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OFFLINED_PARENT at 1537461905002
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_A_CREATION at 1537461905056
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_B_CREATION at 1537461905131
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         PONR at 1537461905192
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_A at 1537461905249
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_B at 1537461905252
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_POST_SPLIT_HOOK at 1537461905439
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_POST_SPLIT_HOOK at 1537461905439
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         COMPLETED at 1537461905439
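Before the failing index writes continue below, one possible operational mitigation (an assumption of this write-up, not advice given in the thread): pin the index table to DisabledRegionSplitPolicy so its regions split only when an operator asks, shrinking the window in which writers chase a just-split region. A sketch against the HBase 1.x admin API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class DisableIndexSplits {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                TableName idx = TableName.valueOf("IDX_MARK_O");
                HTableDescriptor desc = admin.getTableDescriptor(idx);
                // Regions of this table will no longer split automatically;
                // splits must be requested explicitly by an operator.
                desc.setRegionSplitPolicyClassName(
                    "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
                admin.modifyTable(idx, desc);
            }
        }
    }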
>>>>>>> Index update failed at prod002:
>>>>>>>
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,520 WARN [hconnection-0x4f3242a0-shared--pool32-t36014] client.AsyncProcess: #220, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986. is not online on prod013,60020,1537304282885
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3081)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2365)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: on prod013,60020,1537304282885, tracking started Thu Sep 20 20:09:24 MSK 2018; not retrying 1 - final failure
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x39beae45 connecting to ZooKeeper ensemble=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.ZooKeeper: Initiating client connection, connectString=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@3ef61f7
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,562 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Opening socket connection to server 10.0.0.3/10.0.0.3:2181. Will not attempt to authenticate using SASL (unknown error)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,570 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Socket connection established to 10.0.0.3/10.0.0.3:2181, initiating session
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,572 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Session establishment complete on server 10.0.0.3/10.0.0.3:2181, sessionid = 0x30000e039e01c7f, negotiated timeout = 40000
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,628 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] index.PhoenixIndexFailurePolicy: Successfully update INDEX_DISABLE_TIMESTAMP for IDX_MARK_O due to an exception while writing updates. indexState=PENDING_DISABLE
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O]
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,632 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] util.IndexManagementUtil: Rethrowing org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 1121 (XCL21): Write to the index failed. disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O] ,serverTimestamp=1537463364504,
>>>>>>>
>>>>>>>> On 20 Sep 2018, at 21:01, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Our setup:
>>>>>>>> HBase-1.4.7
>>>>>>>> Phoenix-4.14-hbase-1.4
>>>>>>>>
>>>>>>>>> On 20 Sep 2018, at 20:19, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> It looks like we hit a deadlock with a repeating "ERROR 1120 (XCL20)" exception, while all indexes are ACTIVE.
>>>>>>>>> Can you help us dig deeper into this?
>>>>>>>>>
>>>>>>>>> java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked until index can be updated. tableName=TBL_MARK
>>>>>>>>>         at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>>>>>>>>>         at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
>>>>>>>>>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$doUpsert$1(PhoenixDao.scala:103)
>>>>>>>>>         at scala.util.Try$.apply(Try.scala:209)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.doUpsert(PhoenixDao.scala:101)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2(PhoenixDao.scala:45)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2$adapted(PhoenixDao.scala:45)
>>>>>>>>>         at scala.collection.immutable.Stream.flatMap(Stream.scala:486)
>>>>>>>>>         at scala.collection.immutable.Stream.$anonfun$flatMap$1(Stream.scala:494)
>>>>>>>>>         at scala.collection.immutable.Stream.$anonfun$append$1(Stream.scala:252)
>>>>>>>>>         at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
>>>>>>>>>         at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
>>>>>>>>>         at scala.collection.immutable.Stream.length(Stream.scala:309)
>>>>>>>>>         at scala.collection.SeqLike.size(SeqLike.scala:105)
>>>>>>>>>         at scala.collection.SeqLike.size$(SeqLike.scala:105)
>>>>>>>>>         at scala.collection.AbstractSeq.size(Seq.scala:41)
>>>>>>>>>         at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
>>>>>>>>>         at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
>>>>>>>>>         at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$1(PhoenixDao.scala:45)
>>>>>>>>>         at scala.util.Try$.apply(Try.scala:209)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.batchInsert(PhoenixDao.scala:45)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$insert$2(PhoenixDao.scala:35)
>>>>>>>>>         at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
>>>>>>>>>         at scala.util.Success.$anonfun$map$1(Try.scala:251)
>>>>>>>>>         at scala.util.Success.map(Try.scala:209)
>>>>>>>>>         at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
>>>>>>>>>         at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
>>>>>>>>>         at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>>>>>>>>>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>         at java.lang.Thread.run(Thread.java:748)
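A closing diagnostic note: when writes are blocked with XCL20, it can help to check what state the server actually records for each index, since PENDING_DISABLE (seen in the logs above) is not the same as ACTIVE. A sketch that reads INDEX_STATE from SYSTEM.CATALOG (the column holds a single-character state code); if an index turns out disabled, an ALTER INDEX ... REBUILD is the usual recovery path:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IndexStateCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string, for illustration only.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:phoenix:10.0.0.1,10.0.0.2,10.0.0.3:2181");
                 Statement st = conn.createStatement();
                 // Rows with a non-null INDEX_STATE are index headers in SYSTEM.CATALOG.
                 ResultSet rs = st.executeQuery(
                     "SELECT TABLE_NAME, DATA_TABLE_NAME, INDEX_STATE "
                     + "FROM SYSTEM.CATALOG WHERE INDEX_STATE IS NOT NULL")) {
                while (rs.next()) {
                    System.out.printf("%s on %s -> state %s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
                }
                // If IDX_MARK_O shows up disabled, a manual rebuild would be:
                // ALTER INDEX IDX_MARK_O ON TBL_MARK REBUILD;
            }
        }
    }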