I've created a bug with reproduction steps: https://issues.apache.org/jira/browse/PHOENIX-4960
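For readers hitting the same wall: below is a minimal sketch of the client-side retry the message below says is missing, i.e. retrying commit() while Phoenix keeps raising ERROR 1120 (XCL20). The JDBC URL, the TBL_MARK columns, the attempt count and the backoff are illustrative assumptions, not details from this thread; whether uncommitted mutations survive a failed commit() can depend on where the commit fails, so treat this as a sketch rather than a verified workaround.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class CommitRetry {
        // Phoenix error code for "Writes to table blocked until index can be updated"
        private static final int XCL20 = 1120;

        // Retry commit() with linear backoff while Phoenix keeps reporting XCL20.
        static void commitWithRetry(Connection conn, int maxAttempts)
                throws SQLException, InterruptedException {
            for (int attempt = 1; ; attempt++) {
                try {
                    conn.commit();
                    return;
                } catch (SQLException e) {
                    if (e.getErrorCode() != XCL20 || attempt >= maxAttempts) {
                        throw e; // different failure, or out of retries
                    }
                    Thread.sleep(1000L * attempt);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical connection string and columns, for illustration only.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:phoenix:10.0.0.1,10.0.0.2,10.0.0.3:2181")) {
                conn.setAutoCommit(false);
                try (Statement st = conn.createStatement()) {
                    st.executeUpdate("UPSERT INTO TBL_MARK (ID, VAL) VALUES ('k1', 'v1')");
                }
                commitWithRetry(conn, 5);
            }
        }
    }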
> On 3 Oct 2018, at 21:06, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
> But we see that Phoenix commit() in our case fails with "ERROR 1120 (XCL20):
> Writes to table blocked until index can be updated" because of
> org.apache.hadoop.hbase.NotServingRegionException.
> We expected commit() to retry and eventually succeed.
>
>> On 2 Oct 2018, at 22:02, Josh Elser <els...@apache.org> wrote:
>>
>> HBase will invalidate the location of a Region on seeing certain exceptions
>> (including NotServingRegionException). After it sees the exception you have
>> copied below, it should re-fetch the location of the Region.
>>
>> If HBase keeps trying to access a Region on a RS that isn't hosting it,
>> either hbase:meta is wrong or the HBase client has a bug.
>>
>> However, to the point here, if that region was split successfully, clients
>> should not be reading from that region anymore -- they would read from the
>> daughters of that split region.
>>
>> On 10/2/18 2:34 PM, Batyrshin Alexander wrote:
>>> We tried branch 4.14-HBase-1.4 at commit
>>> https://github.com/apache/phoenix/commit/52893c240e4f24e2bfac0834d35205f866c16ed8
>>>
>>> Is there any way to invalidate the meta-cache when an index region splits?
>>> Maybe there is some option to set a max time-to-live for the cache?
>>> (A client-side refresh sketch follows after the logs below.)
>>>
>>> We are seeing this on the region servers.
>>>
>>> At 09:34 region *96c3ede1c40c98959e60bd6fc0e07269* split on prod019:
>>>
>>> Oct 02 09:34:39 prod019 hbase[152127]: 2018-10-02 09:34:39,719 INFO [regionserver/prod019/10.0.0.19:60020-splits-1538462079117] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1537400033958.*96c3ede1c40c98959e60bd6fc0e07269*., new regions: IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1538462079161.80fc2516619d8665789b0c5a2bca8a8b., IDX_MARK_O,\x0BON_SCHFDOPPR_2AL-56022B7D-2F90-4AA5-8125-4F4001B5BE0D-00000_2AL-C0D76C01-EE7E-496B-BCD6-F6488956F75A-00000_20180228_7E372181-F23D-4EBE-9CAD-5F5218C9798I\x0000000046186195_5.UHQ=\x00\x02\x80\x00\x01a\xD3\xEA@\x80\x00\x00\x00\x00,1538462079161.24b6675d9e51067a21e58f294a9f816b.. Split took 0sec
>>>
>>> Failure at 11:51 on prod018:
>>>
>>> Oct 02 11:51:13 prod018 hbase[108476]: 2018-10-02 11:51:13,752 WARN [hconnection-0x4131af19-shared--pool24-t26652] client.AsyncProcess: #164, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1537400033958.*96c3ede1c40c98959e60bd6fc0e07269*. is not online on prod019,60020,1538417663874
>>>
>>> Failure at 13:38 on prod005:
>>>
>>> Oct 02 13:38:06 prod005 hbase[197079]: 2018-10-02 13:38:06,040 WARN [hconnection-0x5e744e65-shared--pool8-t31214] client.AsyncProcess: #53, table=IDX_MARK_O, attempt=1/1 failed=11ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x0B\x0000000046200020qC8kovh\x00\x01\x80\x00\x01e\x89\x8B\x99@\x00\x00\x00\x00,1537400033958.*96c3ede1c40c98959e60bd6fc0e07269*. is not online on prod019,60020,1538417663874
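On the meta-cache question above: the thread does not quote a Phoenix-level switch for invalidating the client's region cache on an index split, but the plain HBase 1.x client API can force a re-read of hbase:meta for a given row, which is the same refresh that should happen automatically after a NotServingRegionException. A sketch, with a made-up row key:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HRegionLocation;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RefreshRegionLocation {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 RegionLocator locator =
                     conn.getRegionLocator(TableName.valueOf("IDX_MARK_O"))) {
                byte[] row = Bytes.toBytes("some-index-row-key"); // hypothetical key
                // reload=true bypasses the cached location and re-reads hbase:meta,
                // so a just-split parent resolves to the daughter hosting this row.
                HRegionLocation fresh = locator.getRegionLocation(row, true);
                System.out.println("Region " + fresh.getRegionInfo().getEncodedName()
                    + " is on " + fresh.getServerName());
            }
        }
    }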
>>>> On 27 Sep 2018, at 01:04, Ankit Singhal <ankitsingha...@gmail.com> wrote:
>>>>
>>>> You might be hitting PHOENIX-4785
>>>> <https://jira.apache.org/jira/browse/PHOENIX-4785>; you can apply the
>>>> patch on top of 4.14 and see if it fixes your problem.
>>>>
>>>> Regards,
>>>> Ankit Singhal
>>>>
>>>> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>
>>>> Any advice? Any help?
>>>> I can reproduce the problem and capture more logs if needed.
>>>>
>>>>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>
>>>>> It looks like the lock goes away 30 minutes after the index region split,
>>>>> so I assume this issue comes from the cache configured by the option
>>>>> *phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs*.
>>>>> (A config sketch follows after the master logs below.)
>>>>>
>>>>>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>
>>>>>> And here is how this split looks in the Master logs:
>>>>>>
>>>>>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition {3e44b85ddf407da831dbb9a871496986 state=OPEN, ts=1537304859509, server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888, server=prod
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340, server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 state=SPLIT, ts=1537461905340, server=pro
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340, server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb state=OPEN, ts=1537461905341, server=
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340, server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144 state=OPEN, ts=1537461905341, server=
>>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT event; parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., daughter a=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1
>>>>>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO [prod001,60000,1537304851459_ChoreService_2] balancer.StochasticLoadBalancer: Skipping load balancing because balanced cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost which need balance is 0.05
>>>>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO [prod001,60000,1537304851459_ChoreService_1] hbase.MetaTableAccessor: Deleted IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>>>>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO [prod001,60000,1537304851459_ChoreService_1] master.CatalogJanitor: Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
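For context on the option named above: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs is a server-side coprocessor setting, so it belongs in hbase-site.xml on the region servers; its default is commonly cited as 30 minutes, which would line up with the lock disappearing 30 minutes after the split. A sketch with an arbitrary lower value (an illustration, not a recommendation from the thread):

    <!-- hbase-site.xml on each region server; a rolling restart is needed.
         600000 ms (10 minutes) is an arbitrary illustrative value. -->
    <property>
      <name>phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs</name>
      <value>600000</value>
    </property>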
>>>>>>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>>
>>>>>>> It looks like the problem was caused by an index region split.
>>>>>>>
>>>>>>> Index region split at prod013:
>>>>>>>
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Region split, hbase:meta updated, and report to master. Parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986., new regions: IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb., IDX_MARK_O,\x107834005168\x0000000046200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144.. Split took 0sec
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] regionserver.SplitRequest: Split transaction journal:
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED at 1537461904853
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         PREPARED at 1537461904877
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_PRE_SPLIT_HOOK at 1537461904877
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_PRE_SPLIT_HOOK at 1537461904877
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         SET_SPLITTING at 1537461904880
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         CREATE_SPLIT_DIR at 1537461904987
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         CLOSED_PARENT_REGION at 1537461905002
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OFFLINED_PARENT at 1537461905002
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_A_CREATION at 1537461905056
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_B_CREATION at 1537461905131
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         PONR at 1537461905192
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_A at 1537461905249
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_B at 1537461905252
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_POST_SPLIT_HOOK at 1537461905439
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_POST_SPLIT_HOOK at 1537461905439
>>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         COMPLETED at 1537461905439
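Before the failing index writes continue below, one possible operational mitigation (an assumption of this write-up, not advice given in the thread): pin the index table to DisabledRegionSplitPolicy so its regions split only when an operator asks, shrinking the window in which writers chase a just-split region. A sketch against the HBase 1.x admin API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class DisableIndexSplits {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                TableName idx = TableName.valueOf("IDX_MARK_O");
                HTableDescriptor desc = admin.getTableDescriptor(idx);
                // Regions of this table will no longer split automatically;
                // splits must be requested explicitly by an operator.
                desc.setRegionSplitPolicyClassName(
                    "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
                admin.modifyTable(idx, desc);
            }
        }
    }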
>>>>>>> Index update failed at prod002:
>>>>>>>
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,520 WARN [hconnection-0x4f3242a0-shared--pool32-t36014] client.AsyncProcess: #220, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: Region IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986. is not online on prod013,60020,1537304282885
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3081)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2365)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: on prod013,60020,1537304282885, tracking started Thu Sep 20 20:09:24 MSK 2018; not retrying 1 - final failure
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x39beae45 connecting to ZooKeeper ensemble=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] zookeeper.ZooKeeper: Initiating client connection, connectString=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@3ef61f7
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,562 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Opening socket connection to server 10.0.0.3/10.0.0.3:2181. Will not attempt to authenticate using SASL (unknown error)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,570 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Socket connection established to 10.0.0.3/10.0.0.3:2181, initiating session
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,572 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181)] zookeeper.ClientCnxn: Session establishment complete on server 10.0.0.3/10.0.0.3:2181, sessionid = 0x30000e039e01c7f, negotiated timeout = 40000
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,628 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] index.PhoenixIndexFailurePolicy: Successfully update INDEX_DISABLE_TIMESTAMP for IDX_MARK_O due to an exception while writing updates. indexState=PENDING_DISABLE
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException: disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O]
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,632 INFO [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] util.IndexManagementUtil: Rethrowing org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 1121 (XCL21): Write to the index failed. disableIndexOnFailure=true, Failed to write to multiple index tables: [IDX_MARK_O] ,serverTimestamp=1537463364504,
>>>>>>>
>>>>>>>> On 20 Sep 2018, at 21:01, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Our setup:
>>>>>>>> HBase-1.4.7
>>>>>>>> Phoenix-4.14-hbase-1.4
>>>>>>>>
>>>>>>>>> On 20 Sep 2018, at 20:19, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> It looks like we hit a deadlock with a repeating "ERROR 1120 (XCL20)" exception, while all indexes are ACTIVE.
>>>>>>>>> Can you help us dig deeper into this?
>>>>>>>>>
>>>>>>>>> java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked until index can be updated. tableName=TBL_MARK
>>>>>>>>>         at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>>>>>>>>>         at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
>>>>>>>>>         at org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
>>>>>>>>>         at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$doUpsert$1(PhoenixDao.scala:103)
>>>>>>>>>         at scala.util.Try$.apply(Try.scala:209)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.doUpsert(PhoenixDao.scala:101)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2(PhoenixDao.scala:45)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2$adapted(PhoenixDao.scala:45)
>>>>>>>>>         at scala.collection.immutable.Stream.flatMap(Stream.scala:486)
>>>>>>>>>         at scala.collection.immutable.Stream.$anonfun$flatMap$1(Stream.scala:494)
>>>>>>>>>         at scala.collection.immutable.Stream.$anonfun$append$1(Stream.scala:252)
>>>>>>>>>         at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
>>>>>>>>>         at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
>>>>>>>>>         at scala.collection.immutable.Stream.length(Stream.scala:309)
>>>>>>>>>         at scala.collection.SeqLike.size(SeqLike.scala:105)
>>>>>>>>>         at scala.collection.SeqLike.size$(SeqLike.scala:105)
>>>>>>>>>         at scala.collection.AbstractSeq.size(Seq.scala:41)
>>>>>>>>>         at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
>>>>>>>>>         at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
>>>>>>>>>         at scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$1(PhoenixDao.scala:45)
>>>>>>>>>         at scala.util.Try$.apply(Try.scala:209)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.batchInsert(PhoenixDao.scala:45)
>>>>>>>>>         at x.persistence.phoenix.PhoenixDao.$anonfun$insert$2(PhoenixDao.scala:35)
>>>>>>>>>         at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
>>>>>>>>>         at scala.util.Success.$anonfun$map$1(Try.scala:251)
>>>>>>>>>         at scala.util.Success.map(Try.scala:209)
>>>>>>>>>         at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
>>>>>>>>>         at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
>>>>>>>>>         at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>>>>>>>>>         at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>>         at java.lang.Thread.run(Thread.java:748)
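A closing diagnostic note: when writes are blocked with XCL20, it can help to check what state the server actually records for each index, since PENDING_DISABLE (seen in the logs above) is not the same as ACTIVE. A sketch that reads INDEX_STATE from SYSTEM.CATALOG (the column holds a single-character state code); if an index turns out disabled, an ALTER INDEX ... REBUILD is the usual recovery path:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class IndexStateCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection string, for illustration only.
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:phoenix:10.0.0.1,10.0.0.2,10.0.0.3:2181");
                 Statement st = conn.createStatement();
                 // Rows with a non-null INDEX_STATE are index headers in SYSTEM.CATALOG.
                 ResultSet rs = st.executeQuery(
                     "SELECT TABLE_NAME, DATA_TABLE_NAME, INDEX_STATE "
                     + "FROM SYSTEM.CATALOG WHERE INDEX_STATE IS NOT NULL")) {
                while (rs.next()) {
                    System.out.printf("%s on %s -> state %s%n",
                        rs.getString(1), rs.getString(2), rs.getString(3));
                }
                // If IDX_MARK_O shows up disabled, a manual rebuild would be:
                // ALTER INDEX IDX_MARK_O ON TBL_MARK REBUILD;
            }
        }
    }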