Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

Batyrshin Alexander Mon, 01 Oct 2018 08:14:40 -0700

Yep, we tried restarting Master + all Region Servers.
hbase.hregion.max.filesize is just 10GB. Our CPU/Disk busy at 5% or even lesser



Just got another split:

Oct 01 17:24:09 prod001 hbase[18135]: 2018-10-01 17:24:09,964 INFO  
[prod001,60000,1538342034697_ChoreService_6] balancer.StochasticLoadBalancer: 
Skipping load balancing because balanced cluster; total cost is 
18.168898456852713, sum multiplier is 1102.0 min cost which need balance is 0.0
Oct 01 17:28:00 prod001 hbase[18135]: 2018-10-01 17:28:00,362 INFO  
[Idle-Rpc-Conn-Sweeper-pool2-t1] ipc.AbstractRpcClient: Cleanup idle connection 
to prod017/10.0.0.17:60020
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,252 INFO  
[AM.ZK.Worker-pool5-t175] master.RegionStates: Transition null to 
{3bf96cfad6bb33b6f8a0db6a242c2577 state=SPLITTING_NEW, ts=1538404088252, 
server=prod014,60020,1538344014594}
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,252 INFO  
[AM.ZK.Worker-pool5-t175] master.RegionStates: Transition null to 
{8669f928e67ee67e567400e7c011727f state=SPLITTING_NEW, ts=1538404088252, 
server=prod014,60020,1538344014594}
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,252 INFO  
[AM.ZK.Worker-pool5-t175] master.RegionStates: Transition 
{8277aec30576531134e36bbde39e372d state=OPEN, ts=1538344020332, 
server=prod014,60020,1538344014594} to {8277aec30576531134e36bbde39e372d 
state=SPLITTING, ts=153840
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,677 INFO  
[AM.ZK.Worker-pool5-t177] master.RegionStates: Transition 
{8277aec30576531134e36bbde39e372d state=SPLITTING, ts=1538404088677, 
server=prod014,60020,1538344014594} to {8277aec30576531134e36bbde39e372d 
state=SPLIT, ts=15384
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,677 INFO  
[AM.ZK.Worker-pool5-t177] master.RegionStates: Offlined 
8277aec30576531134e36bbde39e372d from prod014,60020,1538344014594
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,677 INFO  
[AM.ZK.Worker-pool5-t177] master.RegionStates: Transition 
{3bf96cfad6bb33b6f8a0db6a242c2577 state=SPLITTING_NEW, ts=1538404088677, 
server=prod014,60020,1538344014594} to {3bf96cfad6bb33b6f8a0db6a242c2577 
state=OPEN, ts=15
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,677 INFO  
[AM.ZK.Worker-pool5-t177] master.RegionStates: Transition 
{8669f928e67ee67e567400e7c011727f state=SPLITTING_NEW, ts=1538404088677, 
server=prod014,60020,1538344014594} to {8669f928e67ee67e567400e7c011727f 
state=OPEN, ts=15
Oct 01 17:28:08 prod001 hbase[18135]: 2018-10-01 17:28:08,678 INFO  
[AM.ZK.Worker-pool5-t177] master.AssignmentManager: Handled SPLIT event; 
parent=IDX_MARK_O,\x06\x0000000046200020r:N,8ef\x00\x00\x80\x00\x01d\xEBcIp\x00\x00\x00\x00,1537400027657.8277aec30576531134e36bbde39e372d.
Oct 01 17:29:10 prod001 hbase[18135]: 2018-10-01 17:29:10,039 INFO  
[prod001,60000,1538342034697_ChoreService_7] balancer.StochasticLoadBalancer: 
Skipping load balancing because balanced cluster; total cost is 
13.420884661878974, sum multiplier is 1102.0 min cost which need balance is 0.0
Oct 01 17:32:00 prod001 hbase[18135]: 2018-10-01 17:32:00,363 INFO  
[Idle-Rpc-Conn-Sweeper-pool2-t1] ipc.AbstractRpcClient: Cleanup idle connection 
to prod017/10.0.0.17:60020
Oct 01 17:34:09 prod001 hbase[18135]: 2018-10-01 17:34:09,957 INFO  
[prod001,60000,1538342034697_ChoreService_1] balancer.StochasticLoadBalancer: 
Skipping load balancing because balanced cluster; total cost is 
13.623502266311172, sum multiplier is 1102.0 min cost which need balance is 0.0
Oct 01 17:34:10 prod001 hbase[18135]: 2018-10-01 17:34:10,059 INFO  
[prod001,60000,1538342034697_ChoreService_2] hbase.MetaTableAccessor: Deleted 
IDX_MARK_O,\x06\x0000000046200020r:N,8ef\x00\x00\x80\x00\x01d\xEBcIp\x00\x00\x00\x00,1537400027657.8277aec30576531134e36bbde39e372d.
Oct 01 17:34:10 prod001 hbase[18135]: 2018-10-01 17:34:10,060 INFO  
[prod001,60000,1538342034697_ChoreService_2] master.CatalogJanitor: Scanned 791 
catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent 
region(s)


Master start time:
HMaster Start Time      Mon Oct 01 00:13:54 MSK 2018

RServer start time:
RS Start Time   Mon Oct 01 00:46:54 MSK 2018

Per table config:
hbase(main):128:0> describe 'IDX_MARK_O'
Table IDX_MARK_O is ENABLED
IDX_MARK_O, {TABLE_ATTRIBUTES => {PRIORITY => '1000', coprocessor$1 => 
'|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', coprocessor$2 
=> 
'|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|', 
coprocessor$3 => 
'|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', 
coprocessor$4 => 
'|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|', 
METADATA => {'DATA_TABLE_NAME' => 'MARK', 'SPLIT_POLICY' => 
'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'd', BLOOMFILTER => 'NONE', VERSIONS => '1', IN_MEMORY => 'false', 
KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', TTL => 
'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.3340 seconds

Per HBase config:
<property>
<name>hbase.regionserver.region.split.policy</name>
<value>
org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy
</value>
<source>hbase-site.xml</source>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>10737418240</value>
<source>hbase-default.xml</source>
</property>



> On 30 Sep 2018, at 06:09, Jaanai Zhang <cloud.pos...@gmail.com> wrote:
> 
> Did you restart the cluster and you should set 'hbase.hregion.max.filesize' 
> to a safeguard value which less than RS's capabilities.
> 
> ----------------------------------------
>    Jaanai Zhang
>    Best regards!
> 
> 
> 
> Batyrshin Alexander <0x62...@gmail.com <mailto:0x62...@gmail.com>> 
> 于2018年9月29日周六 下午5:28写道：
> Meanwhile we tried to disable regions split via per index table options
> 'SPLIT_POLICY' => 
> 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'  and 
> hbase.hregion.max.filesize = 10737418240
> Looks like this options set doesn't. Some regions splits at size < 2GB
> 
> Then we tried to disable all splits via hbase shell: splitormerge_switch 
> 'SPLIT', false
> Seems that this also doesn't work.
> 
> Any ideas why we can't disable regions split?
> 
>> On 27 Sep 2018, at 02:52, Vincent Poon <vincent.poon...@gmail.com 
>> <mailto:vincent.poon...@gmail.com>> wrote:
>> 
>> We are planning a Phoenix 4.14.1 release which will have this fix
>> 
>> On Wed, Sep 26, 2018 at 3:36 PM Batyrshin Alexander <0x62...@gmail.com 
>> <mailto:0x62...@gmail.com>> wrote:
>> Thank you. We will try somehow...
>> Is there any chance that this fix will be included in next release for 
>> HBASE-1.4 (not 2.0)?
>> 
>>> On 27 Sep 2018, at 01:04, Ankit Singhal <ankitsingha...@gmail.com 
>>> <mailto:ankitsingha...@gmail.com>> wrote:
>>> 
>>> You might be hitting PHOENIX-4785 
>>> <https://jira.apache.org/jira/browse/PHOENIX-4785>,  you can apply the 
>>> patch on top of 4.14 and see if it fixes your problem.
>>> 
>>> Regards,
>>> Ankit Singhal
>>> 
>>> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com 
>>> <mailto:0x62...@gmail.com>> wrote:
>>> Any advices? Helps?
>>> I can reproduce problem and capture more logs if needed.
>>> 
>>>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com 
>>>> <mailto:0x62...@gmail.com>> wrote:
>>>> 
>>>> Looks like lock goes away 30 minutes after index region split.
>>>> So i can assume that this issue comes from cache that configured by this 
>>>> option: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs
>>>> 
>>>> 
>>>> 
>>>>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com 
>>>>> <mailto:0x62...@gmail.com>> wrote:
>>>>> 
>>>>> And how this split looks at Master logs:
>>>>> 
>>>>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO  
>>>>> [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition 
>>>>> {3e44b85ddf407da831dbb9a871496986 state=OPEN, ts=1537304859509, 
>>>>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 
>>>>> state=SPLITTING, ts=1537461904888, server=prod
>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO  
>>>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>>>>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340, 
>>>>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 
>>>>> state=SPLIT, ts=1537461905340, server=pro
>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO  
>>>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined 
>>>>> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO  
>>>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>>>>> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340, 
>>>>> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb 
>>>>> state=OPEN, ts=1537461905341, server=
>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO  
>>>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>>>>> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340, 
>>>>> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144 
>>>>> state=OPEN, ts=1537461905341, server=
>>>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO  
>>>>> [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT event; 
>>>>> parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>>>>>  daughter a=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1
>>>>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO  
>>>>> [prod001,60000,1537304851459_ChoreService_2] 
>>>>> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
>>>>> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min 
>>>>> cost which need balance is 0.05
>>>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO  
>>>>> [prod001,60000,1537304851459_ChoreService_1] hbase.MetaTableAccessor: 
>>>>> Deleted 
>>>>> IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>>>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO  
>>>>> [prod001,60000,1537304851459_ChoreService_1] master.CatalogJanitor: 
>>>>> Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 
>>>>> unreferenced parent region(s)
>>>>> 
>>>>>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com 
>>>>>> <mailto:0x62...@gmail.com>> wrote:
>>>>>> 
>>>>>> Looks like problem was because of index region split
>>>>>> 
>>>>>> Index region split at prod013:
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO  
>>>>>> [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] 
>>>>>> regionserver.SplitRequest: Region split, hbase:meta updated, and report 
>>>>>> to master. 
>>>>>> Parent=IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>>>>>>  new regions: 
>>>>>> IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb.,
>>>>>>  
>>>>>> IDX_MARK_O,\x107834005168\x0000000046200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144..
>>>>>>  Split took 0sec
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO  
>>>>>> [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] 
>>>>>> regionserver.SplitRequest: Split transaction journal:
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED at 1537461904853
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         PREPARED at 1537461904877
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_PRE_SPLIT_HOOK at 
>>>>>> 1537461904877
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_PRE_SPLIT_HOOK at 
>>>>>> 1537461904877
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         SET_SPLITTING at 
>>>>>> 1537461904880
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         CREATE_SPLIT_DIR at 
>>>>>> 1537461904987
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         CLOSED_PARENT_REGION at 
>>>>>> 1537461905002
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OFFLINED_PARENT at 
>>>>>> 1537461905002
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_A_CREATION 
>>>>>> at 1537461905056
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         STARTED_REGION_B_CREATION 
>>>>>> at 1537461905131
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         PONR at 1537461905192
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_A at 
>>>>>> 1537461905249
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         OPENED_REGION_B at 
>>>>>> 1537461905252
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         BEFORE_POST_SPLIT_HOOK at 
>>>>>> 1537461905439
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         AFTER_POST_SPLIT_HOOK at 
>>>>>> 1537461905439
>>>>>> Sep 20 19:45:05 prod013 hbase[193055]:         COMPLETED at 1537461905439
>>>>>> 
>>>>>> Index update failed at prod002:
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,520 WARN  
>>>>>> [hconnection-0x4f3242a0-shared--pool32-t36014] client.AsyncProcess: 
>>>>>> #220, table=IDX_MARK_O, attempt=1/1 failed=1ops, last exception: 
>>>>>> org.apache.hadoop.hbase.NotServingRegionException: 
>>>>>> org.apache.hadoop.hbase.NotServingRegionException: Re
>>>>>> gion 
>>>>>> IDX_MARK_O,\x107834005168\x0000000046200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>>>>>>  is not online on prod013,60020,1537304282885
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3081)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1271)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2365)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:  on prod013,60020,1537304282885, 
>>>>>> tracking started Thu Sep 20 20:09:24 MSK 2018; not retrying 1 - final 
>>>>>> failure
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] 
>>>>>> zookeeper.RecoverableZooKeeper: Process 
>>>>>> identifier=hconnection-0x39beae45 connecting to ZooKeeper 
>>>>>> ensemble=10.0.0.1:2181 <http://10.0.0.1:2181/>,10.0.0.2:2181 
>>>>>> <http://10.0.0.2:2181/>,10.0.0.3:2181 <http://10.0.0.3:2181/>
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,549 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] 
>>>>>> zookeeper.ZooKeeper: Initiating client connection, 
>>>>>> connectString=10.0.0.1:2181 <http://10.0.0.1:2181/>,10.0.0.2:2181 
>>>>>> <http://10.0.0.2:2181/>,10.0.0.3:2181 <http://10.0.0.3:2181/> 
>>>>>> sessionTimeout=90000 
>>>>>> watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@3ef61f7
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,562 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181
>>>>>>  <http://10.0.0.3:2181/>)] zookeeper.ClientCnxn: Opening socket 
>>>>>> connection to server 10.0.0.3/10.0.0.3:2181 
>>>>>> <http://10.0.0.3/10.0.0.3:2181>. Will not attempt to authenticate using 
>>>>>> SASL (unknown error)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,570 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181
>>>>>>  <http://10.0.0.3:2181/>)] zookeeper.ClientCnxn: Socket connection 
>>>>>> established to 10.0.0.3/10.0.0.3:2181 <http://10.0.0.3/10.0.0.3:2181>, 
>>>>>> initiating session
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,572 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020-SendThread(10.0.0.3:2181
>>>>>>  <http://10.0.0.3:2181/>)] zookeeper.ClientCnxn: Session establishment 
>>>>>> complete on server 10.0.0.3/10.0.0.3:2181 
>>>>>> <http://10.0.0.3/10.0.0.3:2181>, sessionid = 0x30000e039e01c7f, 
>>>>>> negotiated timeout = 40000
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,628 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] 
>>>>>> index.PhoenixIndexFailurePolicy: Successfully update 
>>>>>> INDEX_DISABLE_TIMESTAMP for IDX_MARK_O due to an exception while writing 
>>>>>> updates. indexState=PENDING_DISABLE
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 
>>>>>> org.apache.phoenix.hbase.index.exception.MultiIndexWriteFailureException:
>>>>>>   disableIndexOnFailure=true, Failed to write to multiple index tables: 
>>>>>> [IDX_MARK_O]
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.write.TrackingParallelWriterIndexCommitter.write(TrackingParallelWriterIndexCommitter.java:235)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.write.IndexWriter.write(IndexWriter.java:195)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:156)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.write.IndexWriter.writeAndKillYourselfOnFailure(IndexWriter.java:145)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.Indexer.doPostWithExceptions(Indexer.java:620)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.Indexer.doPost(Indexer.java:595)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.phoenix.hbase.index.Indexer.postBatchMutateIndispensably(Indexer.java:578)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1048)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1711)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1745)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postBatchMutateIndispensably(RegionCoprocessorHost.java:1044)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3646)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3108)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3050)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:916)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:844)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2405)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36621)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2359)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]:         at 
>>>>>> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
>>>>>> Sep 20 20:09:24 prod002 hbase[97285]: 2018-09-20 20:09:24,632 INFO  
>>>>>> [RpcServer.default.FPBQ.Fifo.handler=98,queue=8,port=60020] 
>>>>>> util.IndexManagementUtil: Rethrowing 
>>>>>> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 1121 (XCL21): Write 
>>>>>> to the index failed.  disableIndexOnFailure=true, Failed to write to 
>>>>>> multiple index tables: [IDX_MARK_O] ,serverTimestamp=1537463364504,
>>>>>> 
>>>>>> 
>>>>>>> On 20 Sep 2018, at 21:01, Batyrshin Alexander <0x62...@gmail.com 
>>>>>>> <mailto:0x62...@gmail.com>> wrote:
>>>>>>> 
>>>>>>> Our setup:
>>>>>>> HBase-1.4.7
>>>>>>> Phoenix-4.14-hbase-1.4
>>>>>>> 
>>>>>>> 
>>>>>>>> On 20 Sep 2018, at 20:19, Batyrshin Alexander <0x62...@gmail.com 
>>>>>>>> <mailto:0x62...@gmail.com>> wrote:
>>>>>>>> 
>>>>>>>>  Hello,
>>>>>>>> Looks live we got dead lock with repeating "ERROR 1120 (XCL20)" 
>>>>>>>> exception. At this time all indexes is ACTIVE.
>>>>>>>> Can you help to make deeper diagnose?
>>>>>>>> 
>>>>>>>> java.sql.SQLException: ERROR 1120 (XCL20): Writes to table blocked 
>>>>>>>> until index can be updated. tableName=TBL_MARK
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:494)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.execute.MutationState.validateAndGetServerTimestamp(MutationState.java:815)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.execute.MutationState.validateAll(MutationState.java:789)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.execute.MutationState.send(MutationState.java:981)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.execute.MutationState.send(MutationState.java:1514)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.execute.MutationState.commit(MutationState.java:1337)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:670)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:666)
>>>>>>>>        at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>>>>>>>>        at 
>>>>>>>> org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:666)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.$anonfun$doUpsert$1(PhoenixDao.scala:103)
>>>>>>>>        at scala.util.Try$.apply(Try.scala:209)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.doUpsert(PhoenixDao.scala:101)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2(PhoenixDao.scala:45)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$2$adapted(PhoenixDao.scala:45)
>>>>>>>>        at scala.collection.immutable.Stream.flatMap(Stream.scala:486)
>>>>>>>>        at 
>>>>>>>> scala.collection.immutable.Stream.$anonfun$flatMap$1(Stream.scala:494)
>>>>>>>>        at 
>>>>>>>> scala.collection.immutable.Stream.$anonfun$append$1(Stream.scala:252)
>>>>>>>>        at 
>>>>>>>> scala.collection.immutable.Stream$Cons.tail(Stream.scala:1169)
>>>>>>>>        at 
>>>>>>>> scala.collection.immutable.Stream$Cons.tail(Stream.scala:1159)
>>>>>>>>        at scala.collection.immutable.Stream.length(Stream.scala:309)
>>>>>>>>        at scala.collection.SeqLike.size(SeqLike.scala:105)
>>>>>>>>        at scala.collection.SeqLike.size$(SeqLike.scala:105)
>>>>>>>>        at scala.collection.AbstractSeq.size(Seq.scala:41)
>>>>>>>>        at 
>>>>>>>> scala.collection.TraversableOnce.toArray(TraversableOnce.scala:285)
>>>>>>>>        at 
>>>>>>>> scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:283)
>>>>>>>>        at 
>>>>>>>> scala.collection.AbstractTraversable.toArray(Traversable.scala:104)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.$anonfun$batchInsert$1(PhoenixDao.scala:45)
>>>>>>>>        at scala.util.Try$.apply(Try.scala:209)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.batchInsert(PhoenixDao.scala:45)
>>>>>>>>        at 
>>>>>>>> x.persistence.phoenix.PhoenixDao.$anonfun$insert$2(PhoenixDao.scala:35)
>>>>>>>>        at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:655)
>>>>>>>>        at scala.util.Success.$anonfun$map$1(Try.scala:251)
>>>>>>>>        at scala.util.Success.map(Try.scala:209)
>>>>>>>>        at scala.concurrent.Future.$anonfun$map$1(Future.scala:289)
>>>>>>>>        at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:29)
>>>>>>>>        at 
>>>>>>>> scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:29)
>>>>>>>>        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
>>>>>>>>        at 
>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>>>>>        at 
>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>>>>>        at java.lang.Thread.run(Thread.java:748)
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>

Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

Reply via email to