Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

2018-09-26 Thread Vincent Poon
We are planning a Phoenix 4.14.1 release which will have this fix.

On Wed, Sep 26, 2018 at 3:36 PM Batyrshin Alexander <0x62...@gmail.com>
wrote:

> Thank you. We will try somehow...
> Is there any chance that this fix will be included in the next release for
> HBase 1.4 (not 2.0)?
>
> On 27 Sep 2018, at 01:04, Ankit Singhal  wrote:
>
> You might be hitting PHOENIX-4785; you can apply the
> patch on top of 4.14 and see if it fixes your problem.
>
> Regards,
> Ankit Singhal
>
> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com>
> wrote:
>
>> Any advice? Anything that might help?
>> I can reproduce the problem and capture more logs if needed.
>>
>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>
>> Looks like the lock goes away 30 minutes after the index region split.
>> So I assume this issue comes from the cache configured by this
>> option: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs
>>
>>
>>
>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>
>> And here is how this split looks in the Master logs:
>>
>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO
>>  [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition
>> {3e44b85ddf407da831dbb9a871496986 state=OPEN,
>> ts=1537304859509, server=prod013,60020,1537304282885} to
>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888,
>> server=prod
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340,
>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986
>> state=SPLIT, ts=1537461905340, server=pro
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined
>> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340,
>> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb
>> state=OPEN, ts=1537461905341, server=
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
>> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340,
>> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144
>> state=OPEN, ts=1537461905341, server=
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO
>>  [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT
>> event; 
>> parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>> daughter a=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1
>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO
>>  [prod001,6,1537304851459_ChoreService_2]
>> balancer.StochasticLoadBalancer: Skipping load balancing because balanced
>> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost
>> which need balance is 0.05
>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO
>>  [prod001,6,1537304851459_ChoreService_1] hbase.MetaTableAccessor:
>> Deleted 
>> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO
>>  [prod001,6,1537304851459_ChoreService_1] master.CatalogJanitor:
>> Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
>> unreferenced parent region(s)
>>
>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>
>> Looks like the problem was caused by the index region split.
>>
>> Index region split at prod013:
>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
>> regionserver.SplitRequest: Region split, hbase:meta updated, and report to
>> master.
>> Parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>> new
>> regions: 
>> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb.,
>>  
>> IDX_MARK_O,\x107834005168\x46200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144..
>> Split took 0sec
>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
>> regionserver.SplitRequest: Split transaction journal:
>> Sep 20 19:45:05 prod013 hbase[193055]: STARTED at 1537461904853
>> Sep 20 19:45:05 prod013 hbase[193055]: PREPARED at 1537461904877
>> Sep 20 19:45:05 prod013 hbase[193055]: BEFORE_PRE_SPLIT_HOOK at
>> 1537461904877
>> Sep 20 19:45:05 prod013 hbase[193055]:   

Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

2018-09-26 Thread Batyrshin Alexander
Thank you. We will try somehow...
Is there any chance that this fix will be included in the next release for
HBase 1.4 (not 2.0)?

> On 27 Sep 2018, at 01:04, Ankit Singhal  wrote:
> 
> You might be hitting PHOENIX-4785; you can apply the patch
> on top of 4.14 and see if it fixes your problem.
> 
> Regards,
> Ankit Singhal
> 
> On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com> wrote:
> Any advice? Anything that might help?
> I can reproduce the problem and capture more logs if needed.
> 
>> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
>> 
>> Looks like the lock goes away 30 minutes after the index region split.
>> So I assume this issue comes from the cache configured by this
>> option: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs
>> 
>> 
>> 
>>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>> 
>>> And here is how this split looks in the Master logs:
>>> 
>>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO  
>>> [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition 
>>> {3e44b85ddf407da831dbb9a871496986 state=OPEN, ts=1537304859509, 
>>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 
>>> state=SPLITTING, ts=1537461904888, server=prod
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO  
>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340, 
>>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 
>>> state=SPLIT, ts=1537461905340, server=pro
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO  
>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined 
>>> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO  
>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>>> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340, 
>>> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb 
>>> state=OPEN, ts=1537461905341, server=
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO  
>>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>>> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340, 
>>> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144 
>>> state=OPEN, ts=1537461905341, server=
>>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO  
>>> [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT event; 
>>> parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>>>  daughter a=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1
>>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO  
>>> [prod001,6,1537304851459_ChoreService_2] 
>>> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
>>> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost 
>>> which need balance is 0.05
>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO  
>>> [prod001,6,1537304851459_ChoreService_1] hbase.MetaTableAccessor: 
>>> Deleted 
>>> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO  
>>> [prod001,6,1537304851459_ChoreService_1] master.CatalogJanitor: Scanned 
>>> 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced 
>>> parent region(s)
>>> 
 On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
 
 Looks like the problem was caused by the index region split.
 
 Index region split at prod013:
 Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO  
 [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] 
 regionserver.SplitRequest: Region split, hbase:meta updated, and report to 
 master. 
 Parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
  new regions: 
 IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb.,
  
 IDX_MARK_O,\x107834005168\x46200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144..
  Split took 0sec
 Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO  
 [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] 
 regionserver.SplitRequest: Split transaction journal:
 Sep 20 19:45:05 prod013 hbase[193055]: STARTED at 1537461904853
 Sep 20 19:45:05 prod013 hbase[193055]: PREPARED at 1537461904877
 Sep 20 19:45:05 prod013 

Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

2018-09-26 Thread Ankit Singhal
You might be hitting PHOENIX-4785; you can apply the
patch on top of 4.14 and see if it fixes your problem.

Regards,
Ankit Singhal
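
(A rough sketch of what "apply the patch on top of 4.14" could look like; the tag name, patch path, and jar name below are assumptions for illustration, not steps confirmed in this thread:)

  # Sketch only: rebuild the Phoenix 4.14 / HBase 1.4 server jar with the fix applied.
  git clone https://github.com/apache/phoenix.git
  cd phoenix
  git checkout v4.14.0-HBase-1.4          # assumed tag for the 4.14 / HBase 1.4 line
  git apply /path/to/PHOENIX-4785.patch   # patch file saved from the JIRA issue
  mvn clean package -DskipTests
  # Copy the rebuilt phoenix-*-server.jar into hbase/lib on every region server,
  # restart them, then re-test the blocked-index scenario.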

On Wed, Sep 26, 2018 at 2:33 PM Batyrshin Alexander <0x62...@gmail.com>
wrote:

> Any advice? Anything that might help?
> I can reproduce the problem and capture more logs if needed.
>
> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
> Looks like the lock goes away 30 minutes after the index region split.
> So I assume this issue comes from the cache configured by this
> option: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs
>
>
>
> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
> And here is how this split looks in the Master logs:
>
> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO
>  [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition
> {3e44b85ddf407da831dbb9a871496986 state=OPEN,
> ts=1537304859509, server=prod013,60020,1537304282885} to
> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461904888,
> server=prod
> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340,
> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986
> state=SPLIT, ts=1537461905340, server=pro
> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO
>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined
> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340,
> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb
> state=OPEN, ts=1537461905341, server=
> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO
>  [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition
> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340,
> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144
> state=OPEN, ts=1537461905341, server=
> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO
>  [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT
> event; 
> parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
> daughter a=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1
> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO
>  [prod001,6,1537304851459_ChoreService_2]
> balancer.StochasticLoadBalancer: Skipping load balancing because balanced
> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost
> which need balance is 0.05
> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO
>  [prod001,6,1537304851459_ChoreService_1] hbase.MetaTableAccessor:
> Deleted 
> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO
>  [prod001,6,1537304851459_ChoreService_1] master.CatalogJanitor:
> Scanned 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1
> unreferenced parent region(s)
>
> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
>
> Looks like the problem was caused by the index region split.
>
> Index region split at prod013:
> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
> regionserver.SplitRequest: Region split, hbase:meta updated, and report to
> master.
> Parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
> new
> regions: 
> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb.,
>  
> IDX_MARK_O,\x107834005168\x46200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144..
> Split took 0sec
> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO
>  [regionserver/prod013/10.0.0.13:60020-splits-1537400010677]
> regionserver.SplitRequest: Split transaction journal:
> Sep 20 19:45:05 prod013 hbase[193055]: STARTED at 1537461904853
> Sep 20 19:45:05 prod013 hbase[193055]: PREPARED at 1537461904877
> Sep 20 19:45:05 prod013 hbase[193055]: BEFORE_PRE_SPLIT_HOOK at
> 1537461904877
> Sep 20 19:45:05 prod013 hbase[193055]: AFTER_PRE_SPLIT_HOOK at
> 1537461904877
> Sep 20 19:45:05 prod013 hbase[193055]: SET_SPLITTING at
> 1537461904880
> Sep 20 19:45:05 prod013 hbase[193055]: CREATE_SPLIT_DIR at
> 1537461904987
> Sep 20 19:45:05 prod013 hbase[193055]: CLOSED_PARENT_REGION at
> 1537461905002
> Sep 20 19:45:05 prod013 hbase[193055]: OFFLINED_PARENT at
> 1537461905002
> Sep 20 19:45:05 prod013 hbase[193055]:  

Re: Table dead lock: ERROR 1120 (XCL20): Writes to table blocked until index can be updated

2018-09-26 Thread Batyrshin Alexander
Any advice? Anything that might help?
I can reproduce the problem and capture more logs if needed.

> On 21 Sep 2018, at 02:13, Batyrshin Alexander <0x62...@gmail.com> wrote:
> 
> Looks like the lock goes away 30 minutes after the index region split.
> So I assume this issue comes from the cache configured by this
> option: phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs
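
(A minimal check, assuming the /opt/hbase conf layout that appears later in this digest, for whether that TTL is overridden on a region server:)

  # Sketch: see whether phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs is set
  # explicitly; adjust the path to your HBASE_CONF_DIR.
  grep -A 1 'phoenix.coprocessor.maxMetaDataCacheTimeToLiveMs' /opt/hbase/conf/hbase-site.xml \
    || echo 'not set explicitly; the Phoenix server-side default applies'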
> 
> 
> 
>> On 21 Sep 2018, at 00:15, Batyrshin Alexander <0x62...@gmail.com> wrote:
>> 
>> And here is how this split looks in the Master logs:
>> 
>> Sep 20 19:45:04 prod001 hbase[10838]: 2018-09-20 19:45:04,888 INFO  
>> [AM.ZK.Worker-pool5-t282] master.RegionStates: Transition 
>> {3e44b85ddf407da831dbb9a871496986 state=OPEN, ts=1537304859509, 
>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 
>> state=SPLITTING, ts=1537461904888, server=prod
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO  
>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>> {3e44b85ddf407da831dbb9a871496986 state=SPLITTING, ts=1537461905340, 
>> server=prod013,60020,1537304282885} to {3e44b85ddf407da831dbb9a871496986 
>> state=SPLIT, ts=1537461905340, server=pro
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,340 INFO  
>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Offlined 
>> 3e44b85ddf407da831dbb9a871496986 from prod013,60020,1537304282885
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO  
>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>> {33cba925c7acb347ac3f5e70e839c3cb state=SPLITTING_NEW, ts=1537461905340, 
>> server=prod013,60020,1537304282885} to {33cba925c7acb347ac3f5e70e839c3cb 
>> state=OPEN, ts=1537461905341, server=
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,341 INFO  
>> [AM.ZK.Worker-pool5-t284] master.RegionStates: Transition 
>> {acb8f16a004a894c8706f6e12cd26144 state=SPLITTING_NEW, ts=1537461905340, 
>> server=prod013,60020,1537304282885} to {acb8f16a004a894c8706f6e12cd26144 
>> state=OPEN, ts=1537461905341, server=
>> Sep 20 19:45:05 prod001 hbase[10838]: 2018-09-20 19:45:05,343 INFO  
>> [AM.ZK.Worker-pool5-t284] master.AssignmentManager: Handled SPLIT event; 
>> parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>>  daughter a=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1
>> Sep 20 19:47:41 prod001 hbase[10838]: 2018-09-20 19:47:41,972 INFO  
>> [prod001,6,1537304851459_ChoreService_2] 
>> balancer.StochasticLoadBalancer: Skipping load balancing because balanced 
>> cluster; total cost is 17.82282205608522, sum multiplier is 1102.0 min cost 
>> which need balance is 0.05
>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,021 INFO  
>> [prod001,6,1537304851459_ChoreService_1] hbase.MetaTableAccessor: 
>> Deleted 
>> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.
>> Sep 20 19:47:42 prod001 hbase[10838]: 2018-09-20 19:47:42,022 INFO  
>> [prod001,6,1537304851459_ChoreService_1] master.CatalogJanitor: Scanned 
>> 779 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced 
>> parent region(s)
>> 
>>> On 20 Sep 2018, at 21:43, Batyrshin Alexander <0x62...@gmail.com> wrote:
>>> 
>>> Looks like the problem was caused by the index region split.
>>> 
>>> Index region split at prod013:
>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO  
>>> [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] 
>>> regionserver.SplitRequest: Region split, hbase:meta updated, and report to 
>>> master. 
>>> Parent=IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1536637905252.3e44b85ddf407da831dbb9a871496986.,
>>>  new regions: 
>>> IDX_MARK_O,\x107834005168\x46200020LWfBS4c,1537461904877.33cba925c7acb347ac3f5e70e839c3cb.,
>>>  
>>> IDX_MARK_O,\x107834005168\x46200068=4YF!YI,1537461904877.acb8f16a004a894c8706f6e12cd26144..
>>>  Split took 0sec
>>> Sep 20 19:45:05 prod013 hbase[193055]: 2018-09-20 19:45:05,441 INFO  
>>> [regionserver/prod013/10.0.0.13:60020-splits-1537400010677] 
>>> regionserver.SplitRequest: Split transaction journal:
>>> Sep 20 19:45:05 prod013 hbase[193055]: STARTED at 1537461904853
>>> Sep 20 19:45:05 prod013 hbase[193055]: PREPARED at 1537461904877
>>> Sep 20 19:45:05 prod013 hbase[193055]: BEFORE_PRE_SPLIT_HOOK at 
>>> 1537461904877
>>> Sep 20 19:45:05 prod013 hbase[193055]: AFTER_PRE_SPLIT_HOOK at 
>>> 1537461904877
>>> Sep 20 19:45:05 prod013 hbase[193055]: SET_SPLITTING at 
>>> 1537461904880
>>> Sep 20 19:45:05 prod013 hbase[193055]: CREATE_SPLIT_DIR at 
>>> 1537461904987
>>> Sep 20 19:45:05 prod013 hbase[193055]: CLOSED_PARENT_REGION at 
>>> 1537461905002
>>> Sep 20 19:45:05 prod013 hbase[193055]: OFFLINED_PARENT at 
>>> 1537461905002
>>> Sep 20 19:45:05 prod013 hbase[193055]: STARTED_REGION_A_CREATION at 
>>> 1537461905056

Re: org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to discover transaction service. -> TException: Unable to discover transaction service.

2018-09-26 Thread Josh Elser

If you're using HBase with Hadoop 3, HBase should have Hadoop 3 jars.

Re-build HBase using the -Dhadoop.profile=3.0 (I think it is) CLI option.
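
(A rough sketch of that rebuild, assuming a source checkout of the HBase release in use; the tag name and exact flags can differ by HBase version:)

  # Sketch: rebuild HBase against Hadoop 3 so its lib/ directory ships Hadoop 3 jars.
  git clone https://github.com/apache/hbase.git
  cd hbase
  git checkout rel/2.0.0                    # assumed tag for HBase 2.0.0
  mvn clean install -DskipTests -Dhadoop.profile=3.0
  # Optionally build a binary tarball with the same profile:
  mvn package assembly:single -DskipTests -Dhadoop.profile=3.0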

On 9/26/18 7:21 AM, Francis Chuang wrote:
Upon further investigation, it appears that this is because 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab 
is only available in Hadoop 2.8+. HBase ships with Hadoop 2.7.4 jars.


I noticed that Hadoop was bumped from 2.7.4 to 3.0.0 a few months ago to 
fix PQS/Avatica issues: 
https://github.com/apache/phoenix/blame/master/pom.xml#L70


I think this causes Phoenix to expect some things that are available in 
Hadoop 3.0.0, but are not present in HBase's Hadoop 2.7.4 jars.


I think I can try replacing the hadoop-*.jar files in hbase/lib with 
the equivalent 2.8.5 versions; however, I am not familiar with Java and 
the Hadoop project, so I am not sure whether this will introduce issues.


On 26/09/2018 4:44 PM, Francis Chuang wrote:

I wonder if this is because:
- HBase's binary distribution ships with Hadoop 2.7.4 jars.
- Phoenix 5.0.0 has Hadoop 3.0.0 declared in its pom.xml: 
https://github.com/apache/phoenix/blob/8a819c6c3b4befce190c6ac759f744df511de61d/pom.xml#L70 

- Tephra has Hadoop 2.2.0 declared in its pom.xml: 
https://github.com/apache/incubator-tephra/blob/master/pom.xml#L211


On 26/09/2018 4:03 PM, Francis Chuang wrote:

Hi all,

I am using Phoenix 5.0.0 with HBase 2.0.0. I am seeing errors while 
trying to create transactional tables using Phoenix.


I am using my Phoenix + HBase all in one docker image available here: 
https://github.com/Boostport/hbase-phoenix-all-in-one


This is the error: 
org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to 
discover transaction service. -> TException: Unable to discover 
transaction service.


I checked the tephra logs and got the following:

Exception in thread "HDFSTransactionStateStorage STARTING" Exception 
in thread "ThriftRPCServer" 
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z 

    at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1008) 

    at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001) 

    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) 

    at 
com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106) 

    at 
org.apache.tephra.TransactionManager.doStart(TransactionManager.java:245) 

    at 
com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170) 

    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220) 

    at 
org.apache.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:249) 

    at 
org.apache.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:177)
    at 
com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47) 


    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z 

    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715) 

    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925) 

    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873) 

    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740) 

    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 
org.apache.tephra.persist.HDFSTransactionStateStorage.startUp(HDFSTransactionStateStorage.java:104) 

    at 
com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43) 


    ... 1 more
2018-09-26 04:31:11,290 INFO [leader-election-tx.service-leader] 
distributed.TransactionService (TransactionService.java:leader(115)) 
- Transaction Thrift Service didn't start on /0.0.0.0:15165
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z 

    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715) 

    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925) 

    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873) 

    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740) 

    at 

Re: org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to discover transaction service. -> TException: Unable to discover transaction service.

2018-09-26 Thread Francis Chuang
Upon further investigation, it appears that this is because 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab 
is only available in Hadoop 2.8+. HBase ships with Hadoop 2.7.4 jars.
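
(One hedged way to confirm which hadoop-auth jar HBase actually ships, and whether it has the missing method; paths assume the /opt/hbase layout mentioned later in this thread:)

  # Sketch: look for hasKerberosKeyTab (added in Hadoop 2.8) in the bundled jar.
  ls /opt/hbase/lib/hadoop-auth-*.jar
  cd "$(mktemp -d)"
  unzip -o /opt/hbase/lib/hadoop-auth-*.jar \
      'org/apache/hadoop/security/authentication/util/KerberosUtil.class'
  javap -p org/apache/hadoop/security/authentication/util/KerberosUtil.class \
    | grep hasKerberosKeyTab || echo 'hasKerberosKeyTab not found (pre-2.8 jar)'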


I noticed that Hadoop was bumped from 2.7.4 to 3.0.0 a few months ago to 
fix PQS/Avatica issues: 
https://github.com/apache/phoenix/blame/master/pom.xml#L70


I think this causes Phoenix to expect some things that are available in 
Hadoop 3.0.0, but are not present in HBase's Hadoop 2.7.4 jars.


I think I can try replacing the hadoop-*.jar files in hbase/lib with 
the equivalent 2.8.5 versions; however, I am not familiar with Java and 
the Hadoop project, so I am not sure whether this will introduce issues.
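
(If someone does try that experiment, a rough sketch follows; the Maven Central layout is standard for org.apache.hadoop artifacts, but the paths, the version choice, and the whole idea of mixing Hadoop versions under hbase/lib are untested assumptions here:)

  # Sketch: swap the bundled Hadoop 2.7.4 jars in hbase/lib for 2.8.5 equivalents.
  HADOOP_VERSION=2.8.5
  cd /opt/hbase/lib
  for jar in hadoop-*-2.7.4.jar; do
    artifact=${jar%-2.7.4.jar}
    curl -fLO "https://repo1.maven.org/maven2/org/apache/hadoop/${artifact}/${HADOOP_VERSION}/${artifact}-${HADOOP_VERSION}.jar" \
      && rm "$jar"
  done
  # Restart HBase afterwards; keep backups so the originals can be restored.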


On 26/09/2018 4:44 PM, Francis Chuang wrote:

I wonder if this is because:
- HBase's binary distribution ships with Hadoop 2.7.4 jars.
- Phoenix 5.0.0 has Hadoop 3.0.0 declared in its pom.xml: 
https://github.com/apache/phoenix/blob/8a819c6c3b4befce190c6ac759f744df511de61d/pom.xml#L70
- Tephra has Hadoop 2.2.0 declared in its pom.xml: 
https://github.com/apache/incubator-tephra/blob/master/pom.xml#L211


On 26/09/2018 4:03 PM, Francis Chuang wrote:

Hi all,

I am using Phoenix 5.0.0 with HBase 2.0.0. I am seeing errors while 
trying to create transactional tables using Phoenix.


I am using my Phoenix + HBase all in one docker image available here: 
https://github.com/Boostport/hbase-phoenix-all-in-one


This is the error: 
org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to 
discover transaction service. -> TException: Unable to discover 
transaction service.


I checked the tephra logs and got the following:

Exception in thread "HDFSTransactionStateStorage STARTING" Exception 
in thread "ThriftRPCServer" 
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1008)
    at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at 
com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
    at 
org.apache.tephra.TransactionManager.doStart(TransactionManager.java:245)
    at 
com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170)
    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at 
org.apache.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:249)
    at 
org.apache.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:177)
    at 
com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47)

    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740)
    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 
org.apache.tephra.persist.HDFSTransactionStateStorage.startUp(HDFSTransactionStateStorage.java:104)
    at 
com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)

    ... 1 more
2018-09-26 04:31:11,290 INFO [leader-election-tx.service-leader] 
distributed.TransactionService (TransactionService.java:leader(115)) 
- Transaction Thrift Service didn't start on /0.0.0.0:15165
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740)
    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 

Re: org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to discover transaction service. -> TException: Unable to discover transaction service.

2018-09-26 Thread Francis Chuang

I wonder if this is because:
- HBase's binary distribution ships with Hadoop 2.7.4 jars.
- Phoenix 5.0.0 has Hadoop 3.0.0 declared in its pom.xml: 
https://github.com/apache/phoenix/blob/8a819c6c3b4befce190c6ac759f744df511de61d/pom.xml#L70
- Tephra has Hadoop 2.2.0 declared in its pom.xml: 
https://github.com/apache/incubator-tephra/blob/master/pom.xml#L211


On 26/09/2018 4:03 PM, Francis Chuang wrote:

Hi all,

I am using Phoenix 5.0.0 with HBase 2.0.0. I am seeing errors while 
trying to create transactional tables using Phoenix.


I am using my Phoenix + HBase all in one docker image available here: 
https://github.com/Boostport/hbase-phoenix-all-in-one


This is the error: 
org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to 
discover transaction service. -> TException: Unable to discover 
transaction service.


I checked the tephra logs and got the following:

Exception in thread "HDFSTransactionStateStorage STARTING" Exception 
in thread "ThriftRPCServer" 
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1008)
    at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at 
com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
    at 
org.apache.tephra.TransactionManager.doStart(TransactionManager.java:245)
    at 
com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170)
    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at 
org.apache.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:249)
    at 
org.apache.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:177)
    at 
com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47)

    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740)
    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 
org.apache.tephra.persist.HDFSTransactionStateStorage.startUp(HDFSTransactionStateStorage.java:104)
    at 
com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)

    ... 1 more
2018-09-26 04:31:11,290 INFO  [leader-election-tx.service-leader] 
distributed.TransactionService (TransactionService.java:leader(115)) - 
Transaction Thrift Service didn't start on /0.0.0.0:15165
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740)
    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 
org.apache.tephra.persist.HDFSTransactionStateStorage.startUp(HDFSTransactionStateStorage.java:104)
    at 
com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)

    at java.lang.Thread.run(Thread.java:748)

I know that HBase ships with the Hadoop 2.7.4 jars and I was not able 
to find "hasKerberosKeyTab" grepping through the source code for 
Hadoop 2.7.4. However, I checked the Hadoop 2.7.4 source files from 
the stack trace above and the line numbers do not match up.


Interestingly, I only see this issue on my older machine (Core i7 920 
with 12GB of RAM) and Gitlab's CI environment (a Google Cloud 
n1-standard-1 instance with 1vCPU and 3.75GB of RAM). I know Michael 
also encountered this problem while running the Phoenix tests for 
calcite-avatica-go on an older i5 machine from 2011.


It 

org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to discover transaction service. -> TException: Unable to discover transaction service.

2018-09-26 Thread Francis Chuang

Hi all,

I am using Phoenix 5.0.0 with HBase 2.0.0. I am seeing errors while 
trying to create transactional tables using Phoenix.


I am using my Phoenix + HBase all in one docker image available here: 
https://github.com/Boostport/hbase-phoenix-all-in-one


This is the error: 
org.apache.phoenix.shaded.org.apache.thrift.TException: Unable to 
discover transaction service. -> TException: Unable to discover 
transaction service.


I checked the tephra logs and got the following:

Exception in thread "HDFSTransactionStateStorage STARTING" Exception in 
thread "ThriftRPCServer" 
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1008)
    at 
com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at 
com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
    at 
org.apache.tephra.TransactionManager.doStart(TransactionManager.java:245)
    at 
com.google.common.util.concurrent.AbstractService.start(AbstractService.java:170)
    at 
com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at 
org.apache.tephra.distributed.TransactionServiceThriftHandler.init(TransactionServiceThriftHandler.java:249)
    at 
org.apache.tephra.rpc.ThriftRPCServer.startUp(ThriftRPCServer.java:177)
    at 
com.google.common.util.concurrent.AbstractExecutionThreadService$1$1.run(AbstractExecutionThreadService.java:47)

    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740)
    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 
org.apache.tephra.persist.HDFSTransactionStateStorage.startUp(HDFSTransactionStateStorage.java:104)
    at 
com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)

    ... 1 more
2018-09-26 04:31:11,290 INFO  [leader-election-tx.service-leader] 
distributed.TransactionService (TransactionService.java:leader(115)) - 
Transaction Thrift Service didn't start on /0.0.0.0:15165
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.util.KerberosUtil.hasKerberosKeyTab(Ljavax/security/auth/Subject;)Z
    at 
org.apache.hadoop.security.UserGroupInformation.(UserGroupInformation.java:715)
    at 
org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:925)
    at 
org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:873)
    at 
org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:740)
    at 
org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:3472)
    at 
org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:3310)

    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:529)
    at 
org.apache.tephra.persist.HDFSTransactionStateStorage.startUp(HDFSTransactionStateStorage.java:104)
    at 
com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)

    at java.lang.Thread.run(Thread.java:748)

I know that HBase ships with the Hadoop 2.7.4 jars and I was not able to 
find "hasKerberosKeyTab" grepping through the source code for hadoop 
2.7.4. However, I checked the Hadoop 2.7.4 source files from the stack 
trace above and the line numbers do not match up.
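
(One way to chase the line-number mismatch is to read the line-number table straight out of the class HBase actually bundles; the path and the method picked below are just examples:)

  # Sketch: dump UserGroupInformation's LineNumberTable from the bundled
  # hadoop-common jar and compare it with the stack trace (e.g. line 925 for
  # loginUserFromSubject).
  cd "$(mktemp -d)"
  unzip -o /opt/hbase/lib/hadoop-common-*.jar \
      'org/apache/hadoop/security/UserGroupInformation.class'
  javap -l org/apache/hadoop/security/UserGroupInformation.class \
    | grep -A 20 'loginUserFromSubject'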


Interestingly, I only see this issue on my older machine (Core i7 920 
with 12GB of RAM) and Gitlab's CI environment (a Google Cloud 
n1-standard-1 instance with 1vCPU and 3.75GB of RAM). I know Michael 
also encountered this problem while running the Phoenix tests for 
calcite-avatica-go on an older i5 machine from 2011.


It does seem to be pretty weird that we are only seeing this on machines 
where the CPU is not very powerful.


I also printed the classpath for tephra by doing:

$ # export HBASE_CONF_DIR=/opt/hbase/conf
$ # export HBASE_CP=/opt/hbase/lib
$ # export HBASE_HOME=/opt/hbase
$ # /opt/hbase/bin/tephra classpath