[jira] [Commented] (HBASE-28583) Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-09-15 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881918#comment-17881918
 ] 

Ke Han commented on HBASE-28583:


Hi [Duo 
Zhang|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhangduo], 
May I ask whether it's necessary to mark old_table_schema as required? The 
upgrade process crashes whenever there is a RestoreSnapshotStateData protobuf 
message persisted by the old version.

I feel it should be made optional for backward compatibility. If you agree, I 
can provide a PR to fix it. Thank you!
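
For illustration, a minimal sketch of how RestoreSnapshotProcedure.deserializeStateData 
could handle old data once old_table_schema is no longer required. This is an 
assumption-level sketch, not the actual HBase code: the accessor names mirror the 
usual protobuf-generated ones, and the fallback branch is purely illustrative.
{code:java}
// Hypothetical sketch, assuming old_table_schema is changed to optional in
// MasterProcedure.proto. Parsing state written by 2.5.x would then succeed and
// the procedure can branch on field presence instead of failing master startup.
RestoreSnapshotStateData restoreSnapshotMsg =
  serializer.deserialize(RestoreSnapshotStateData.class);
snapshot = restoreSnapshotMsg.getSnapshot();
if (restoreSnapshotMsg.hasOldTableSchema()) {
  oldTableDescriptor = ProtobufUtil.toTableDescriptor(restoreSnapshotMsg.getOldTableSchema());
} else {
  // Field absent in data persisted by an older version: recover the schema from
  // another source (e.g. the snapshot manifest) or fail only this procedure.
}
{code}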

> Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.9
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-e19f64f2bc73.log
>
>
> When migrating data from 2.5.9 cluster (1HM, 2RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-bet

[jira] [Comment Edited] (HBASE-28583) Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-09-15 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881918#comment-17881918
 ] 

Ke Han edited comment on HBASE-28583 at 9/16/24 5:25 AM:
-

Hi [Duo 
Zhang|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhangduo], 
May I ask whether it's necessary to mark old_table_schema as {_}required{_}? 
The upgrade process crashes whenever there is a RestoreSnapshotStateData 
protobuf message persisted by the old version.

I feel it should be made _optional_ for backward compatibility. If you agree, 
I can provide a PR to fix it. Thank you!


was (Author: JIRAUSER289562):
Hi [Duo 
Zhang|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhangduo], 
May I ask whether it's necessary to set old_table_schema to required? It 
crashes the upgrade process as long as there's a protobuf message of 
RestoreSnapshotStateData from the old version data. 

I feel it should be set to optional for backward compatibility. If that's the 
case, I can provide a PR to fix it. Thank you!

> Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.9
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-e19f64f2bc73.log
>
>
> When migrating data from 2.5.9 cluster (1HM, 2RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SN

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2024-09-13 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Description: 
Flushing a column family that doesn't exist in the table causes an NPE ERROR in 
both the shell and the HMaster logs.
h1. Reproduce

Start up an HBase 2.5.9 cluster; executing the following commands with the hbase 
shell on the HMaster node leads to the NPE. (Can be reproduced deterministically.)
{code:java}
create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER 
=> 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER 
=> 'ROWCOL'}
incr 'table', 'row1', 'cf1:cell', 2
flush 'table', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds                                                         
{code}
 

According to the _flush (flush.rb)_ command specification, a user can flush a 
specific column family.
{code:java}
Flush all regions in passed table or pass a region row to
flush an individual region or a region server name whose format
is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table,
or for an specific region only.
For example:
  hbase> flush 'TABLENAME'
  hbase> flush 'TABLENAME','FAMILYNAME' {code}
In the above case, *cf3* is an invalid input (a non-existing column family). If 
the user tries to flush it, the expected behavior is:
 # HBase rejects the operation
 # the shell returns a prompt saying the column family doesn't exist: 
{+}ERROR: Unknown CF...{+}

In 2.6.0, the flush command gets stuck and runs into an NPE:
{code:java}
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.regionserver.HRegion.logFatLineOnFlush(HRegion.java:2724)
 ~[hbase-server-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2640)
 ~[hbase-server-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2587)
 ~[hbase-server-2.6.0.jar:2.6.0] {code}
h1. Root Cause

There is a missing check for whether the column family targeted by the flush 
actually exists.
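
For reference, a minimal client-side sketch of the kind of guard that is missing, 
written against the HBase 2.x Admin API (the class name FlushGuard and the 
exception choice are assumptions for illustration; the actual fix would live 
server-side in the flush path, but the shape of the check is the same):
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.util.Bytes;

public final class FlushGuard {
  /** Flushes the given column family only if the table actually declares it. */
  static void flushFamilyIfPresent(Admin admin, TableName table, String family)
    throws IOException {
    TableDescriptor td = admin.getDescriptor(table);
    if (!td.hasColumnFamily(Bytes.toBytes(family))) {
      // Reject up front instead of letting the flush procedure hit an NPE later.
      throw new IllegalArgumentException("Unknown CF " + family + " in table " + table);
    }
    admin.flush(table, Bytes.toBytes(family));
  }
}
{code}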

  was:
Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
both shell and the HMaster logs.
h1. Reproduce

Start up HBase 2.5.5 cluster, executing the following commands with hbase shell 
in HMaster node will lead to NPE. (Can be reproduced determinstically)
{code:java}
create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER 
=> 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER 
=> 'ROWCOL'}
incr 'table', 'row1', 'cf1:cell', 2
flush 'table', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                  

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2024-09-13 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Affects Version/s: 2.6.0

> NPE when flushing a non-existing column family
> --
>
> Key: HBASE-28187
> URL: https://issues.apache.org/jira/browse/HBASE-28187
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
>  Labels: pull-request-available
>
> Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
> both shell and the HMaster logs.
> h1. Reproduce
> Start up HBase 2.5.5 cluster, executing the following commands with hbase 
> shell in HMaster node will lead to NPE. (Can be reproduced determinstically)
> {code:java}
> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
> 'NONE', BLOOMFILTER => 'ROWCOL'}
> incr 'table', 'row1', 'cf1:cell', 2
> flush 'table', 'cf3'{code}
> The shell outputs
> {code:java}
> hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
> 'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
> 'NONE', BLOOMFILTER => 'ROWCOL'}
> Created table table
> Took 2.1238 seconds                                                           
>                                                                       
> => Hbase::Table - table
> hbase:007:0> 
> hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
> COUNTER VALUE = 2
> Took 0.0131 seconds                                                           
>                                                                       
> hbase:009:0> 
> hbase:010:0> flush 'table', 'cf3'
> ERROR: java.io.IOException: java.lang.NullPointerException
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
>  at 
> org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
>  at 
> org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:750)
> For usage try 'help "flush"'
> Took 12.1713 seconds                                                         
> {code}
>  
> According to the _flush (flush.rb)_ command specification, user can flush a 
> specific column family.
> {code:java}
> Flush all regions in passed table or pass a region row to
> flush an individual region or a region server name whose format
> is 'host,port,startcode', to flush all its regions.
> You can also flush a single column family for all regions within a table,
> or for an specific region only.
> For example:
>   hbase> flush 'TABLENAME'
>   hbase> flush 'TABLENAME','FAMILYNAME' {code}
> In the above case, *cf3* an incorrect input (non-existing column family). If 
> user tries to flush it, the expected output is:
>  # HBase rejects this operation
>  # returns a prompt saying the column family doesn't exist 
> {_}"{_}{_}{+}ERROR: Unknown CF...{+}".{_}
> h1. Root Cause
> There's a missing check for the whether the target flushing columnfamily 
> exists.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-09 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880393#comment-17880393
 ] 

Ke Han commented on HBASE-28812:


[Duo 
Zhang|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=zhangduo] 
Thank you for the PR! I have applied the patch to a030e80998, and it's working.

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: compatibility
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Assignee: Duo Zhang
>Priority: Major
>  Labels: pull-request-available, upgrade
> Attachments: hbase--master-2d6e4fad2af5.log, 
> hbase--master-440ed844e077.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
> system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> However, the HMaster would crash during the upgrade process.
> h1. Reproduce
> Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)
> Step2: Stop the entire cluster
> Step3: Upgrade to 3.0.0 cluster.
> HMaster will crash with the following error message
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7749)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:277)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:432)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at java.lang.Thread.run(Thread.java:833) ~[?:?]
> Caused by: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:289)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:339)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> 

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-09-05 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: (was: hbase--master-033a47be7d1d.log)

> Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-e19f64f2bc73.log
>
>
> When migrating data from 2.5.9 cluster (1HM, 2RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-09-05 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: hbase--master-e19f64f2bc73.log

> Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-e19f64f2bc73.log
>
>
> When migrating data from 2.5.9 cluster (1HM, 2RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-09-05 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Summary: Upgrade from 2.5.9 to 3.0.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema  (was: Upgrade from 2.5.8 to 3.0.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema)

> Upgrade from 2.5.9 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-033a47be7d1d.log, persistent.tar.gz
>
>
> When migrating data from 2.5.8 cluster (1HM, 2RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>  

[jira] [Updated] (HBASE-28815) Upgrade from 1.7.2 to 2.6.0 failed: HMaster aborted

2024-09-04 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28815:
---
Component/s: master

> Upgrade from 1.7.2 to 2.6.0 failed: HMaster aborted
> ---
>
> Key: HBASE-28815
> URL: https://issues.apache.org/jira/browse/HBASE-28815
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.6.0
>Reporter: Ke Han
>Priority: Major
>
> I am trying to migrate from 1.7.2 cluster to 2.6.0 (both are released 
> versions). However, I observed that the hmaster crashed during the upgrade 
> process.
> h1. Reproduce
> Step1: Start up 1.7.2 HBase cluster (1 HDFS, 1 HM, 1 RS).
> Step2: Stop the 1.7.2 HBase cluster.
> Step3: Upgrade to 2.6.0 HBase cluster.
> HMaster will crash with the following exception
> {code:java}
> 2024-09-04T16:04:47,004 WARN  [PEWorker-2] procedure.InitMetaProcedure: 
> Failed to init meta, suspend 1000secs
> java.io.IOException: Meta table is not partial, please sideline this meta 
> directory or run HBCK to fix this meta table, e.g. rebuild the server 
> hostname in ZNode for the meta region
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.deleteMetaTableDirectoryIfPartial(InitMetaProcedure.java:199)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.writeFsLayout(InitMetaProcedure.java:78)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:102)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:54)
>  ~[hbase-server-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:944) 
> ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1766)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1444)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:77)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:2092)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) 
> ~[hbase-common-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2119)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> 2024-09-04T16:04:47,005 INFO  [PEWorker-2] procedure2.TimeoutExecutorThread: 
> ADDED pid=1, state=WAITING_TIMEOUT:INIT_META_WRITE_FS_LAYOUT, locked=true; 
> InitMetaProcedure table=hbase:meta; timeout=1000, timestamp=1725465888005
> 2024-09-04T16:04:48,045 ERROR [PEWorker-1] procedure2.ProcedureExecutor: Root 
> Procedure pid=1, state=FAILED:INIT_META_WRITE_FS_LAYOUT, 
> exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
> ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
> Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta does 
> not support rollback but the execution failed and try to rollback, code bug?
> org.apache.hadoop.hbase.procedure2.RemoteProcedureException: 
> org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out 
> after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768) 
> ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         at 
> org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
> Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation 
> timed out after 1.0010 sec
>         at 
> org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
>  ~[hbase-procedure-2.6.0.jar:2.6.0]
>         ... 3 more
> 2024-09-04T16:04:48,058 INFO  [PEWorker-1] procedure2.Procedu

[jira] [Created] (HBASE-28815) Upgrade from 1.7.2 to 2.6.0 failed: HMaster aborted

2024-09-04 Thread Ke Han (Jira)
Ke Han created HBASE-28815:
--

 Summary: Upgrade from 1.7.2 to 2.6.0 failed: HMaster aborted
 Key: HBASE-28815
 URL: https://issues.apache.org/jira/browse/HBASE-28815
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Ke Han


I am trying to migrate from a 1.7.2 cluster to 2.6.0 (both are released 
versions). However, I observed that the HMaster crashed during the upgrade 
process.
h1. Reproduce

Step1: Start up 1.7.2 HBase cluster (1 HDFS, 1 HM, 1 RS).

Step2: Stop the 1.7.2 HBase cluster.

Step3: Upgrade to 2.6.0 HBase cluster.

HMaster will crash with the following exception
{code:java}
2024-09-04T16:04:47,004 WARN  [PEWorker-2] procedure.InitMetaProcedure: Failed 
to init meta, suspend 1000secs
java.io.IOException: Meta table is not partial, please sideline this meta 
directory or run HBCK to fix this meta table, e.g. rebuild the server hostname 
in ZNode for the meta region
        at 
org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.deleteMetaTableDirectoryIfPartial(InitMetaProcedure.java:199)
 ~[hbase-server-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.writeFsLayout(InitMetaProcedure.java:78)
 ~[hbase-server-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:102)
 ~[hbase-server-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.master.procedure.InitMetaProcedure.executeFromState(InitMetaProcedure.java:54)
 ~[hbase-server-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:188)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:944) 
~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1766)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1444)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1000(ProcedureExecutor.java:77)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.runProcedure(ProcedureExecutor.java:2092)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) 
~[hbase-common-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2119)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
2024-09-04T16:04:47,005 INFO  [PEWorker-2] procedure2.TimeoutExecutorThread: 
ADDED pid=1, state=WAITING_TIMEOUT:INIT_META_WRITE_FS_LAYOUT, locked=true; 
InitMetaProcedure table=hbase:meta; timeout=1000, timestamp=1725465888005
2024-09-04T16:04:48,045 ERROR [PEWorker-1] procedure2.ProcedureExecutor: Root 
Procedure pid=1, state=FAILED:INIT_META_WRITE_FS_LAYOUT, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta does 
not support rollback but the execution failed and try to rollback, code bug?
org.apache.hadoop.hbase.procedure2.RemoteProcedureException: 
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation timed out 
after 1.0010 sec
        at 
org.apache.hadoop.hbase.procedure2.Procedure.setFailure(Procedure.java:768) 
~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:797)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.executeTimedoutProcedure(TimeoutExecutorThread.java:131)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.execDelayedProcedure(TimeoutExecutorThread.java:109)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        at 
org.apache.hadoop.hbase.procedure2.TimeoutExecutorThread.run(TimeoutExecutorThread.java:68)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
Caused by: org.apache.hadoop.hbase.exceptions.TimeoutIOException: Operation 
timed out after 1.0010 sec
        at 
org.apache.hadoop.hbase.procedure2.Procedure.setTimeoutFailure(Procedure.java:798)
 ~[hbase-procedure-2.6.0.jar:2.6.0]
        ... 3 more
2024-09-04T16:04:48,058 INFO  [PEWorker-1] procedure2.ProcedureExecutor: Rolled 
back pid=1, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
exec-time=1.4160 sec
2024-09-04T16:04:48,059 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.

[jira] [Comment Edited] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-04 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879317#comment-17879317
 ] 

Ke Han edited comment on HBASE-28812 at 9/4/24 5:44 PM:


[~zhangduo] Thank you for the reply! It seems so: upgrading from 2.6.0 to the 
commit just before HBASE-28577 succeeds.

I tested upgrading from 2.6.0 to the following two commits from the master branch
{code:java}
Upgrade crashed. (The error log looks similar. I attached the failure log: 
hbase--master-440ed844e077.log)

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeded.

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 


was (Author: JIRAUSER289562):
[~zhangduo] Thank you for the reply! It seems yes. Upgrading from 2.6.0 to the 
commit before HBASE-28577 will succeed.

 

I tested upgrading from 2.6.0 to the following 2 commits from master branch
{code:java}
Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log 

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeded.

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Priority: Major
>  Labels: upgrade
> Attachments: hbase--master-2d6e4fad2af5.log, 
> hbase--master-440ed844e077.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
> system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> However, the HMaster would crash during the upgrade process.
> h1. Reproduce
> Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)
> Step2: Stop the entire cluster
> Step3: Upgrade to 3.0.0 cluster.
> HMaster will crash with the following error message
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHO

[jira] [Comment Edited] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-04 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879317#comment-17879317
 ] 

Ke Han edited comment on HBASE-28812 at 9/4/24 5:43 PM:


[~zhangduo] Thank you for the reply! It seems so: upgrading from 2.6.0 to the 
commit just before HBASE-28577 succeeds.

I tested upgrading from 2.6.0 to the following two commits from the master branch
{code:java}
Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log 

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeds

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 


was (Author: JIRAUSER289562):
[~zhangduo] Thank you for the reply! It seems yes. Upgrading from 2.6.0 to the 
commit before HBASE-28577 will succeed.

 

I tested upgrading from 2.6.0 to the following 2 commits from master branch

 
{code:java}
Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log 

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeds

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Priority: Major
>  Labels: upgrade
> Attachments: hbase--master-2d6e4fad2af5.log, 
> hbase--master-440ed844e077.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
> system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> However, the HMaster would crash during the upgrade process.
> h1. Reproduce
> Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)
> Step2: Stop the entire cluster
> Step3: Upgrade to 3.0.0 cluster.
> HMaster will crash with the following error message
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.h

[jira] [Comment Edited] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-04 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879317#comment-17879317
 ] 

Ke Han edited comment on HBASE-28812 at 9/4/24 5:43 PM:


[~zhangduo] Thank you for the reply! It seems so. Upgrading from 2.6.0 to the commit before HBASE-28577 will succeed.

 

I tested upgrading from 2.6.0 to the following 2 commits from the master branch:
{code:java}
Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log)

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeded.

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 


was (Author: JIRAUSER289562):
[~zhangduo] Thank you for the reply! It seems yes. Upgrading from 2.6.0 to the 
commit before HBASE-28577 will succeed.

 

I tested upgrading from 2.6.0 to the following 2 commits from master branch
{code:java}
Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log 

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeds

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Priority: Major
>  Labels: upgrade
> Attachments: hbase--master-2d6e4fad2af5.log, 
> hbase--master-440ed844e077.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
> system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> However, the HMaster would crash during the upgrade process.
> h1. Reproduce
> Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)
> Step2: Stop the entire cluster
> Step3: Upgrade to 3.0.0 cluster.
> HMaster will crash with the following error message
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.ha

[jira] [Comment Edited] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-04 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17879317#comment-17879317
 ] 

Ke Han edited comment on HBASE-28812 at 9/4/24 5:42 PM:


[~zhangduo] Thank you for the reply! It seems so. Upgrading from 2.6.0 to the commit before HBASE-28577 will succeed.

 

I tested upgrading from 2.6.0 to the following 2 commits from the master branch:

 
{code:java}
Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log)

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800    HBASE-28577 Remove deprecated methods 
in KeyValue (#5883)
    
    Co-authored-by: lixiaobao 
    Co-authored-by: 李小保 
    Signed-off-by: Duo Zhang 


===

Upgrade succeeds

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200    HBASE-28605 Add ErrorProne ban on 
Hadoop shaded thirdparty jars (#5918)
    
    This change results in this error on master at `3a3dd66e21`.
    
    ```
    [WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:
    
    Banned imports detected:
    Reason: Use shaded version in hbase-thirdparty{code}
 


was (Author: JIRAUSER289562):
[~zhangduo] Thank you for the reply! It seems yes. Upgrading from 2.6.0 to the 
commit before HBASE-28577 will succeed.

 

I tested upgrading from 2.6.0 to the following 2 commits from master branch
|Upgrade crashed. (I attached the failure log: hbase--master-440ed844e077.log|

commit 419666b8eb8a881724fe6f65e8235a4220824e51 (HEAD)
Author: lixiaobao <977734...@qq.com>
Date:   Wed May 22 18:34:42 2024 +0800

HBASE-28577 Remove deprecated methods in KeyValue (#5883)

Co-authored-by: lixiaobao 
Co-authored-by: 李小保 
Signed-off-by: Duo Zhang |
|Upgrade failed.|

commit 3b18ba664a6dcde344e13fe9305c272592195c03
Author: Nick Dimiduk 
Date:   Wed May 22 10:05:54 2024 +0200

HBASE-28605 Add ErrorProne ban on Hadoop shaded thirdparty jars (#5918)

This change results in this error on master at `3a3dd66e21`.

```
[WARNING] Rule 2: de.skuzzle.enforcer.restrictimports.rule.RestrictImports 
failed with message:

Banned imports detected:
Reason: Use shaded version in hbase-thirdparty|

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Priority: Major
>  Labels: upgrade
> Attachments: hbase--master-2d6e4fad2af5.log, 
> hbase--master-440ed844e077.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
> system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> However, the HMaster would crash during the upgrade process.
> h1. Reproduce
> Step1: Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)
> Step2: Stop the entire cluster
> Step3: Upgrade to 3.0.0 cluster.
> HMaster will crash with the following error message
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionser

[jira] [Updated] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-03 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28812:
---
Description: 
I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
{code:java}
commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
upstream/branch-3)
Author: Ray Mattingly 
Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
system entries until backup is complete (#6089)
    
    Co-authored-by: Ray Mattingly 
{code}
However, the HMaster would crash during the upgrade process.
h1. Reproduce

Step 1: Start up the 2.6.0 cluster (1 HDFS, 1 HM, 1 RS)

Step 2: Stop the entire cluster

Step 3: Upgrade to the 3.0.0 cluster.

HMaster will crash with the following error message:
{code:java}
2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
regionserver.HRegion: Failed initialize of region= 
master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
memstore
java.io.IOException: java.io.IOException: 
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
Trailer from file 
hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
        at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7749)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:277) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:432)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.io.IOException: 
org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
Trailer from file 
hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
        at 
org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:289)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:339)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:301) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6924)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1181) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:1178) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.j

[jira] [Updated] (HBASE-28812) Upgrade from 2.6.0 to 3.0.0 crashed

2024-09-03 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28812:
---
Labels: upgrade  (was: )

> Upgrade from 2.6.0 to 3.0.0 crashed
> ---
>
> Key: HBASE-28812
> URL: https://issues.apache.org/jira/browse/HBASE-28812
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0
>Reporter: Ke Han
>Priority: Major
>  Labels: upgrade
> Attachments: hbase--master-2d6e4fad2af5.log
>
>
> I am trying to upgrade from 2.6.0 (stable release) to 3.0.0. I built 3.0.0 
> using the following commit (a030e8099840e640684a68b6e4a79e7c1d5a6823)
> {code:java}
> commit a030e8099840e640684a68b6e4a79e7c1d5a6823 (HEAD -> branch-3, 
> upstream/branch-3)
> Author: Ray Mattingly 
> Date:   Mon Sep 2 04:38:29 2024 -0400    HBASE-28697 Don't clean bulk load 
> system entries until backup is complete (#6089)
>     
>     Co-authored-by: Ray Mattingly 
> {code}
> h1. Reproduce
> Start up 2.6.0 cluster (1 HDFS, 1 HM, 1 RS), stop the entire cluster and then 
> start up the 3.0.0 cluster. HMaster will crash with the following error
> {code:java}
> 2024-09-04T04:29:18,917 WARN  [master/hmaster:16000:becomeActiveMaster] 
> regionserver.HRegion: Failed initialize of region= 
> master:store,,1.1595e783b53d99cd5eef43b6debb2682., starting to roll back 
> memstore
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1215)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeStores(HRegion.java:1158)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1030)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:974) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7794) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:7749)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.open(MasterRegion.java:277)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:432)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:135)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1003)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at java.lang.Thread.run(Thread.java:833) ~[?:?]
> Caused by: java.io.IOException: 
> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile 
> Trailer from file 
> hdfs://master:8020/hbase/MasterData/data/master/store/1595e783b53d99cd5eef43b6debb2682/info/82c6d244b6244c179cdbafcead00ed75
>         at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.openStoreFiles(StoreEngine.java:289)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.StoreEngine.initialize(StoreEngine.java:339)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HStore.(HStore.java:301) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:6924)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java

[jira] [Updated] (HBASE-28660) list_namespace not working after an incorrect user input

2024-06-15 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28660:
---
Description: 
When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
handling.

If the user inputs an incorrect *list_namespace* command, the HBase shell throws an exception. However, it has a side effect on the following *list_namespace* command: the result becomes empty (incorrect).
h1. Reproduce

Executing the following 2 commands reproduces this bug:
 * The first command is an incorrect list_namespace command, which causes an exception.
 * The second command is a correct list_namespace command, but its return value is incorrect (empty).

{code:java}
list_namespace, 'ns.*'
list_namespace{code}
Here's the execution result.

The returned result of the second command is incorrect.
{code:java}
hbase:002:0> list_namespace, 'ns.*'
Traceback (most recent call last):
SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
list_namespace, 'ns.*'
                      ^
hbase:003:0> list_namespace
hbase:004:0>  {code}
The expected output of list_namespace is
{code:java}
hbase:001:0> list_namespace
NAMESPACE                                                                       
                                                      
default                                                                         
                                                      
hbase                                                                           
                                                      
2 row(s)
Took 0.6820 seconds {code}
h1. Root Cause

This could be a bug in the shell related to list_namespace. Restarting the shell restores normal functionality of the list_namespace command.
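For context, the stray comma is what makes the first command a syntax error; the intended regex form, per the shell usage, would presumably be written without it (shown here only as an aside, not part of the original report):
{code:java}
list_namespace 'ns.*'{code}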

  was:
When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
handling.

If user inputs an incorrect *list_namespace* command, hshell throws an 
exception. However, it has a side effect on the following *list_namespace* 
command: the result becomes empty (incorrect).
h1. Reproduce

Execute the following 2 commands can reproduce this bug
 * The first command is an incorrect list_namespace command, which causes and 
exception.
 * The second command is a correct list_namespace command, its return value is 
incorrect (empty).

{code:java}
list_namespace, 'ns.*'
list_namespace{code}
Here's the execution result

The return result of the second command is incorrect.
{code:java}
hbase:002:0> list_namespace, 'ns.*'
Traceback (most recent call last):
SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
list_namespace, 'ns.*'
                      ^
hbase:003:0> list_namespace
hbase:004:0>  {code}
The expected output of list_namespace is
{code:java}
hbase:001:0> list_namespace
NAMESPACE                                                                       
                                                      
default                                                                         
                                                      
hbase                                                                           
                                                      
2 row(s)
Took 0.6820 seconds {code}
h1. Root Cause

This could be a bug in shell related to list_namespace. Restart the shell would 
make the shell functional again.


> list_namespace not working after an incorrect user input
> 
>
> Key: HBASE-28660
> URL: https://issues.apache.org/jira/browse/HBASE-28660
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-beta-2
>Reporter: Ke Han
>Priority: Major
>
> When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
> handling.
> If user inputs an incorrect *list_namespace* command, hshell throws an 
> exception. However, it has a side effect on the following *list_namespace* 
> command: the result becomes empty (incorrect).
> h1. Reproduce
> Execute the following 2 commands can reproduce this bug
>  * The first command is an incorrect list_namespace command, which causes and 
> exception.
>  * The second command is a correct list_namespace command, its return value 
> is incorrect (empty).
> {code:java}
> list_namespace, 'ns.*'
> list_namespace{code}
> Here's the execution result
> The return result of the second command is incorrect.
> {code:java}
> hbase:002:0> list_namespace, 'ns.*'
> Traceback (most recent call last):
> SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
> list_namespace, 'ns.*'
>                       ^
> hbase:003:0> list_namespace
> hbase:004:0>  {code}
> The expected output of list_namespace is
> {code:java}
> hbase:001:0> list_namespace
> NAMESPACE                                                                     
>                                                         
> default                      

[jira] [Updated] (HBASE-28660) list_namespace not working after an incorrect user input

2024-06-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28660:
---
Summary: list_namespace not working after an incorrect user input  (was: 
list_namespace command not working after an incorrect user input)

> list_namespace not working after an incorrect user input
> 
>
> Key: HBASE-28660
> URL: https://issues.apache.org/jira/browse/HBASE-28660
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-beta-2
>Reporter: Ke Han
>Priority: Major
>
> When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
> handling.
> If user inputs an incorrect *list_namespace* command, hshell throws an 
> exception. However, it has a side effect on the following *list_namespace* 
> command: the result becomes empty (incorrect).
> h1. Reproduce
> Execute the following 2 commands can reproduce this bug
>  * The first command is an incorrect list_namespace command, which causes and 
> exception.
>  * The second command is a correct list_namespace command, its return value 
> is incorrect (empty).
> {code:java}
> list_namespace, 'ns.*'
> list_namespace{code}
> Here's the execution result
> The return result of the second command is incorrect.
> {code:java}
> hbase:002:0> list_namespace, 'ns.*'
> Traceback (most recent call last):
> SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
> list_namespace, 'ns.*'
>                       ^
> hbase:003:0> list_namespace
> hbase:004:0>  {code}
> The expected output of list_namespace is
> {code:java}
> hbase:001:0> list_namespace
> NAMESPACE                                                                     
>                                                         
> default                                                                       
>                                                         
> hbase                                                                         
>                                                         
> 2 row(s)
> Took 0.6820 seconds {code}
> h1. Root Cause
> This could be a bug in shell related to list_namespace. Restart the shell 
> would make the shell functional again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28660) list_namespace command not working after an incorrect user input

2024-06-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28660:
---
Description: 
When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
handling.

If the user inputs an incorrect *list_namespace* command, the HBase shell throws an exception. However, it has a side effect on the following *list_namespace* command: the result becomes empty (incorrect).
h1. Reproduce

Executing the following 2 commands reproduces this bug:
 * The first command is an incorrect list_namespace command, which causes an exception.
 * The second command is a correct list_namespace command, but its return value is incorrect (empty).

{code:java}
list_namespace, 'ns.*'
list_namespace{code}
Here's the execution result.

The returned result of the second command is incorrect.
{code:java}
hbase:002:0> list_namespace, 'ns.*'
Traceback (most recent call last):
SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
list_namespace, 'ns.*'
                      ^
hbase:003:0> list_namespace
hbase:004:0>  {code}
The expected output of list_namespace is
{code:java}
hbase:001:0> list_namespace
NAMESPACE                                                                       
                                                      
default                                                                         
                                                      
hbase                                                                           
                                                      
2 row(s)
Took 0.6820 seconds {code}
h1. Root Cause

This could be a bug in the shell related to list_namespace. Restarting the shell would make it functional again.

  was:
When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
handling.

If user inputs an incorrect *list_namespace* command, hshell throws an 
exception. However, it has a side effect on the following *list_namespace* 
command: the result becomes empty (incorrect).
h1. Reproduce

Execute the following 2 commands can reproduce this bug
 * The first command is an incorrect list_namespace command, which causes and 
exception.
 * The second command is a correct list_namespace command, its return value is 
incorrect (empty).

 
{code:java}
list_namespace, 'ns.*'
list_namespace{code}
Here's the execution results

The first command returns correct, however, the third command returns empty.

 
{code:java}
hbase:002:0> list_namespace, 'ns.*'
Traceback (most recent call last):
SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
list_namespace, 'ns.*'
                      ^
hbase:003:0> list_namespace
hbase:004:0>  {code}
 

The correct output of list_namespace is
{code:java}
hbase:001:0> list_namespace
NAMESPACE                                                                       
                                                      
default                                                                         
                                                      
hbase                                                                           
                                                      
2 row(s)
Took 0.6820 seconds {code}
h1. Root Cause

This could be a bug in shell related to list_namespace. Restart the shell would 
make the shell functional again.


> list_namespace command not working after an incorrect user input
> 
>
> Key: HBASE-28660
> URL: https://issues.apache.org/jira/browse/HBASE-28660
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.6.0, 3.0.0-beta-2
>Reporter: Ke Han
>Priority: Major
>
> When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
> handling.
> If user inputs an incorrect *list_namespace* command, hshell throws an 
> exception. However, it has a side effect on the following *list_namespace* 
> command: the result becomes empty (incorrect).
> h1. Reproduce
> Execute the following 2 commands can reproduce this bug
>  * The first command is an incorrect list_namespace command, which causes and 
> exception.
>  * The second command is a correct list_namespace command, its return value 
> is incorrect (empty).
> {code:java}
> list_namespace, 'ns.*'
> list_namespace{code}
> Here's the execution result
> The return result of the second command is incorrect.
> {code:java}
> hbase:002:0> list_namespace, 'ns.*'
> Traceback (most recent call last):
> SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
> list_namespace, 'ns.*'
>                       ^
> hbase:003:0> list_namespace
> hbase:004:0>  {code}
> The expected output of list_namespace is
> {code:java}
> hbase:001:0> list_namespace
> NAMESPACE                                                                     
>                                                         
> default

[jira] [Created] (HBASE-28660) list_namespace command not working after an incorrect user input

2024-06-12 Thread Ke Han (Jira)
Ke Han created HBASE-28660:
--

 Summary: list_namespace command not working after an incorrect 
user input
 Key: HBASE-28660
 URL: https://issues.apache.org/jira/browse/HBASE-28660
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.6.0, 3.0.0-beta-2
Reporter: Ke Han


When using hbase-3.0.0 or 2.6.0, there's a shell bug related to failure 
handling.

If the user inputs an incorrect *list_namespace* command, the HBase shell throws an exception. However, it has a side effect on the following *list_namespace* command: the result becomes empty (incorrect).
h1. Reproduce

Executing the following 2 commands reproduces this bug:
 * The first command is an incorrect list_namespace command, which causes an exception.
 * The second command is a correct list_namespace command, but its return value is incorrect (empty).

 
{code:java}
list_namespace, 'ns.*'
list_namespace{code}
Here are the execution results.

The first command fails with a syntax error; the second command then returns an empty (incorrect) result.

 
{code:java}
hbase:002:0> list_namespace, 'ns.*'
Traceback (most recent call last):
SyntaxError ((hbase):2: syntax error, unexpected end-of-file)
list_namespace, 'ns.*'
                      ^
hbase:003:0> list_namespace
hbase:004:0>  {code}
 

The correct output of list_namespace is
{code:java}
hbase:001:0> list_namespace
NAMESPACE                                                                       
                                                      
default                                                                         
                                                      
hbase                                                                           
                                                      
2 row(s)
Took 0.6820 seconds {code}
h1. Root Cause

This could be a bug in the shell related to list_namespace. Restarting the shell would make it functional again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28659) NPE in hmaster (setServerState function)

2024-06-12 Thread Ke Han (Jira)
Ke Han created HBASE-28659:
--

 Summary: NPE in hmaster (setServerState function)
 Key: HBASE-28659
 URL: https://issues.apache.org/jira/browse/HBASE-28659
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 3.0.0-beta-2
Reporter: Ke Han
 Attachments: hbase--master-d16bb50815b7.log

I met an NPE in the master node after migrating data from 2.5.8.
{code:java}
[ERROR LOG]
executionId = LrmpjV32
ConfigIdx = test9
Node0
2024-05-11T10:45:57,896 ERROR [PEWorker-15] procedure2.ProcedureExecutor: 
CODE-BUG: Uncaught runtime exception: pid=48, 
state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, hasLock=true; ServerCrashProcedure 
hregion1,16020,1715424228375, splitWal=true, meta=true
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.assignment.RegionStates.setServerState(RegionStates.java:409)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.assignment.RegionStates.logSplitting(RegionStates.java:435)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:226)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T10:45:57,918 ERROR [PEWorker-15] procedure2.ProcedureExecutor: Root 
Procedure pid=48, state=FAILED:SERVER_CRASH_SPLIT_LOGS, hasLock=true, 
exception=java.lang.NullPointerException via CODE-BUG: Uncaught runtime 
exception: pid=48, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, hasLock=true; 
ServerCrashProcedure hregion1,16020,1715424228375, splitWal=true, 
meta=true:java.lang.NullPointerException; ServerCrashProcedure 
hregion1,16020,1715424228375, splitWal=true, meta=true does not support 
rollback but the execution failed and try to rollback, code bug?
org.apache.hadoop.hbase.procedure2.RemoteProcedureException: 
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1826)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1484)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1100(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT] {code}
h1. Reproduce

This bug cannot be reproduced deterministically.

I upgraded the HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb) with 4 nodes (1 HM, 2 RS, 1 HDFS).
h1. Root Cause

From the stack trace, the bug is in the setServerState function.
{code:java}
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java

  private void setServerState(ServerName serverName, ServerState state) {
    ServerStateNode serverNode = getServerNode(serverName);
    synchronized (serverNode) { // NPE!
      serverNode.setState(state);
    }
  } {code}
The serverNode can sometimes be null, and there is no null check before the synchronized block.
{code:java}
  /** Returns Pertinent ServerStateNode or NULL if none found (Do not make 
modifications). */
  public ServerStateNode getServerNode(final ServerName serverName) {
    return serverMap.get(serverName);
  } {code}
The potential fix could be adding a null check for serverNode. However, how it runs into this buggy state is unclear.
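A minimal sketch of that null-check idea (only an illustration of the direction, not an actual patch; the LOG field and warning text are assumptions):
{code:java}
  private void setServerState(ServerName serverName, ServerState state) {
    ServerStateNode serverNode = getServerNode(serverName);
    if (serverNode == null) {
      // The server may no longer be tracked in serverMap; skip instead of throwing an NPE.
      LOG.warn("No ServerStateNode for {}, skip setting state to {}", serverName, state);
      return;
    }
    synchronized (serverNode) {
      serverNode.setState(state);
    }
  }{code}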

I am running the workloads that could trigger the bug multiple times to see if 
I can find more information.

I have attached the error log.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28590) NPE after upgrade from 2.5.8 to 3.0.0

2024-05-11 Thread Ke Han (Jira)
Ke Han created HBASE-28590:
--

 Summary: NPE after upgrade from 2.5.8 to 3.0.0
 Key: HBASE-28590
 URL: https://issues.apache.org/jira/browse/HBASE-28590
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 3.0.0
Reporter: Ke Han
 Attachments: commands.txt, hbase--master-fc906f1808de.log, 
persistent.tar.gz

When upgrading the HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I met the following NPE in the master log.
{code:java}
2024-05-11T02:17:47,293 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,326 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,337 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]{code}
h1. Reproduce

This bug cannot be reproduced deterministically, but it happens pretty frequently (~10% chance to trigger) with the following steps.

1. Start up 2.5.8 cluster with default configuration (1 HM, 2 RS, 1 HDFS)

2. Execute the commands in commands.txt

3. Stop the 2.5.8 cluster and upgrade to 3.0.0 cluster with default 
configuration (commit: 516c89e8597fb6, 1 HM, 2 RS, 1 HDFS) 

The error message will occur in master log.

I attached (1) the commands to reproduce it, (2) the master log, and (3) the full error logs of all nodes.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28590) NPE after upgrade from 2.5.8 to 3.0.0

2024-05-11 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28590:
---
Description: 
When upgrading the HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I met the following NPE in the master log.
{code:java}
2024-05-11T02:17:47,293 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,326 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,337 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]{code}
h1. Reproduce

This bug cannot be reproduced deterministically, but it happens pretty frequently (~10% chance to trigger) with the following steps.

1. Start up 2.5.8 cluster with default configuration (1 HM, 2 RS, 1 HDFS)

2. Execute the commands in commands.txt

3. Stop the 2.5.8 cluster and upgrade to 3.0.0 cluster with default 
configuration (commit: 516c89e8597fb6, 1 HM, 2 RS, 1 HDFS) 

The error message will occur in master log.

I attached (1) the commands to reproduce it, (2) the master log, and (3) the full error logs of all nodes.

  was:
When upgrade hbase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I met 
the following NPE in master log.
{code:java}
2024-05-11T02:17:47,293 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-11 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Summary: Upgrade from 2.5.8 to 3.0.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema  (was: Upgrade from 2.5.8 to 3.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema)

> Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-033a47be7d1d.log, persistent.tar.gz
>
>
> When migrating data from 2.5.8 cluster (1HM, 2RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>    

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 2 HDFS), I met the following exception and the upgrade failed.
{code:java}
2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-10T00:54:45,937 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715302475720: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar
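
The trace above also shows where that parse happens during the upgrade: the procedure's 
persisted state is stored inside a protobuf Any, and Any.unpack() re-parses the packed 
bytes with the target type's parser while the master replays old procedures, so the same 
required-field validation runs at load time. A hedged sketch of that unpack step, again 
using a hypothetical generated class rather than the actual HBase code:

{code:java}
import com.google.protobuf.Any;
import com.google.protobuf.InvalidProtocolBufferException;

final class StateUnpackSketch {

  // serializedState: the Any wrapping the bytes that an older release persisted.
  // RestoreSnapshotStateData is a stand-in for the generated message class (hypothetical).
  static RestoreSnapshotStateData readState(Any serializedState)
      throws InvalidProtocolBufferException {
    // unpack() parses the packed bytes with RestoreSnapshotStateData's parser, so a
    // field that is now required but was never written makes this call throw.
    return serializedState.unpack(RestoreSnapshotStateData.class);
  }
}
{code}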

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I encountered the following exception and the upgrade failed.
{code:java}
2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-10T00:54:45,937 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715302475720: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I encountered the following exception and the upgrade failed.
{code:java}
2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-10T00:54:45,937 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715302475720: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: hbase--master-033a47be7d1d.log
            persistent.tar.gz

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-033a47be7d1d.log, persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I encountered the following exception and the upgrade failed.
{code:java}
2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-10T00:54:45,937 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715302475720: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: (was: hbase--master-cc13b0df0f3a.log)

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-2-SNAPSHO

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: (was: persistent.tar.gz)

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-b

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: (was: commands.txt)

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-cc13b0df0f3a.log, persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(T

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: commands.txt

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: commands.txt, hbase--master-cc13b0df0f3a.log, 
> persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnab

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: (was: commands_700.txt)

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-cc13b0df0f3a.log, persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I encountered the following exception and the upgrade failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715285771112: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: (was: commands.txt)

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: commands_700.txt, hbase--master-cc13b0df0f3a.log, 
> persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I encountered the following exception and the upgrade failed.
> {code:java}
> 2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lamb

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Attachment: commands_700.txt

> Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message 
> missing required fields: old_table_schema
> --
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: commands_700.txt, hbase--master-cc13b0df0f3a.log, 
> persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 
> RS, 2 HDFS), I hit the following exception and the upgrade failed.
> {code:java}
> 2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$trac

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I hit the exception shown in the log below and the upgrade failed.
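
The failure happens while the new master replays its procedure store: RestoreSnapshotProcedure.deserializeStateData (visible in the log below) unpacks a RestoreSnapshotStateData message, written by the 2.5.x master, that does not carry old_table_schema, and the proto2 required-field check rejects the whole message before the procedure can load. The snippet below is a hypothetical sketch only, not the project's actual code: it assumes the field is declared optional, in which case the generated message would expose a hasOldTableSchema() accessor; the surrounding fields and helper calls are likewise assumptions for illustration.
{code:java}
// Hypothetical sketch, not HBase's actual implementation. Assumes old_table_schema is
// optional in RestoreSnapshotStateData; state written by 2.5.x (which never set the
// field) then parses cleanly and its absence is handled here instead of at parse time.
@Override
protected void deserializeStateData(ProcedureStateSerializer serializer) throws IOException {
  super.deserializeStateData(serializer);
  RestoreSnapshotStateData state = serializer.deserialize(RestoreSnapshotStateData.class);
  snapshot = state.getSnapshot();
  if (state.hasOldTableSchema()) {
    // Newer state records the pre-restore schema; use it as before.
    oldTableDescriptor = ProtobufUtil.toTableDescriptor(state.getOldTableSchema());
  } else {
    // 2.5.x state: no old_table_schema was persisted. Leave it unset and let the
    // procedure recover the current descriptor from the table when it resumes.
    oldTableDescriptor = null;
  }
}
{code}
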
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715285771112: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Description: 
When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I hit the following exception and the upgrade failed.
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715285771112: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractPar

[jira] [Created] (HBASE-28583) Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-09 Thread Ke Han (Jira)
Ke Han created HBASE-28583:
--

 Summary: Upgrade from 2.5.8 to 3.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema
 Key: HBASE-28583
 URL: https://issues.apache.org/jira/browse/HBASE-28583
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.5.8, 3.0.0
Reporter: Ke Han
 Attachments: commands.txt, hbase--master-cc13b0df0f3a.log, 
persistent.tar.gz

When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 
2 HDFS), I hit the following exception and the upgrade failed.

 
{code:java}
2024-05-09T20:16:20,638 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing required fields: old_table_schema
        at 
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
 ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
 ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
2024-05-09T20:16:20,639 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1715285771112: Unhandled 
exception. Starting shutdown. *
org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: 
Message missing requi

[jira] [Resolved] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-13 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han resolved HBASE-28519.

Fix Version/s: 3.0.0-beta-2
   Resolution: Fixed

> HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1
> ---
>
> Key: HBASE-28519
> URL: https://issues.apache.org/jira/browse/HBASE-28519
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 3.0.0-alpha-4, 3.0.0-beta-1, 2.5.8
>Reporter: Ke Han
>Priority: Critical
> Fix For: 3.0.0-beta-2
>
> Attachments: all_logs.tar.gz, hbase--master-64f850a4e287.log
>
>
> h1. Reproduce
> Step 1: Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS 
> (hadoop-2.10.2).
> Step 2: Perform a full-stop upgrade to an HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 
> 1 HDFS (hadoop-2.10.2). +(No command is needed before the upgrade)+
> HMaster aborts with the following exception:
> {code:java}
> 2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues: 
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         ... 5 more
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
> exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.

[jira] [Commented] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-13 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17836827#comment-17836827
 ] 

Ke Han commented on HBASE-28519:


[~zhangduo] Thank you for the reply! I'll try upgrading to an HBase build that 
includes HBASE-28376.

> HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1
> ---
>
> Key: HBASE-28519
> URL: https://issues.apache.org/jira/browse/HBASE-28519
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 3.0.0-alpha-4, 3.0.0-beta-1, 2.5.8
>Reporter: Ke Han
>Priority: Critical
> Attachments: all_logs.tar.gz, hbase--master-64f850a4e287.log
>
>
> h1. Reproduce
> Step 1: Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS 
> (hadoop-2.10.2).
> Step 2: Perform a full-stop upgrade to an HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 
> 1 HDFS (hadoop-2.10.2). +(No command is needed before the upgrade)+
> HMaster aborts with the following exception:
> {code:java}
> 2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues: 
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         ... 5 more
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
> exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Description: 
h1. Reproduce

Step 1: Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS 
(hadoop-2.10.2).

Step 2: Perform a full-stop upgrade to an HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 
1 HDFS (hadoop-2.10.2). +(No command is needed before the upgrade)+
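
The abort reported below ("Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED") is Guava's service-lifecycle guard surfacing whatever went wrong inside ClusterSchemaServiceImpl's startup; the underlying cause sits in the Caused by section of the log. A minimal, self-contained sketch using plain (unshaded) Guava shows the same wrapping behaviour; the FailingService class and its simulated error are illustrative stand-ins, not HBase code:
{code:java}
import com.google.common.util.concurrent.AbstractService;

public class FailedServiceDemo {
  // Stand-in for a service whose doStart() fails, the way ClusterSchemaServiceImpl's
  // startup fails in the log below; any exception reported via notifyFailed() moves
  // the service to the FAILED state.
  static class FailingService extends AbstractService {
    @Override
    protected void doStart() {
      notifyFailed(new RuntimeException("simulated startup failure"));
    }

    @Override
    protected void doStop() {
      notifyStopped();
    }
  }

  public static void main(String[] args) {
    FailingService service = new FailingService();
    service.startAsync();
    try {
      // awaitRunning() checks the current state and, because the service has FAILED,
      // throws IllegalStateException instead of returning.
      service.awaitRunning();
    } catch (IllegalStateException e) {
      // Prints: Expected the service FailingService [FAILED] to be RUNNING, but the
      // service has FAILED
      System.out.println(e.getMessage());
    }
  }
}
{code}
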

HMaster aborts with the following exception:
{code:java}
2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: 
Failed 2 actions: RetriesExhaustedException: 2 times, servers with issues: 
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        ... 5 more
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
exception. Starting shutdown. *
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        a

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Description: 
h1. Reproduce

Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).

Then directly perform a full-stop upgrade to an HBase-3.0.0-beta-1 cluster: 1 HM, 
2 RS, 1 HDFS (hadoop-2.10.2). (No command is needed before the upgrade)
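
In the Caused by section of the log below, the RetriesExhaustedWithDetailsException is raised while TableNamespaceManager.migrateNamespaceTable closes a BufferedMutator: close() flushes whatever mutations are still buffered, so a write that cannot reach its target regions may only surface at that point. The following is a generic sketch of that client-side pattern using the public HBase client API; the table and column names are placeholders, not the migration's actual schema:
{code:java}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedMutatorFlushSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("demo"))) {
      Put put = new Put(Bytes.toBytes("row1"));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      // mutate() only buffers the Put locally; nothing is sent yet.
      mutator.mutate(put);
    } // close() flushes the buffer; if the regions are unreachable, the flush fails
      // with RetriesExhaustedWithDetailsException, as in the master log below.
  }
}
{code}
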

HMaster aborts with the following exception:
{code:java}
2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: 
Failed 2 actions: RetriesExhaustedException: 2 times, servers with issues: 
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        ... 5 more
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
exception. Starting shutdown. *
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Priority: Critical  (was: Major)

> HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1
> ---
>
> Key: HBASE-28519
> URL: https://issues.apache.org/jira/browse/HBASE-28519
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 3.0.0-alpha-4, 3.0.0-beta-1, 2.5.8
>Reporter: Ke Han
>Priority: Critical
> Attachments: all_logs.tar.gz, hbase--master-64f850a4e287.log
>
>
> h1. Reproduce
> Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> Execute one read command (LIST), then perform a full-stop upgrade to an 
> HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> HMaster aborts with the following exception:
> {code:java}
> 2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues: 
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         ... 5 more
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
> exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apac

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Component/s: master

> HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1
> ---
>
> Key: HBASE-28519
> URL: https://issues.apache.org/jira/browse/HBASE-28519
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0-alpha-1, 3.0.0-alpha-4, 3.0.0-beta-1, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: all_logs.tar.gz, hbase--master-64f850a4e287.log
>
>
> h1. Reproduce
> Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> Execute one read command (LIST), then perform a full-stop upgrade to an 
> HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> HMaster aborts with the following exception:
> {code:java}
> 2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues: 
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         ... 5 more
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
> exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Attachment: all_logs.tar.gz

> HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1
> ---
>
> Key: HBASE-28519
> URL: https://issues.apache.org/jira/browse/HBASE-28519
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 3.0.0-alpha-4, 3.0.0-beta-1, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: all_logs.tar.gz, hbase--master-64f850a4e287.log
>
>
> h1. Reproduce
> Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> Execute one read command (LIST), then perform a full-stop upgrade to an 
> HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> HMaster aborts with the following exception:
> {code:java}
> 2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues: 
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         ... 5 more
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
> exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClu

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Description: 
h1. Reproduce

Start up an HBase-2.5.8 cluster with 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).

Execute one read command (LIST), then perform a full-stop upgrade to an 
HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).

HMaster aborts with the following exception:
{code:java}
2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: 
Failed 2 actions: RetriesExhaustedException: 2 times, servers with issues: 
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        ... 5 more
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
exception. Starting shutdown. *
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        at java.lang.Thread.ru

[jira] [Updated] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28519:
---
Attachment: hbase--master-64f850a4e287.log

> HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1
> ---
>
> Key: HBASE-28519
> URL: https://issues.apache.org/jira/browse/HBASE-28519
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 3.0.0-alpha-4, 3.0.0-beta-1, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-64f850a4e287.log
>
>
> h1. Reproduce
> Start up HBase-2.5.8 cluster: 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> Execute one read command: LIST, then perform a full-stop upgrade to 
> HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
> HMaster aborts with the following exception:
> {code:java}
> 2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
> ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
>  ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
> Caused by: 
> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 2 
> actions: RetriesExhaustedException: 2 times, servers with issues: 
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
>  ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
>  ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
>         ... 5 more
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Master server abort: loaded coprocessors are: 
> [org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
> 2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
> exception. Starting shutdown. *
> java.lang.IllegalStateException: Expected the service 
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
>  ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
>         at 
> org.apache.hadoop.hbase.master.HMaster.initClust

[jira] [Created] (HBASE-28519) HMaster crash when upgrading from HBase-2.5.8 to HBase-3.0.0-beta-1

2024-04-12 Thread Ke Han (Jira)
Ke Han created HBASE-28519:
--

 Summary: HMaster crash when upgrading from HBase-2.5.8 to 
HBase-3.0.0-beta-1
 Key: HBASE-28519
 URL: https://issues.apache.org/jira/browse/HBASE-28519
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.5.8, 3.0.0-beta-1, 3.0.0-alpha-4, 3.0.0-alpha-1
Reporter: Ke Han


h1. Reproduce

Start up an HBase-2.5.8 cluster: 4 nodes: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).
 * HDFS is running Hadoop-2.10.2

Execute one read command (LIST), then perform a full-stop upgrade to an 
HBase-3.0.0-beta-1 cluster: 1 HM, 2 RS, 1 HDFS (hadoop-2.10.2).

HMaster will abort with the following exception:

 
{code:java}
2024-04-13T03:47:15,969 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:613) 
~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.trace.TraceUtil.lambda$tracedRunnable$2(TraceUtil.java:155)
 ~[hbase-common-3.0.0-beta-1.jar:3.0.0-beta-1]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: 
Failed 2 actions: RetriesExhaustedException: 2 times, servers with issues: 
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.makeError(BufferedMutatorOverAsyncBufferedMutator.java:107)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.internalFlush(BufferedMutatorOverAsyncBufferedMutator.java:122)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.client.BufferedMutatorOverAsyncBufferedMutator.close(BufferedMutatorOverAsyncBufferedMutator.java:166)
 ~[hbase-client-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.migrateNamespaceTable(TableNamespaceManager.java:92)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:122)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:61)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:252)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1532)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        ... 5 more
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver]
2024-04-13T03:47:15,970 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1712980015693: Unhandled 
exception. Starting shutdown. *
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl 
[FAILED] to be RUNNING, but the service has FAILED
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:384)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:324)
 ~[hbase-shaded-miscellaneous-4.1.5.jar:4.1.5]
        at 
org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1535)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1204)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2494)
 ~[hbase-server-3.0.0-beta-1.jar:3.0.0-beta-1]
        at 
org.apache.hadoop.hbase.master.HMaster.la

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2023-11-05 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Description: 
Flushing a column family that doesn't exist in the table causes an NPE error in 
both the shell and the HMaster logs.
h1. Reproduce

Start up an HBase 2.5.5 cluster; executing the following commands with the hbase 
shell on the HMaster node leads to an NPE. (This can be reproduced deterministically.)
{code:java}
create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER 
=> 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER 
=> 'ROWCOL'}
incr 'table', 'row1', 'cf1:cell', 2
flush 'table', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds                                                         
{code}
 

According to the _flush (flush.rb)_ command specification, a user can flush a 
specific column family.
{code:java}
Flush all regions in passed table or pass a region row to
flush an individual region or a region server name whose format
is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table,
or for an specific region only.
For example:
  hbase> flush 'TABLENAME'
  hbase> flush 'TABLENAME','FAMILYNAME' {code}
In the above case, *cf3* is an incorrect input (a non-existing column family). If 
the user tries to flush it, the expected behavior is:
 # HBase rejects this operation
 # returns a prompt saying the column family doesn't exist: _"ERROR: Unknown CF..."_

h1. Root Cause

There's a missing check for whether the column family targeted by the flush exists.
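A minimal sketch of the kind of guard that appears to be missing; the helper name, the 
exception type, and where the check would be wired in are assumptions for illustration, 
not the actual fix:
{code:java}
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical helper: validate the requested column family against the table
// descriptor before the flush is submitted, so an unknown family is rejected with
// a clear error instead of surfacing later as a NullPointerException.
final class FlushFamilyCheck {
  private FlushFamilyCheck() {
  }

  static void checkFamilyExists(TableDescriptor tableDescriptor, String family)
      throws NoSuchColumnFamilyException {
    if (family != null && !tableDescriptor.hasColumnFamily(Bytes.toBytes(family))) {
      throw new NoSuchColumnFamilyException("Column family '" + family
        + "' does not exist in table " + tableDescriptor.getTableName());
    }
  }
}
{code}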

  was:
Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
both shell and the HMaster logs.
h1. Reproduce

Start up HBase 2.5.5 cluster, executing the following commands with hbase shell 
in HMaster node will lead to NPE. (Can be reproduced determinstically)
{code:java}
create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER 
=> 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER 
=> 'ROWCOL'}
incr 'table', 'row1', 'cf1:cell', 2
flush 'table', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2023-11-05 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Affects Version/s: 2.5.5
   2.4.17

> NPE when flushing a non-existing column family
> --
>
> Key: HBASE-28187
> URL: https://issues.apache.org/jira/browse/HBASE-28187
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
>
> Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
> both shell and the HMaster logs.
> h1. Reproduce
> Start up HBase 2.5.5 cluster, executing the following commands with hbase 
> shell in HMaster node will lead to NPE. (Can be reproduced determinstically)
> {code:java}
> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
> 'NONE', BLOOMFILTER => 'ROWCOL'}
> incr 'table', 'row1', 'cf1:cell', 2
> flush 'table', 'cf3'{code}
> The shell outputs
> {code:java}
> hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
> 'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
> 'NONE', BLOOMFILTER => 'ROWCOL'}
> Created table table
> Took 2.1238 seconds                                                           
>                                                                       
> => Hbase::Table - table
> hbase:007:0> 
> hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
> COUNTER VALUE = 2
> Took 0.0131 seconds                                                           
>                                                                       
> hbase:009:0> 
> hbase:010:0> flush 'table', 'cf3'
> ERROR: java.io.IOException: java.lang.NullPointerException
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
>  at 
> org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
>  at 
> org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:750)
> For usage try 'help "flush"'
> Took 12.1713 seconds                                                         
> {code}
>  
> According to the _flush (flush.rb)_ command specification, user can flush a 
> specific column family.
> {code:java}
> Flush all regions in passed table or pass a region row to
> flush an individual region or a region server name whose format
> is 'host,port,startcode', to flush all its regions.
> You can also flush a single column family for all regions within a table,
> or for an specific region only.
> For example:
>   hbase> flush 'TABLENAME'
>   hbase> flush 'TABLENAME','FAMILYNAME' {code}
> In the above case, *cf3* an incorrect input (non-existing column family). If 
> user tries to flush it, the expected output is:
>  # HBase rejects this operation
>  # returns a prompt saying the column family doesn't exist 
> {_}"{_}{_}{+}ERROR: Unknown CF...{+}".{_}
> h1. Root Cause
> There's a missing check for the whether the target flushing columnfamily 
> exists.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2023-11-05 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Description: 
Flushing a column family that doesn't exist in the table causes an NPE error in 
both the shell and the HMaster logs.
h1. Reproduce

Start up an HBase 2.5.5 cluster; executing the following commands with the hbase 
shell on the HMaster node leads to an NPE. (This can be reproduced deterministically.)
{code:java}
create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', BLOOMFILTER 
=> 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', BLOOMFILTER 
=> 'ROWCOL'}
incr 'table', 'row1', 'cf1:cell', 2
flush 'table', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds                                                         
{code}
 

According to the _flush (flush.rb)_ command specification, user can flush a 
specific column family.
{code:java}
sh all regions in passed table or pass a region row to
flush an individual region or a region server name whose format
is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table,
or for an specific region only.
For example:
  hbase> flush 'TABLENAME'
  hbase> flush 'TABLENAME','FAMILYNAME' {code}
In the above case, *cf3* is an incorrect input (a non-existing column family). If 
the user tries to flush it, the expected behavior is:
 # HBase rejects this operation
 # returns a prompt saying the column family doesn't exist: _"ERROR: Unknown CF..."_

h1. Root Cause

There's a missing check for whether the column family targeted by the flush exists.

  was:
Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
both shell and the HMaster logs.
h1. Reproduce

Start up HBase 2.5.5 cluster, executing the following commands with hbase shell 
in HMaster node will lead to NPE. (Can be reproduced determinstically)
{code:java}
create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL'}
incr 'table7', 'row1', 'cf1:cell', 2
flush 'table7', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2023-11-04 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Description: 
Flushing a column family that doesn't exist in the table causes an NPE error in 
both the shell and the HMaster logs.
h1. Reproduce

Start up an HBase 2.5.5 cluster; executing the following commands with the hbase 
shell on the HMaster node leads to an NPE. (This can be reproduced deterministically.)
{code:java}
create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL'}
incr 'table7', 'row1', 'cf1:cell', 2
flush 'table7', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds                                                         
{code}
 

According to the _flush (flush.rb)_ command specification, user can flush a 
specific column family.
{code:java}
sh all regions in passed table or pass a region row to
flush an individual region or a region server name whose format
is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table,
or for an specific region only.
For example:
  hbase> flush 'TABLENAME'
  hbase> flush 'TABLENAME','FAMILYNAME' {code}
In the above case, *cf3* is an incorrect input (a non-existing column family). If 
the user tries to flush it, the expected behavior is:
 # HBase rejects this operation
 # returns a prompt saying the column family doesn't exist: _"ERROR: Unknown CF..."_

h1. Root Cause

There's a missing check for whether the column family targeted by the flush exists.

  was:
Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
both shell and the HMaster logs.
h1. Reproduce

Start up HBase 2.5.5 cluster, executing the following commands with hbase shell 
in HMaster node will lead to NPE. (Can be reproduced determinstically)
{code:java}
create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL'}
incr 'table7', 'row1', 'cf1:cell', 2
flush 'table7', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.ca

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2023-11-04 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Description: 
Flushing a column family that doesn't exist in the table causes an NPE error in 
both the shell and the HMaster logs.
h1. Reproduce

Start up an HBase 2.5.5 cluster; executing the following commands with the hbase 
shell on the HMaster node leads to an NPE. (This can be reproduced deterministically.)
{code:java}
create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL'}
incr 'table7', 'row1', 'cf1:cell', 2
flush 'table7', 'cf3'{code}
The shell outputs
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds                                                         
{code}
 

According to the _flush (flush.rb)_ command specification, user can flush a 
specific column family.
{code:java}
sh all regions in passed table or pass a region row to
flush an individual region or a region server name whose format
is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table,
or for an specific region only.
For example:
  hbase> flush 'TABLENAME'
  hbase> flush 'TABLENAME','FAMILYNAME' {code}
In the above case, *cf3* is an incorrect input (a non-existing column family). If 
the user tries to flush it, the expected behavior is:
 # HBase rejects this operation
 # returns a prompt saying the column family doesn't exist: _"ERROR: Unknown CF..."_

h1. Root Cause

There's a missing check for whether the column family targeted by the flush exists.

  was:
Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
both shell and the HMaster logs.
h1. Reproduce

Start up HBase 2.5.5 cluster, executing the following commands with hbase shell 
in HMaster node will lead to NPE.

 
{code:java}
create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL'}
incr 'table7', 'row1', 'cf1:cell', 2
flush 'table7', 'cf3'{code}
 

 

The shell outputs

 
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.

[jira] [Updated] (HBASE-28187) NPE when flushing a non-existing column family

2023-11-04 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28187:
---
Summary: NPE when flushing a non-existing column family  (was: NPE when 
flush a non-existing column family)

> NPE when flushing a non-existing column family
> --
>
> Key: HBASE-28187
> URL: https://issues.apache.org/jira/browse/HBASE-28187
> Project: HBase
>  Issue Type: Bug
>Reporter: Ke Han
>Priority: Major
>
> Flush a columnfamily that doesn't exist in the table will cause NPE ERROR in 
> both shell and the HMaster logs.
> h1. Reproduce
> Start up HBase 2.5.5 cluster, executing the following commands with hbase 
> shell in HMaster node will lead to NPE.
>  
> {code:java}
> create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
> 'NONE', BLOOMFILTER => 'ROWCOL'}
> incr 'table7', 'row1', 'cf1:cell', 2
> flush 'table7', 'cf3'{code}
>  
>  
> The shell outputs
>  
> {code:java}
> hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
> 'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
> 'NONE', BLOOMFILTER => 'ROWCOL'}
> Created table table
> Took 2.1238 seconds                                                           
>                                                                       
> => Hbase::Table - table
> hbase:007:0> 
> hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
> COUNTER VALUE = 2
> Took 0.0131 seconds                                                           
>                                                                       
> hbase:009:0> 
> hbase:010:0> flush 'table', 'cf3'
> ERROR: java.io.IOException: java.lang.NullPointerException
>  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
>  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
>  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.lang.NullPointerException
>  at 
> org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
>  at 
> org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
>  at 
> org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
>  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:750)
> For usage try 'help "flush"'
> Took 12.1713 seconds                                                          
>                                                                               
>                                                                
> hbase:011:0>  {code}
>  
>  
> According to the flush command specification, user could flush a specific 
> column family.
>  
> {code:java}
> sh all regions in passed table or pass a region row to
> flush an individual region or a region server name whose format
> is 'host,port,startcode', to flush all its regions.
> You can also flush a single column family for all regions within a table,
> or for an specific region only.
> For example:
>   hbase> flush 'TABLENAME'
>   hbase> flush 'TABLENAME','FAMILYNAME' {code}
>  
> In the above case, *cf3* an incorrect input (non-existing column family). If 
> user tries to flush it, the expected output is:
>  # HBase rejects this operation
>  # returns a prompt saying the column family doesn't exist {_}"{_}_ERROR: 
> Unknown CF..."._
>  
> It can be reproduced deterministically with the above commands. 
> h1. Root Cause
> There's a missing check for the whether the target flushing columnfamily 
> exists.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28187) NPE when flush a non-existing column family

2023-11-04 Thread Ke Han (Jira)
Ke Han created HBASE-28187:
--

 Summary: NPE when flush a non-existing column family
 Key: HBASE-28187
 URL: https://issues.apache.org/jira/browse/HBASE-28187
 Project: HBase
  Issue Type: Bug
Reporter: Ke Han


Flushing a column family that doesn't exist in the table causes an NPE error in 
both the shell and the HMaster logs.
h1. Reproduce

Start up an HBase 2.5.5 cluster; executing the following commands with the hbase 
shell on the HMaster node leads to an NPE.

 
{code:java}
create 'table7', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL'}
incr 'table7', 'row1', 'cf1:cell', 2
flush 'table7', 'cf3'{code}
 

 

The shell outputs

 
{code:java}
hbase:006:0> create 'table', {NAME => 'cf1', VERSIONS => 2, COMPRESSION => 
'GZ', BLOOMFILTER => 'ROWCOL'}, {NAME => 'cf2', VERSIONS => 4, COMPRESSION => 
'NONE', BLOOMFILTER => 'ROWCOL'}
Created table table
Took 2.1238 seconds                                                             
                                                                    
=> Hbase::Table - table
hbase:007:0> 
hbase:008:0> incr 'table', 'row1', 'cf1:cell', 2
COUNTER VALUE = 2
Took 0.0131 seconds                                                             
                                                                    
hbase:009:0> 
hbase:010:0> flush 'table', 'cf3'
ERROR: java.io.IOException: java.lang.NullPointerException
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
 at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: 
org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.procedure.flush.RegionServerFlushTableProcedureManager$FlushTableSubprocedurePool.waitForOutstandingTasks(RegionServerFlushTableProcedureManager.java:274)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.flushRegions(FlushTableSubprocedure.java:115)
 at 
org.apache.hadoop.hbase.procedure.flush.FlushTableSubprocedure.acquireBarrier(FlushTableSubprocedure.java:126)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:160)
 at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:46)
 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:750)
For usage try 'help "flush"'
Took 12.1713 seconds                                                            
                                                                                
                                                           
hbase:011:0>  {code}
 

 

According to the flush command specification, a user can flush a specific 
column family.

 
{code:java}
sh all regions in passed table or pass a region row to
flush an individual region or a region server name whose format
is 'host,port,startcode', to flush all its regions.
You can also flush a single column family for all regions within a table,
or for an specific region only.
For example:
  hbase> flush 'TABLENAME'
  hbase> flush 'TABLENAME','FAMILYNAME' {code}
 

In the above case, *cf3* is an incorrect input (a non-existing column family). If 
the user tries to flush it, the expected behavior is:
 # HBase rejects this operation
 # returns a prompt saying the column family doesn't exist: _"ERROR: Unknown CF..."_

 

It can be reproduced deterministically with the above commands. 
h1. Root Cause

There's a missing check for whether the column family targeted by the flush exists.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28167) HMaster crashes due to NPE at AsyncFSWAL.closeWriter

2023-10-19 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1272#comment-1272
 ] 

Ke Han commented on HBASE-28167:


Thanks for the reply!

[~subrat.mishra] Unfortunately I didn't record the NN and DN logs properly in 
the previous buggy run. I'll try to reproduce it again with hdfs logging to 
provide more information.

> HMaster crashes due to NPE at AsyncFSWAL.closeWriter
> 
>
> Key: HBASE-28167
> URL: https://issues.apache.org/jira/browse/HBASE-28167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.4.17
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-a82083cf5d18.log, persistent.tar.gz
>
>
> I am testing the upgrade process of HMaster, when starting up the new version 
> HMaster 2.4.17, it crashed immediately with the following exception.
> {code:java}
> 2023-10-17 21:03:35,892 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1697576576301: Unhandled 
> exception. Starting shutdown. *
> org.apache.hadoop.hbase.FailedCloseWALAfterInitializedErrorException: Failed 
> close after init wal failed.
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:167)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:62)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:295)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.createWAL(MasterRegion.java:200)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.bootstrap(MasterRegion.java:220)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:348)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:855)
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: java.io.IOException: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:979)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.close(AbstractFSWAL.java:1006)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:165)
>         ... 10 more
> Caused by: java.lang.NullPointerException
>         at 
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
>         at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.closeWriter(AsyncFSWAL.java:743)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doShutdown(AsyncFSWAL.java:800)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:951)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:946)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more {code}
> h1. Reproduce
> It happens non-deterministically, around 2 out of 1802 tests. It might 
> require an exception when HMaster interacts with the HDFS cluster since I 
> noticed the following warning before the NPE exception
> {code:java}
> 2023-10-17 21:03:35,857 WARN  [master/hmaster:16000:becomeActiveMaster] 
> asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create fan-out dfs output 
> /hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
>  failed, retry = 0
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2586)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServ

[jira] [Comment Edited] (HBASE-28167) HMaster crashes due to NPE at AsyncFSWAL.closeWriter

2023-10-19 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1272#comment-1272
 ] 

Ke Han edited comment on HBASE-28167 at 10/19/23 1:30 PM:
--

Thanks for the reply!

[~subrat.mishra] Unfortunately I didn't record the NN and DN logs properly in 
the previous buggy run. I'll try to reproduce it with hdfs logging to provide 
more information.


was (Author: JIRAUSER289562):
Thanks for the reply!

[~subrat.mishra] Unfortunately I didn't record the NN and DN logs properly in 
the previous buggy run. I'll try to reproduce it again with hdfs logging to 
provide more information.

> HMaster crashes due to NPE at AsyncFSWAL.closeWriter
> 
>
> Key: HBASE-28167
> URL: https://issues.apache.org/jira/browse/HBASE-28167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.4.17
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-a82083cf5d18.log, persistent.tar.gz
>
>
> I am testing the upgrade process of HMaster, when starting up the new version 
> HMaster 2.4.17, it crashed immediately with the following exception.
> {code:java}
> 2023-10-17 21:03:35,892 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1697576576301: Unhandled 
> exception. Starting shutdown. *
> org.apache.hadoop.hbase.FailedCloseWALAfterInitializedErrorException: Failed 
> close after init wal failed.
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:167)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:62)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:295)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.createWAL(MasterRegion.java:200)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.bootstrap(MasterRegion.java:220)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:348)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:855)
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: java.io.IOException: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:979)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.close(AbstractFSWAL.java:1006)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:165)
>         ... 10 more
> Caused by: java.lang.NullPointerException
>         at 
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
>         at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.closeWriter(AsyncFSWAL.java:743)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doShutdown(AsyncFSWAL.java:800)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:951)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:946)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more {code}
> h1. Reproduce
> It happens non-deterministically, around 2 out of 1802 tests. It might 
> require an exception when HMaster interacts with the HDFS cluster since I 
> noticed the following warning before the NPE exception
> {code:java}
> 2023-10-17 21:03:35,857 WARN  [master/hmaster:16000:becomeActiveMaster] 
> asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create fan-out dfs output 
> /hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
>  failed, retry = 0
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
>         at 
> org.ap

[jira] [Updated] (HBASE-28167) HMaster crashes due to NPE at AsyncFSWAL.closeWriter

2023-10-19 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28167:
---
Description: 
I am testing the upgrade process of HMaster. When starting up the new version 
HMaster 2.4.17, it crashed immediately with the following exception.
{code:java}
2023-10-17 21:03:35,892 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1697576576301: Unhandled 
exception. Starting shutdown. *
org.apache.hadoop.hbase.FailedCloseWALAfterInitializedErrorException: Failed 
close after init wal failed.
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:167)
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:62)
        at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:295)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.createWAL(MasterRegion.java:200)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.bootstrap(MasterRegion.java:220)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:348)
        at 
org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:855)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:979)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.close(AbstractFSWAL.java:1006)
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:165)
        ... 10 more
Caused by: java.lang.NullPointerException
        at 
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
        at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.closeWriter(AsyncFSWAL.java:743)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doShutdown(AsyncFSWAL.java:800)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:951)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:946)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more {code}
h1. Reproduce

It happens non-deterministically, around 2 out of 1802 tests. It might require 
an exception when the HMaster interacts with the HDFS cluster, since I noticed 
the following warning before the NPE:
{code:java}
2023-10-17 21:03:35,857 WARN  [master/hmaster:16000:becomeActiveMaster] 
asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create fan-out dfs output 
/hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
 failed, retry = 0
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
1 datanode(s) running and no node(s) are excluded in this operation.
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2586)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:889)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:517)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
 {code}
This remote exception might lead to a NULL writer, which in turn causes the NPE 
that crashes the HMaster.
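As a minimal, self-contained illustration (plain Java, not HBase code; the map and file names below are made up for the sketch), ConcurrentHashMap rejects null values, so registering a writer that was never created throws exactly this kind of NPE, and a simple guard avoids it:
{code:java}
import java.util.concurrent.ConcurrentHashMap;

public class NullWriterNpeSketch {
  public static void main(String[] args) {
    // Stand-in for AsyncFSWAL's map of writers that are being closed.
    ConcurrentHashMap<String, Object> inflightWALClosures = new ConcurrentHashMap<>();

    // The WAL writer was never created (e.g. after the HDFS "could only be
    // replicated to 0 nodes" error shown below), so there is nothing to close.
    Object writer = null;

    // An unguarded put(...) would throw NullPointerException here, because
    // ConcurrentHashMap rejects null values -- matching the stack trace above.
    if (writer != null) {
      inflightWALClosures.put("wal-file-name", writer);
    } else {
      System.out.println("writer was never created, nothing to register for close");
    }
  }
}
{code}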
h1. Root Cause

When invoking *inflightWALClosures.put

[jira] [Updated] (HBASE-28167) HMaster crashes due to NPE at AsyncFSWAL.closeWriter

2023-10-18 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28167:
---
Description: 
I am testing the upgrade process of HMaster. When starting up the new version 
HMaster 2.4.17, it crashed immediately with the following exception.
{code:java}
2023-10-17 21:03:35,892 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1697576576301: Unhandled 
exception. Starting shutdown. *
org.apache.hadoop.hbase.FailedCloseWALAfterInitializedErrorException: Failed 
close after init wal failed.
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:167)
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:62)
        at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:295)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.createWAL(MasterRegion.java:200)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.bootstrap(MasterRegion.java:220)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:348)
        at 
org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:855)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:979)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.close(AbstractFSWAL.java:1006)
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:165)
        ... 10 more
Caused by: java.lang.NullPointerException
        at 
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
        at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.closeWriter(AsyncFSWAL.java:743)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doShutdown(AsyncFSWAL.java:800)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:951)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:946)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more {code}
h1. Reproduce

It happens non-deterministically (around 2 out of 1802 tests). It might require 
an exception when the HMaster interacts with the HDFS cluster, since I noticed 
the following warning before the NPE:
{code:java}
2023-10-17 21:03:35,857 WARN  [master/hmaster:16000:becomeActiveMaster] 
asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create fan-out dfs output 
/hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
 failed, retry = 0
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
1 datanode(s) running and no node(s) are excluded in this operation.
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2586)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:889)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:517)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
 {code}
This remote exception might lead to a NULL writer, which in turn causes the NPE 
that crashes the HMaster.
h1. Root Cause

When invoking *inflightWALClosures.p

[jira] [Updated] (HBASE-28167) HMaster crashes due to NPE at AsyncFSWAL.closeWriter

2023-10-18 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28167:
---
Summary: HMaster crashes due to NPE at AsyncFSWAL.closeWriter  (was: 
HMaster crash due to NPE)

> HMaster crashes due to NPE at AsyncFSWAL.closeWriter
> 
>
> Key: HBASE-28167
> URL: https://issues.apache.org/jira/browse/HBASE-28167
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 2.4.17
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-a82083cf5d18.log, persistent.tar.gz
>
>
> I am testing the upgrade process of HMaster. When starting up the new version 
> HMaster 2.4.17, it crashed immediately with the following exception.
> {code:java}
> 2023-10-17 21:03:35,892 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: * ABORTING master hmaster,16000,1697576576301: Unhandled 
> exception. Starting shutdown. *
> org.apache.hadoop.hbase.FailedCloseWALAfterInitializedErrorException: Failed 
> close after init wal failed.
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:167)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:62)
>         at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:295)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.createWAL(MasterRegion.java:200)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.bootstrap(MasterRegion.java:220)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:348)
>         at 
> org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:855)
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
>         at 
> org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
>         at java.lang.Thread.run(Thread.java:750)
> Caused by: java.io.IOException: java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:979)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.close(AbstractFSWAL.java:1006)
>         at 
> org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:165)
>         ... 10 more
> Caused by: java.lang.NullPointerException
>         at 
> java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
>         at 
> java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.closeWriter(AsyncFSWAL.java:743)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doShutdown(AsyncFSWAL.java:800)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:951)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:946)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         ... 1 more {code}
> h1. Reproduce
> It happens non-deterministically (around 2 out of 1802 tests). It might 
> require an exception when the HMaster interacts with the HDFS cluster, since I 
> noticed the following warning before the NPE:
>  
> {code:java}
> 2023-10-17 21:03:35,857 WARN  [master/hmaster:16000:becomeActiveMaster] 
> asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create fan-out dfs output 
> /hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
>  failed, retry = 0
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and no node(s) are excluded in this operation.
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2586)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:889)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenode

[jira] [Created] (HBASE-28167) HMaster crash due to NPE

2023-10-18 Thread Ke Han (Jira)
Ke Han created HBASE-28167:
--

 Summary: HMaster crash due to NPE
 Key: HBASE-28167
 URL: https://issues.apache.org/jira/browse/HBASE-28167
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.4.17
Reporter: Ke Han
 Attachments: hbase--master-a82083cf5d18.log, persistent.tar.gz

I am testing the upgrade process of HMaster. When starting up the new version 
HMaster 2.4.17, it crashed immediately with the following exception.
{code:java}
2023-10-17 21:03:35,892 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: * ABORTING master hmaster,16000,1697576576301: Unhandled 
exception. Starting shutdown. *
org.apache.hadoop.hbase.FailedCloseWALAfterInitializedErrorException: Failed 
close after init wal failed.
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:167)
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:62)
        at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:295)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.createWAL(MasterRegion.java:200)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.bootstrap(MasterRegion.java:220)
        at 
org.apache.hadoop.hbase.master.region.MasterRegion.create(MasterRegion.java:348)
        at 
org.apache.hadoop.hbase.master.region.MasterRegionFactory.create(MasterRegionFactory.java:104)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:855)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.shutdown(AbstractFSWAL.java:979)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.close(AbstractFSWAL.java:1006)
        at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:165)
        ... 10 more
Caused by: java.lang.NullPointerException
        at 
java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
        at 
java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.closeWriter(AsyncFSWAL.java:743)
        at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.doShutdown(AsyncFSWAL.java:800)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:951)
        at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL$2.call(AbstractFSWAL.java:946)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more {code}
h1. Reproduce

It happens non-deterministically (around 2 out of 1802 tests). It might 
require an exception when the HMaster interacts with the HDFS cluster, since I 
noticed the following warning before the NPE:

 
{code:java}
2023-10-17 21:03:35,857 WARN  [master/hmaster:16000:becomeActiveMaster] 
asyncfs.FanOutOneBlockAsyncDFSOutputHelper: create fan-out dfs output 
/hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
 failed, retry = 0
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/hbase/MasterData/WALs/hmaster,16000,1697576576301/hmaster%2C16000%2C1697576576301.1697576615700
 could only be replicated to 0 nodes instead of minReplication (=1).  There are 
1 datanode(s) running and no node(s) are excluded in this operation.
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
        at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2586)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:889)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:517)
        at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:498)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1038)
    

[jira] [Updated] (HBASE-28159) Unable to get table state error when table is being initialized

2023-10-17 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28159:
---
Description: 
When executing commands to create a table, I noticed the following ERROR in the 
HMaster log:
{code:java}
2023-10-17 06:41:47,118 ERROR [master/hmaster:16000.Chore.1] 
master.TableStateManager: Unable to get table 
uuidf68fb89ec7f4435597d69fb7b099d8e7 state
org.apache.hadoop.hbase.TableNotFoundException: No state found for 
uuidf68fb89ec7f4435597d69fb7b099d8e7
        at 
org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:155)
        at 
org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:92)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:419)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.getRegionStatesCount(AssignmentManager.java:2341)
        at 
org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2616)
        at 
org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2537)
        at 
org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:47)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Reproduce

Due to thread interleaving, you might need to run the following command 
sequence multiple times to reproduce it.

1 HM, 2 RS, HDFS 2.10.2 cluster
{code:java}
create 'uuid49bb410e0a0c40ffb070d17787b4cad7', {NAME => 
'uuid66e57e5195e04956a78f789b2a25ec01', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid119181eed72a43ccb66fabe37f84d2c0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuidc2d4931eaf4c429db0e55514fb12e767', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuidc9802bbfbe434411ae68bb8388d499b6', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuidc85e117d0ca144719fc53d30b189a343', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 
'uuid76ccbd96fbdc418b95ed9971ff423b2d', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid36835d3faff04838bd02d6226557d7c8', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid37752598d1bb405eb39a3e17c04d7e60', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuidf68fb89ec7f4435597d69fb7b099d8e7', {NAME => 
'uuidb235288b1d304fe1a62adb63968d9eee', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidf348f8849e724b3fa231fc2bb459be2d', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid81341a87083e49d7a0d8aff7b1ccf16a', VERSIONS => 3, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuid24db0d3c67c347d3a4c18af90facec2d', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid7ecf10315f444cfd9c5698695f9054d9', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
enable 'uuid094dd5bf47eb47d69148b63e73ce0e7c'
create_namespace 'uuidc1066f82d7834f698d335dd04fa7ad3e'
alter 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'enaJvIGYBk', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => false}
disable 'uuidf68fb89ec7f4435597d69fb7b099d8e7' {code}
I have attached the full logs.
h1. Root Cause

The ERROR message is thrown because of the thread interleaving between (1) T1: 
creating the table and (2) T2: Chore thread calculating TABLE_TO_REGIONS_COUNT.

Here's how it happens in detail:
 # The user issues a create table request, which puts the table name into 
tableDescriptors.
 # Chore thread is trying to calculate TABLE_TO_REGIONS_COUNT by iterating all 
tables from {*}getTableDescriptors().getAll(){*}. This also in
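
A minimal, self-contained sketch of that window (hypothetical class and map names; this only models the interleaving and is not HBase internals):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// T1 publishes the table descriptor before the table state exists, so T2 (the
// chore) can see a table name that has no state yet and logs the
// "Unable to get table ... state" ERROR.
public class TableStateRaceSketch {
  private final Map<String, String> tableDescriptors = new ConcurrentHashMap<>();
  private final Map<String, String> tableStates = new ConcurrentHashMap<>();

  // T1: create table
  void createTable(String name) {
    tableDescriptors.put(name, "descriptor"); // visible to the chore immediately
    // ... region creation and assignment happen here ...
    tableStates.put(name, "ENABLED");         // the state is only written later
  }

  // T2: chore computing per-table region counts
  void chore() {
    for (String name : tableDescriptors.keySet()) {
      String state = tableStates.get(name);
      if (state == null) {
        // The window hit in this report: the descriptor exists, the state does not.
        System.out.println("Unable to get table " + name + " state (still being created)");
        continue;
      }
      // ... count regions only for tables whose state is known ...
    }
  }
}
{code}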

[jira] [Created] (HBASE-28159) Unable to get table state error when table is being initialized

2023-10-17 Thread Ke Han (Jira)
Ke Han created HBASE-28159:
--

 Summary: Unable to get table state error when table is being 
initialized
 Key: HBASE-28159
 URL: https://issues.apache.org/jira/browse/HBASE-28159
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 2.4.17
Reporter: Ke Han
 Attachments: hbase--master-37bbb9b6f05a.log, persistent.tar.gz

When executing commands to create a table, I noticed the following ERROR in the 
HMaster log:
{code:java}
2023-10-17 06:41:47,118 ERROR [master/hmaster:16000.Chore.1] 
master.TableStateManager: Unable to get table 
uuidf68fb89ec7f4435597d69fb7b099d8e7 state
org.apache.hadoop.hbase.TableNotFoundException: No state found for 
uuidf68fb89ec7f4435597d69fb7b099d8e7
        at 
org.apache.hadoop.hbase.master.TableStateManager.getTableState(TableStateManager.java:155)
        at 
org.apache.hadoop.hbase.master.TableStateManager.isTableState(TableStateManager.java:92)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.isTableDisabled(AssignmentManager.java:419)
        at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.getRegionStatesCount(AssignmentManager.java:2341)
        at 
org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2616)
        at 
org.apache.hadoop.hbase.master.HMaster.getClusterMetricsWithoutCoprocessor(HMaster.java:2537)
        at 
org.apache.hadoop.hbase.master.balancer.ClusterStatusChore.chore(ClusterStatusChore.java:47)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Reproduce

Due to thread interleaving, you might need to run the following command 
sequence multiple times to reproduce it.

1 HM, 2 RS, HDFS-2.10.2
{code:java}
create 'uuid49bb410e0a0c40ffb070d17787b4cad7', {NAME => 
'uuid66e57e5195e04956a78f789b2a25ec01', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid119181eed72a43ccb66fabe37f84d2c0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuidc2d4931eaf4c429db0e55514fb12e767', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuidc9802bbfbe434411ae68bb8388d499b6', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuidc85e117d0ca144719fc53d30b189a343', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 
'uuid76ccbd96fbdc418b95ed9971ff423b2d', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid36835d3faff04838bd02d6226557d7c8', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid37752598d1bb405eb39a3e17c04d7e60', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}
create 'uuidf68fb89ec7f4435597d69fb7b099d8e7', {NAME => 
'uuidb235288b1d304fe1a62adb63968d9eee', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidf348f8849e724b3fa231fc2bb459be2d', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid81341a87083e49d7a0d8aff7b1ccf16a', VERSIONS => 3, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuid24db0d3c67c347d3a4c18af90facec2d', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid7ecf10315f444cfd9c5698695f9054d9', VERSIONS => 1, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
enable 'uuid094dd5bf47eb47d69148b63e73ce0e7c'
create_namespace 'uuidc1066f82d7834f698d335dd04fa7ad3e'
alter 'uuid094dd5bf47eb47d69148b63e73ce0e7c', {NAME => 'enaJvIGYBk', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => false}
disable 'uuidf68fb89ec7f4435597d69fb7b099d8e7' {code}
I have attached the full logs.
h1. Root Cause

The ERROR message is thrown because of the thread interleaving between (1) T1: 
creating the table and (2) T2: Chore thread calculating TABLE_TO_REGIONS_COUNT.

Here'

[jira] [Updated] (HBASE-28125) list_quota_table_sizes returns a table whose quota is not set

2023-10-06 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28125:
---
Description: 
When using an HBase 2.4.17 cluster, I noticed that list_quota_table_sizes could 
sometimes return a table whose quota is not set.
h1. Reproduce

{_}This bug cannot be reproduced deterministically{_}. The probability of its 
manifestation is about 1.1% (keep executing the commands; it occurred two times 
out of 178 repeated executions).

Start up HBase 2.4.17 cluster (2.10.2 HDFS, 1 HMaster, 2 RS)

Executing the following commands,
{code:java}
create_namespace 'uuidef8e6005b9e74092927a4b335424f7c5'
create 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuid07f904a09baf414d903b2818d3091f28', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid085c0991aa1c4d4ea145767e7e7bf60c', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid1d9a2bc405c64708b9e471ae14794741', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}
create 'uuid5cafa12ce5034015bb597428b294a40d', {NAME => 
'uuid7d9efb39ac94472b90dc60ed3723cdf9', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuidde273134b6434fc584990554cfa64b10', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}
clone_table_schema 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', 
'uuidb8e2393af3314726890b70ef5871a9d0'
drop 'uuid5cafa12ce5034015bb597428b294a40d'
truncate 'uuidb8e2393af3314726890b70ef5871a9d0'
compaction_state 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
truncate_preserve 'uuidb8e2393af3314726890b70ef5871a9d0'
drop 'uuidb8e2393af3314726890b70ef5871a9d0'
truncate 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
alter 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', METHOD => 'delete'}
disable 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
incr 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', 
'uuid863efa8e4f1f44888af0ed139effba33', 
'uuid085c0991aa1c4d4ea145767e7e7bf60c:NONE', 3
drop 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
wal_roll 'hregion2,16020'
wal_roll 'hregion1,16020'
create 'uuid4323f716aea24b5fa001f0722cdc66f9', {NAME => 
'uuidd64032ff2e7340fb8832a16430fc14c1', VERSIONS => 3, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidc4be1958501543ac86661793a4c144cb', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid0c7961e7f67a464387f9f5ce428f08d1', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuida9f7a62b76fa4560834cc5789d5abf3a', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}
major_compact 'uuid4323f716aea24b5fa001f0722cdc66f9', 
'uuid0c7961e7f67a464387f9f5ce428f08d1'
incr 'uuid4323f716aea24b5fa001f0722cdc66f9', 
'uuidffb29730a2fb4f5f8c18f0c1bc254402', 
'uuida9f7a62b76fa4560834cc5789d5abf3a:cc', 3
alter 'uuid4323f716aea24b5fa001f0722cdc66f9', {NAME => 
'uuida9f7a62b76fa4560834cc5789d5abf3a', METHOD => 'delete'}
update_config 'hregion1,16020' {code}
Then execute the read command in the hbase shell:
{code:java}
list_quota_table_sizes

TABLE  SIZE
uuid4323f716aea24b5fa001f0722cdc66f9 5133
1 row(s)
Took 0.0278 seconds {code}
uuid4323f716aea24b5fa001f0722cdc66f9 is not set in the quota table, but it still 
appears in the list_quota_table_sizes results, with an unexpected value: 5133.
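For reference, one way to confirm from the hbase shell that no quota is actually configured for this table (standard shell commands, nothing specific to this report):
{code:java}
# List all configured quotas; the table should not appear here if no quota was set
list_quotas

# Inspect the hbase:quota system table directly for any rows mentioning the table
scan 'hbase:quota'
{code}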
h1. Thoughts

The root cause might be related to how the *quota* table reacts when a new table 
is created. I am still investigating (injecting logs into MasterQuotaManager to 
understand why it also records this table).

  was:
When using HBase cluster 2.4.17, I noticed that the list_quota_table_sizes 
sometimes could return a table whose quota is not set.
h1. Reproduce

{_}This bug cannot be reproduced deterministically{_}. The probability for its 
manifestation is 1.1% (Keep executing the commands, and it occurs two times out 
of 178 repeatedly executions).

Start up HBase 2.4.17 cluster (2.10.2 HDFS, 1 HMaster, 2 RS)

Executing the following commands,
{code:java}
create_namespace 'uuidef8e6005b9e74092927a4b335424f7c5'
create 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuid07f904a09baf414d903b2818d3091f28', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid085c0991aa1c4d4ea145767e7e7bf60c', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid1d9a2bc405c64708b9e471ae14794741', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'f

[jira] [Updated] (HBASE-28125) list_quota_table_sizes returns a table whose quota is not set

2023-10-01 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28125:
---
Description: 
When using an HBase 2.4.17 cluster, I noticed that list_quota_table_sizes could 
sometimes return a table whose quota is not set.
h1. Reproduce

{_}This bug cannot be reproduced deterministically{_}. The probability of its 
manifestation is about 1.1% (keep executing the commands; it occurred two times 
out of 178 repeated executions).

Start up HBase 2.4.17 cluster (2.10.2 HDFS, 1 HMaster, 2 RS)

Executing the following commands,
{code:java}
create_namespace 'uuidef8e6005b9e74092927a4b335424f7c5'
create 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuid07f904a09baf414d903b2818d3091f28', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid085c0991aa1c4d4ea145767e7e7bf60c', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid1d9a2bc405c64708b9e471ae14794741', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}
create 'uuid5cafa12ce5034015bb597428b294a40d', {NAME => 
'uuid7d9efb39ac94472b90dc60ed3723cdf9', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuidde273134b6434fc584990554cfa64b10', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}
clone_table_schema 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', 
'uuidb8e2393af3314726890b70ef5871a9d0'
drop 'uuid5cafa12ce5034015bb597428b294a40d'
truncate 'uuidb8e2393af3314726890b70ef5871a9d0'
compaction_state 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
truncate_preserve 'uuidb8e2393af3314726890b70ef5871a9d0'
drop 'uuidb8e2393af3314726890b70ef5871a9d0'
truncate 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
alter 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', METHOD => 'delete'}
disable 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
incr 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', 
'uuid863efa8e4f1f44888af0ed139effba33', 
'uuid085c0991aa1c4d4ea145767e7e7bf60c:NONE', 3
drop 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
wal_roll 'hregion2,16020'
wal_roll 'hregion1,16020'
create 'uuid4323f716aea24b5fa001f0722cdc66f9', {NAME => 
'uuidd64032ff2e7340fb8832a16430fc14c1', VERSIONS => 3, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidc4be1958501543ac86661793a4c144cb', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid0c7961e7f67a464387f9f5ce428f08d1', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuida9f7a62b76fa4560834cc5789d5abf3a', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}
major_compact 'uuid4323f716aea24b5fa001f0722cdc66f9', 
'uuid0c7961e7f67a464387f9f5ce428f08d1'
incr 'uuid4323f716aea24b5fa001f0722cdc66f9', 
'uuidffb29730a2fb4f5f8c18f0c1bc254402', 
'uuida9f7a62b76fa4560834cc5789d5abf3a:cc', 3
alter 'uuid4323f716aea24b5fa001f0722cdc66f9', {NAME => 
'uuida9f7a62b76fa4560834cc5789d5abf3a', METHOD => 'delete'}
update_config 'hregion1,16020' {code}
Then execute read command in hbase shell
{code:java}
list_quota_table_sizes

TABLE  SIZE
uuid4323f716aea24b5fa001f0722cdc66f9 5133
1 row(s)
Took 0.0278 seconds {code}
uuid4323f716aea24b5fa001f0722cdc66f9 is not set in the quota table, but it still 
appears in the list_quota_table_sizes results, with an unexpected value: 5133.
h1. Thoughts

The root cause might be related to how the *quota* table reacts when a new table 
is created. I am still investigating (injecting logs into MasterQuotaManager to 
understand why it also records this table).

Is this normal behavior for list_quota_table_sizes? If not, I can give fixing it 
a try.

  was:
When using HBase cluster 2.4.17, I noticed that the list_quota_table_sizes 
sometimes could return a table whose quota is not set.
h1. Reproduce

{_}This bug cannot be reproduced deterministically{_}. The probability for its 
manifestation is 1.1% (Keep executing the commands, and it occurs two times out 
of 178 repeatedly executions).

Start up HBase 2.4.17 cluster (2.10.2 HDFS, 1 HMaster, 2 RS)

Executing the following commands, and then perform the read
{code:java}
create_namespace 'uuidef8e6005b9e74092927a4b335424f7c5'
create 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuid07f904a09baf414d903b2818d3091f28', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid085c0991aa1c4d4ea145767e7e7bf60c', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEM

[jira] [Created] (HBASE-28125) list_quota_table_sizes returns a table whose quota is not set

2023-10-01 Thread Ke Han (Jira)
Ke Han created HBASE-28125:
--

 Summary: list_quota_table_sizes returns a table whose quota is not 
set
 Key: HBASE-28125
 URL: https://issues.apache.org/jira/browse/HBASE-28125
 Project: HBase
  Issue Type: Bug
  Components: Quotas
Affects Versions: 2.4.17
Reporter: Ke Han


When using an HBase 2.4.17 cluster, I noticed that list_quota_table_sizes could 
sometimes return a table whose quota is not set.
h1. Reproduce

{_}This bug cannot be reproduced deterministically{_}. The probability of its 
manifestation is about 1.1% (keep executing the commands; it occurred two times 
out of 178 repeated executions).

Start up HBase 2.4.17 cluster (2.10.2 HDFS, 1 HMaster, 2 RS)

Executing the following commands, and then perform the read
{code:java}
create_namespace 'uuidef8e6005b9e74092927a4b335424f7c5'
create 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuid07f904a09baf414d903b2818d3091f28', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid085c0991aa1c4d4ea145767e7e7bf60c', VERSIONS => 4, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuid1d9a2bc405c64708b9e471ae14794741', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}
create 'uuid5cafa12ce5034015bb597428b294a40d', {NAME => 
'uuid7d9efb39ac94472b90dc60ed3723cdf9', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}, {NAME => 
'uuidde273134b6434fc584990554cfa64b10', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}
clone_table_schema 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', 
'uuidb8e2393af3314726890b70ef5871a9d0'
drop 'uuid5cafa12ce5034015bb597428b294a40d'
truncate 'uuidb8e2393af3314726890b70ef5871a9d0'
compaction_state 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
truncate_preserve 'uuidb8e2393af3314726890b70ef5871a9d0'
drop 'uuidb8e2393af3314726890b70ef5871a9d0'
truncate 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
alter 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', {NAME => 
'uuida60aaa69834e4f0596b5b3b3c12b2cb8', METHOD => 'delete'}
disable 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
incr 'uuid80d6aa3495094cd7b4018d3ab3fe9db8', 
'uuid863efa8e4f1f44888af0ed139effba33', 
'uuid085c0991aa1c4d4ea145767e7e7bf60c:NONE', 3
drop 'uuid80d6aa3495094cd7b4018d3ab3fe9db8'
wal_roll 'hregion2,16020'
wal_roll 'hregion1,16020'
create 'uuid4323f716aea24b5fa001f0722cdc66f9', {NAME => 
'uuidd64032ff2e7340fb8832a16430fc14c1', VERSIONS => 3, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidc4be1958501543ac86661793a4c144cb', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}, {NAME => 
'uuid0c7961e7f67a464387f9f5ce428f08d1', VERSIONS => 4, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuida9f7a62b76fa4560834cc5789d5abf3a', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}
major_compact 'uuid4323f716aea24b5fa001f0722cdc66f9', 
'uuid0c7961e7f67a464387f9f5ce428f08d1'
incr 'uuid4323f716aea24b5fa001f0722cdc66f9', 
'uuidffb29730a2fb4f5f8c18f0c1bc254402', 
'uuida9f7a62b76fa4560834cc5789d5abf3a:cc', 3
alter 'uuid4323f716aea24b5fa001f0722cdc66f9', {NAME => 
'uuida9f7a62b76fa4560834cc5789d5abf3a', METHOD => 'delete'}
update_config 'hregion1,16020' {code}
 

Then execute read command in hbase shell
{code:java}
list_quota_table_sizes

TABLE  SIZE
uuid4323f716aea24b5fa001f0722cdc66f9 5133
1 row(s)
Took 0.0278 seconds {code}
uuid4323f716aea24b5fa001f0722cdc66f9 is not set in the quota table, but it still 
appears in the list_quota_table_sizes results, with an unexpected value: 5133.
h1. Thoughts

The root cause might be related to how the *quota* table reacts when a new table 
is created. I am still investigating (injecting logs into MasterQuotaManager to 
understand why it also records this table).

Is this normal behavior for list_quota_table_sizes? If not, I can give fixing it 
a try.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HBASE-28105) NPE in QuotaCache if Table is dropped from cluster

2023-09-25 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han resolved HBASE-28105.

Resolution: Fixed

PR merged

> NPE in QuotaCache if Table is dropped from cluster
> --
>
> Key: HBASE-28105
> URL: https://issues.apache.org/jira/browse/HBASE-28105
> Project: HBase
>  Issue Type: Bug
>  Components: Quotas
>Affects Versions: 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
> Attachments: 0001-avoid-NPE.patch, 
> hbase--regionserver-a0320910ca45.log
>
>
> When running HBase-2.4.17, I met an NPE in the regionserver log.
> h1. Reproduce
> Config HBase cluster: 1 HMaster, 2 RS, 2.10.2 Hadoop.
> Execute the following commands in the HMaster node using hbase shell, 
> {code:java}
> create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
> 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
> 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
> 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
> create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
> create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
> 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
> incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid46ddc3d3557e413e915e2393ae72c082', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
> flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid449de028da6b4d35be0f187ebec6c3be'
> drop 'uuiddeb610fded9744889840ecd03dd18739'
> put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
> 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
> disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
> 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
> put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
> 'uuid552e42ade4c14099a1d8643bea1616d4', 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
> drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
> The exception will be thrown in either RS1 or RS2
> {code:java}
> 2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
> handler.AssignRegionHandler: Opened 
> uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
> 2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
> hbase.ScheduledChore: Caught error
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750){code}
> h1. Root Cause
> The NPE is thrown at function: updateQuotaFactors()
> {code:java}
> private void updateQuotaFactors() {
>   // Update machine quota factor
>   ClusterMetrics clusterMetrics;
>   try {
>     clusterMetrics = rsServices.getConnection().getAdmin()
>       .getClust

[jira] [Updated] (HBASE-28109) NPE for the region state: Failed to become active master (HMaster)

2023-09-23 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28109:
---
Description: 
When starting up an HBase cluster (2.4.17), I met an NPE that prevents the 
HMaster from starting up. I had to restart the HMaster.

My cluster contains 1 HMaster, 2 RS (HBase-2.4.17) and 1 Hadoop node (2.10.2).
{code:java}
2023-09-18 14:17:35,931 INFO  [PEWorker-1] procedure2.ProcedureExecutor: Rolled 
back pid=1, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
exec-time=1.4660 sec
2023-09-18 14:17:35,931 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Wait for region servers to report in: status=null, 
state=RUNNING, startTime=1695046655931, completionTime=-1
2023-09-18 14:17:35,932 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=0ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=0ms
2023-09-18 14:17:37,438 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=1505ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=1505ms
2023-09-18 14:17:38,941 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=3009ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=3009ms
2023-09-18 14:17:40,445 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Finished waiting on RegionServer count=2; waited=4513ms, 
expected min=1 server(s), max=NO_LIMIT server(s), master is running
2023-09-18 14:17:40,452 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1229)
        at 
org.apache.hadoop.hbase.master.HMaster.waitForMetaOnline(HMaster.java:1218)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:968)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:750)
2023-09-18 14:17:40,453 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver] {code}
h1. Root Cause

From the stack trace, the rs variable is NULL and it's directly used without 
checking.
{code:java}
// hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java

  /**
   * @return True if region is online and scannable else false if an error or 
shutdown (Otherwise we
   *         just block in here holding up all forward-progess).
   */
  private boolean isRegionOnline(RegionInfo ri) {
    RetryCounter rc = null;
    while (!isStopped()) {
  // NPE line
      RegionState rs = 
this.assignmentManager.getRegionStates().getRegionState(ri);
      if (rs.isOpened()) {
        if (this.getServerManager().isServerOnline(rs.getServerName())) {
          return true;
        }
      }
      // Region{code}
I am not sure what causes the *rs* to be null but maybe we can add a check to 
make sure this NPE is captured and properly handled.

Restarting the HMaster makes this exception disappear. I have attached the full 
log from the HMaster for this case. I ran into this exception when using HBase 
2.4.17, but I think it might also happen in the latest branch since the code of 
isRegionOnline is the same.
h1. Fix

This bug happens rarely. I think we can add a simple check on whether rs is null 
and then decide whether to keep waiting or to directly shut down the HMaster.
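
A minimal sketch of the kind of guard described here, based on the snippet above (the log message is hypothetical; the actual change is in the PR linked below and may differ):
{code:java}
  private boolean isRegionOnline(RegionInfo ri) {
    RetryCounter rc = null;
    while (!isStopped()) {
      RegionState rs = this.assignmentManager.getRegionStates().getRegionState(ri);
      if (rs == null) {
        // State not published yet; keep waiting (with the existing retry/backoff
        // below) instead of dereferencing rs and hitting the NPE.
        LOG.warn("No region state yet for {}, retrying", ri);
      } else if (rs.isOpened() && this.getServerManager().isServerOnline(rs.getServerName())) {
        return true;
      }
      // ... existing retry counter / sleep logic continues here ...
    }
    return false;
  }
{code}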

I assume that if the HMaster waits for more time, it will get correct responses 
from the regionservers.

I have a simple PR to fix it.

https://github.com/apache/hbase/pull/5432

  was:
When starting up HBase cluster (2.4.17), I met NPE and it prevents HMaster from 
starting up. I have to restart the HMaster.

My cluster contains 1 HMaster, 2 RS (HBase-2.4.17) and 1 Hadoop node (2.10.2).
{code:java}
2023-09-18 14:17:35,931 INFO  [PEWorker-1] procedure2.ProcedureExecutor: Rolled 
back pid=1, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
exec-time=1.4660 sec
2023-09-18 14:17:35,931 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Wait for region servers to report in: status=null, 
state=RUNNING, startTime=1695046655931, completionTime=-1
2023-09-18 14:17:35,9

[jira] [Updated] (HBASE-28109) NPE for the region state: Failed to become active master (HMaster)

2023-09-23 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28109:
---
Description: 
When starting up an HBase cluster (2.4.17), I met an NPE that prevents the 
HMaster from starting up. I had to restart the HMaster.

My cluster contains 1 HMaster, 2 RS (HBase-2.4.17) and 1 Hadoop node (2.10.2).
{code:java}
2023-09-18 14:17:35,931 INFO  [PEWorker-1] procedure2.ProcedureExecutor: Rolled 
back pid=1, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
exec-time=1.4660 sec
2023-09-18 14:17:35,931 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Wait for region servers to report in: status=null, 
state=RUNNING, startTime=1695046655931, completionTime=-1
2023-09-18 14:17:35,932 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=0ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=0ms
2023-09-18 14:17:37,438 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=1505ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=1505ms
2023-09-18 14:17:38,941 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=3009ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=3009ms
2023-09-18 14:17:40,445 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Finished waiting on RegionServer count=2; waited=4513ms, 
expected min=1 server(s), max=NO_LIMIT server(s), master is running
2023-09-18 14:17:40,452 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1229)
        at 
org.apache.hadoop.hbase.master.HMaster.waitForMetaOnline(HMaster.java:1218)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:968)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:750)
2023-09-18 14:17:40,453 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver] {code}
h1. Root Cause

From the stack trace, the rs variable is null and it is used directly without
checking.
{code:java}
// hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java

  /**
   * @return True if region is online and scannable else false if an error or 
shutdown (Otherwise we
   *         just block in here holding up all forward-progess).
   */
  private boolean isRegionOnline(RegionInfo ri) {
    RetryCounter rc = null;
    while (!isStopped()) {
  // NPE line
      RegionState rs = 
this.assignmentManager.getRegionStates().getRegionState(ri);
      if (rs.isOpened()) {
        if (this.getServerManager().isServerOnline(rs.getServerName())) {
          return true;
        }
      }
      // Region{code}
I am not sure what causes *rs* to be null, but maybe we can add a check so that
this NPE is caught and handled properly.

Restarting the HMaster makes this exception disappear. I have attached the full
HMaster log for this case. I ran into this exception on HBase 2.4.17, but I
think it might also happen on the latest branch since the code of
isRegionOnline is the same.
h1. Fix

This bug happens rarely. I think we can add a simple check for whether rs is
null and then decide whether to keep waiting or to shut down the HMaster
directly.

I assume that if the HMaster waits a bit longer, it will get correct responses
from the regionservers.

I have a simple PR to fix it.

  was:
When starting up an HBase cluster (2.4.17), I met an NPE that prevents the
HMaster from starting up. I have to restart the HMaster.

My cluster contains 1 HMaster, 2 RS (HBase-2.4.17) and 1 Hadoop node (2.10.2).

 
{code:java}
2023-09-18 14:17:35,931 INFO  [PEWorker-1] procedure2.ProcedureExecutor: Rolled 
back pid=1, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
exec-time=1.4660 sec
2023-09-18 14:17:35,931 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Wait for region servers to report in: status=null, 
state=RUNNING, startTime=1695046655931, completionTime=-1
2023-09-18 14:17:35,932 INFO  [master/hmaster:16000:becomeAct

[jira] [Created] (HBASE-28109) NPE for the region state: Failed to become active master (HMaster)

2023-09-23 Thread Ke Han (Jira)
Ke Han created HBASE-28109:
--

 Summary: NPE for the region state: Failed to become active master 
(HMaster)
 Key: HBASE-28109
 URL: https://issues.apache.org/jira/browse/HBASE-28109
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.4.17
Reporter: Ke Han
 Attachments: hbase--master-ee4a85363fe2.log

When starting up an HBase cluster (2.4.17), I met an NPE that prevents the
HMaster from starting up. I have to restart the HMaster.

My cluster contains 1 HMaster, 2 RS (HBase-2.4.17) and 1 Hadoop node (2.10.2).

 
{code:java}
2023-09-18 14:17:35,931 INFO  [PEWorker-1] procedure2.ProcedureExecutor: Rolled 
back pid=1, state=ROLLEDBACK, 
exception=org.apache.hadoop.hbase.exceptions.TimeoutIOException via 
ProcedureExecutor:org.apache.hadoop.hbase.exceptions.TimeoutIOException: 
Operation timed out after 1.0010 sec; InitMetaProcedure table=hbase:meta 
exec-time=1.4660 sec
2023-09-18 14:17:35,931 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Wait for region servers to report in: status=null, 
state=RUNNING, startTime=1695046655931, completionTime=-1
2023-09-18 14:17:35,932 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=0ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=0ms
2023-09-18 14:17:37,438 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=1505ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=1505ms
2023-09-18 14:17:38,941 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Waiting on regionserver count=2; waited=3009ms, expecting 
min=1 server(s), max=NO_LIMIT server(s), timeout=4500ms, lastChange=3009ms
2023-09-18 14:17:40,445 INFO  [master/hmaster:16000:becomeActiveMaster] 
master.ServerManager: Finished waiting on RegionServer count=2; waited=4513ms, 
expected min=1 server(s), max=NO_LIMIT server(s), master is running
2023-09-18 14:17:40,452 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Failed to become active master
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.master.HMaster.isRegionOnline(HMaster.java:1229)
        at 
org.apache.hadoop.hbase.master.HMaster.waitForMetaOnline(HMaster.java:1218)
        at 
org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:968)
        at 
org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2193)
        at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:528)
        at java.lang.Thread.run(Thread.java:750)
2023-09-18 14:17:40,453 ERROR [master/hmaster:16000:becomeActiveMaster] 
master.HMaster: Master server abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.quotas.MasterQuotasObserver] {code}
 
h1. Root Cause

From the stack trace, the rs variable is null and it is used directly without
checking.

 
{code:java}
// hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java

  /**
   * @return True if region is online and scannable else false if an error or 
shutdown (Otherwise we
   *         just block in here holding up all forward-progess).
   */
  private boolean isRegionOnline(RegionInfo ri) {
    RetryCounter rc = null;
    while (!isStopped()) {
  // NPE line
      RegionState rs = 
this.assignmentManager.getRegionStates().getRegionState(ri);
      if (rs.isOpened()) {
        if (this.getServerManager().isServerOnline(rs.getServerName())) {
          return true;
        }
      }
      // Region{code}
 

I am not sure what causes rs to be null, but maybe we can add a check so that
this NPE is caught and handled properly.

Restarting the HMaster makes this exception disappear. I have attached the full
HMaster log for this case. I ran into this exception on HBase 2.4.17, but I
think it might also happen on the latest branch since the code of
isRegionOnline is the same.
h1. Fix

This bug happens rarely. I think we can add a simple check for whether rs is
null and then decide whether to keep waiting or to shut down the HMaster
directly.

I assume that if the HMaster waits a bit longer, it will get correct responses
from the regionservers.

I have a simple PR to fix it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-28105) NPE in QuotaCache if Table is dropped from cluster

2023-09-21 Thread Ke Han (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767613#comment-17767613
 ] 

Ke Han commented on HBASE-28105:


[~bbeaudreault] Thanks for the reply! I have submitted the PR:
[https://github.com/apache/hbase/pull/5426/files]
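
For context, here is a rough sketch of the kind of null guard the patch is
about (illustrative only, not necessarily the exact code in the PR; the value
type RegionStatesCount and the surrounding QuotaCache fields such as
tableQuotaCache and tableMachineQuotaFactors are assumptions based on the
snippet quoted in the issue). When a table is dropped between two chore runs,
tableRegionStatesCount no longer contains an entry for it, so the lookup has to
be null-checked before use:
{code:java}
  // Inside QuotaRefresherChore#updateQuotaFactors(), after the cluster metrics
  // have been fetched (see the snippet in the issue description).
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    RegionStatesCount statesCount = tableRegionStatesCount.get(tableName);
    if (statesCount == null) {
      // The table was dropped after the metrics snapshot was taken;
      // fall back to the default factor instead of dereferencing null.
      tableMachineQuotaFactors.put(tableName, factor);
      continue;
    }
    long regionSize = statesCount.getOpenRegions();
    // ... the existing factor computation continues here unchanged ...
    tableMachineQuotaFactors.put(tableName, factor);
  }
{code}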

> NPE in QuotaCache if Table is dropped from cluster
> --
>
> Key: HBASE-28105
> URL: https://issues.apache.org/jira/browse/HBASE-28105
> Project: HBase
>  Issue Type: Bug
>  Components: Quotas
>Affects Versions: 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
> Attachments: 0001-avoid-NPE.patch, 
> hbase--regionserver-a0320910ca45.log
>
>
> When running HBase-2.4.17, I met a NPE in regionserver log.
> h1. Reproduce
> Config HBase cluster: 1 HMaster, 2 RS, 2.10.2 Hadoop.
> Execute the following commands in the HMaster node using hbase shell, 
> {code:java}
> create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
> 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
> 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
> 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
> create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
> create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
> 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
> incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid46ddc3d3557e413e915e2393ae72c082', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
> flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid449de028da6b4d35be0f187ebec6c3be'
> drop 'uuiddeb610fded9744889840ecd03dd18739'
> put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
> 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
> disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
> 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
> put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
> 'uuid552e42ade4c14099a1d8643bea1616d4', 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
> drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
> The exception will be thrown in either RS1 or RS2
> {code:java}
> 2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
> handler.AssignRegionHandler: Opened 
> uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
> 2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
> hbase.ScheduledChore: Caught error
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750){code}
> h1. Root Cause
> The NPE is thrown at function: updateQuotaFactors()
> {code:java}
> private 

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Description: 
When running HBase-2.4.17, I met an NPE in the regionserver log.
h1. Reproduce

Cluster config: 1 HMaster, 2 RS, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the HBase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The exception will be thrown in either RS1 or RS2
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Root Cause

The NPE is thrown in updateQuotaFactors():
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }  Map tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();  // Update table machine quota 
factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
  // BUGGY LINE
      long regionSize = tableRegi

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Description: 
When running HBase-2.4.17, I met an NPE in the regionserver log.
h1. Reproduce

Cluster config: 1 HMaster, 2 RS, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the HBase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The exception will be thrown in either RS1 or RS2
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Root Cause

The NPE is thrown in updateQuotaFactors():
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }  Map tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();  // Update table machine quota 
factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
  // BUGGY LINE
      long regionSize = tableRegi

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Description: 
When running HBase-2.4.17, I met an NPE in the regionserver log.
h1. Reproduce

Cluster config: 1 HMaster, 2 RS, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the HBase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The exception will be thrown in either RS1 or RS2
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Root Cause

The NPE is thrown in updateQuotaFactors():
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }  Map tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();  // Update table machine quota 
factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
      long regionSize = tableRegionStatesCount.get(ta

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Flags:   (was: Patch)

> NPE is thrown in QuotaCache.java when running HBase-2.4.17
> --
>
> Key: HBASE-28105
> URL: https://issues.apache.org/jira/browse/HBASE-28105
> Project: HBase
>  Issue Type: Bug
>  Components: Quotas
>Affects Versions: 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
> Attachments: 0001-avoid-NPE.patch, 
> hbase--regionserver-a0320910ca45.log
>
>
> When running HBase-2.4.17, I met a NPE in regionserver log.
> h1. Reproduce
> Config HBase cluster: 1 HMaster, 2 RS, 2.10.2 Hadoop.
> Execute the following commands in the HMaster node using hbase shell, 
> {code:java}
> create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
> 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
> 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
> 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
> create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
> create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
> 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
> incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid46ddc3d3557e413e915e2393ae72c082', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
> flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid449de028da6b4d35be0f187ebec6c3be'
> drop 'uuiddeb610fded9744889840ecd03dd18739'
> put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
> 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
> disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
> 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
> put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
> 'uuid552e42ade4c14099a1d8643bea1616d4', 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
> drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
> The exception will be thrown in either RS1 or RS2
> {code:java}
> 2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
> handler.AssignRegionHandler: Opened 
> uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
> 2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
> hbase.ScheduledChore: Caught error
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750){code}
> h1. Root Cause
> The NPE is thrown at function: updateQuotaFactors()
> {code:java}
> private void updateQuotaFactors() {
>   // Update machine quota factor
>   ClusterMetrics clusterMetrics;
>   try {
>     clusterMetrics = rsServices.getConnection().getAdmin()
>       .g

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Flags: Patch

> NPE is thrown in QuotaCache.java when running HBase-2.4.17
> --
>
> Key: HBASE-28105
> URL: https://issues.apache.org/jira/browse/HBASE-28105
> Project: HBase
>  Issue Type: Bug
>  Components: Quotas
>Affects Versions: 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
> Attachments: 0001-avoid-NPE.patch, 
> hbase--regionserver-a0320910ca45.log
>
>
> When running HBase-2.4.17, I met a NPE in regionserver log.
> h1. Reproduce
> Config HBase cluster: 1 HMaster, 2 RS, 2.10.2 Hadoop.
> Execute the following commands in the HMaster node using hbase shell, 
> {code:java}
> create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
> 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
> 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
> 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
> create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
> create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
> 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
> incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid46ddc3d3557e413e915e2393ae72c082', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
> flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid449de028da6b4d35be0f187ebec6c3be'
> drop 'uuiddeb610fded9744889840ecd03dd18739'
> put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
> 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
> disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
> 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
> put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
> 'uuid552e42ade4c14099a1d8643bea1616d4', 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
> drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
> The exception will be thrown in either RS1 or RS2
> {code:java}
> 2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
> handler.AssignRegionHandler: Opened 
> uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
> 2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
> hbase.ScheduledChore: Caught error
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750){code}
> h1. Root Cause
> The NPE is thrown at function: updateQuotaFactors()
> {code:java}
> private void updateQuotaFactors() {
>   // Update machine quota factor
>   ClusterMetrics clusterMetrics;
>   try {
>     clusterMetrics = rsServices.getConnection().getAdmin()
>       .getCluster

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Attachment: hbase--regionserver-a0320910ca45.log

> NPE is thrown in QuotaCache.java when running HBase-2.4.17
> --
>
> Key: HBASE-28105
> URL: https://issues.apache.org/jira/browse/HBASE-28105
> Project: HBase
>  Issue Type: Bug
>  Components: Quotas
>Affects Versions: 2.4.17, 2.5.5
>Reporter: Ke Han
>Priority: Major
> Attachments: 0001-avoid-NPE.patch, 
> hbase--regionserver-a0320910ca45.log
>
>
> When running HBase-2.4.17, I met a NPE in regionserver log.
> h1. Reproduce
> Config HBase cluster: 1 HMaster, 2 RS, 2.10.2 Hadoop.
> Execute the following commands in the HMaster node using hbase shell, 
> {code:java}
> create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
> 'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
> 'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
> 'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
> 'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
> create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
> create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
> 'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
> BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
> incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid46ddc3d3557e413e915e2393ae72c082', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
> flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuid449de028da6b4d35be0f187ebec6c3be'
> drop 'uuiddeb610fded9744889840ecd03dd18739'
> put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
> 'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
> 'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
> 'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
> disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
> create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
> 'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
> BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
> put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
> 'uuid552e42ade4c14099a1d8643bea1616d4', 
> 'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
> drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
> The exception will be thrown in either RS1 or RS2
> {code:java}
> 2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
> handler.AssignRegionHandler: Opened 
> uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
> 2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
> hbase.ScheduledChore: Caught error
> java.lang.NullPointerException
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
>         at 
> org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
>         at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>         at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>         at 
> org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:750){code}
> h1. Root Cause
> The NPE is thrown at function: updateQuotaFactors()
> {code:java}
> private void updateQuotaFactors() {
>   // Update machine quota factor
>   ClusterMetrics clusterMetrics;
>   try {
>     clusterMetrics = rsServices.getConnect

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Description: 
When running HBase-2.4.17, I met an NPE in the regionserver log.
h1. Reproduce

Cluster config: 1 HMaster, 2 RS, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the HBase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The exception will be thrown in either RS1 or RS2
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Root Cause

The NPE is thrown in updateQuotaFactors():
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }  Map tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();  // Update table machine quota 
factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
      long regionSize = tableRegionStatesCount.get(ta

[jira] [Updated] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28105:
---
Description: 
When running HBase-2.4.17, I met an NPE in the regionserver log.
h1. Reproduce

Cluster config: 1 HMaster, 2 RS, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the HBase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
The exception will be thrown in either RS1 or RS2
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
h1. Root Cause

The NPE is thrown in updateQuotaFactors():
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }  Map tableRegionStatesCount =
    clusterMetrics.getTableRegionStatesCount();  // Update table machine quota 
factors
  for (TableName tableName : tableQuotaCache.keySet()) {
    double factor = 1;
    try {
      long regionSize = tableRegionStatesCount.get(ta

[jira] [Created] (HBASE-28105) NPE is thrown in QuotaCache.java when running HBase-2.4.17

2023-09-20 Thread Ke Han (Jira)
Ke Han created HBASE-28105:
--

 Summary: NPE is thrown in QuotaCache.java when running HBase-2.4.17
 Key: HBASE-28105
 URL: https://issues.apache.org/jira/browse/HBASE-28105
 Project: HBase
  Issue Type: Bug
  Components: Quotas
Affects Versions: 2.5.5, 2.4.17
Reporter: Ke Han
 Attachments: 0001-avoid-NPE.patch

When running HBase-2.4.17, I met an NPE in the regionserver log.
h1. Reproduce

Cluster config: 1 HMaster, 2 RS, Hadoop 2.10.2.

Execute the following commands on the HMaster node using the HBase shell:
{code:java}
create 'uuidd9efa97f93a442b686adae6d9f7bb2e9', {NAME => 
'uuid099cbece77834a83a52bb0611c3ea080', VERSIONS => 3, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}, {NAME => 
'uuidbc1bea73952749329d7f025aab382c4e', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidff292310d9dc450697af2bb25d9f3e98', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'false'}, {NAME => 
'uuid449de028da6b4d35be0f187ebec6c3be', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'false'}, {NAME => 
'uuidc0840c98f9d348a18f2d454c7a503b65', VERSIONS => 2, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'false'}
create_namespace 'uuidec797633f5dd4ab9b96276135aeda9e2'
create 'uuiddeb610fded9744889840ecd03dd18739', {NAME => 
'uuid30a0f625ad454605908b60c932957ff0', VERSIONS => 1, COMPRESSION => 'GZ', 
BLOOMFILTER => 'ROW', IN_MEMORY => 'true'}
incr 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid46ddc3d3557e413e915e2393ae72c082', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 1
flush 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuid449de028da6b4d35be0f187ebec6c3be'
drop 'uuiddeb610fded9744889840ecd03dd18739'
put 'uuidd9efa97f93a442b686adae6d9f7bb2e9', 
'uuidf4704cae4d1e4661bd7664d26eb6b31b', 
'uuidbc1bea73952749329d7f025aab382c4e:JZycbUSpbDQmwgXinp', 
'XlPpFGvSYfcEXWXgwARytlSeiaSuHJFqpirMmLduqGnpdXLlHJWBumraXiifQSvHqNHmTcyzLQIvuQrkujPghfdtRkhOkgKEJHsAuAiMMeWZjdTHNZqhkOdJBOzsRYUXKOCNKeSxEDWgnKgsFDHMtxdnKKudBuceOgYmCrdaPXMclKkZKCIEiFDcdoAEJGKXYVfOjb'
disable 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
drop 'uuidd9efa97f93a442b686adae6d9f7bb2e9'
create 'uuid9d05a5cb34e64910ac90675186e7d0d4', {NAME => 
'uuid1ce512a5997b4efea3bdead2e7f723c3', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'ROWCOL', IN_MEMORY => 'true'}, {NAME => 
'uuid0b1baaa4275e46b2a3a1d11d6540fc30', VERSIONS => 2, COMPRESSION => 'NONE', 
BLOOMFILTER => 'NONE', IN_MEMORY => 'true'}
put 'uuid9d05a5cb34e64910ac90675186e7d0d4', 
'uuid552e42ade4c14099a1d8643bea1616d4', 
'uuid1ce512a5997b4efea3bdead2e7f723c3:l', 1
drop 'uuid9d05a5cb34e64910ac90675186e7d0d4'{code}
Then the exception will be thrown in either RS1 or RS2:
{code:java}
2023-09-19 20:29:28,268 INFO  [RS_OPEN_REGION-regionserver/hregion2:16020-2] 
handler.AssignRegionHandler: Opened 
uuid9d05a5cb34e64910ac90675186e7d0d4,,1695155367072.f59a0693a9469f9e1f131bf2aac1486d.
2023-09-19 20:29:29,205 ERROR [regionserver/hregion2:16020.Chore.1] 
hbase.ScheduledChore: Caught error
java.lang.NullPointerException
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.updateQuotaFactors(QuotaCache.java:378)
        at 
org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:224)
        at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:158)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at 
org.apache.hadoop.hbase.JitterScheduledThreadPoolExecutorImpl$JitteredRunnableScheduledFuture.run(JitterScheduledThreadPoolExecutorImpl.java:107)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750){code}
 
h1. Root Cause

The NPE is thrown at:
{code:java}
private void updateQuotaFactors() {
  // Update machine quota factor
  ClusterMetrics clusterMetrics;
  try {
    clusterMetrics = rsServices.getConnection().getAdmin()
      .getClusterMetrics(EnumSet.of(Option.SERVERS_NAME, 
Option.TABLE_TO_REGIONS_COUNT));
  } catch (IOException e) {
    LOG.warn("Failed to get cluster metrics needed for updating quotas", e);
    return;
  }  int rsSize = clusterMetrics.getServersName().size();
  if (rsSize != 0) {
    // TODO if use rs group, the cluster limit should be shared by the rs group
    machineQuotaFactor = 1.0 / rsSize;
  }  Map tableRegionStatesCount =
    clu