[jira] [Created] (HBASE-28590) NPE after upgrade from 2.5.8 to 3.0.0

2024-05-11 Thread Ke Han (Jira)
Ke Han created HBASE-28590:
--

 Summary: NPE after upgrade from 2.5.8 to 3.0.0
 Key: HBASE-28590
 URL: https://issues.apache.org/jira/browse/HBASE-28590
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 3.0.0
Reporter: Ke Han
 Attachments: commands.txt, hbase--master-fc906f1808de.log, 
persistent.tar.gz

When upgrading the HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I hit 
the following NPE in the master log.
{code:java}
2024-05-11T02:17:47,293 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,326 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,337 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]{code}
h1. Reproduce

This bug cannot be reproduced deterministically, but it happens fairly 
frequently (roughly 10% of runs) with the following steps.

1. Start up a 2.5.8 cluster with the default configuration (1 HM, 2 RS, 1 HDFS).

2. Execute the commands in commands.txt.

3. Stop the 2.5.8 cluster and upgrade to a 3.0.0 cluster with the default 
configuration (commit: 516c89e8597fb6, 1 HM, 2 RS, 1 HDFS).

The error messages above will then appear in the master log.

I attached (1) the commands to reproduce it, (2) the master log, and (3) the full 
error logs of all nodes.
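
For readers triaging this: the NPE at MasterRpcServices.reportFileArchival:2578 suggests the handler dereferences something that is still null while the freshly upgraded master is initializing. Below is a minimal, self-contained sketch of the defensive pattern that would surface a retriable error instead of an NPE. Every name in it (MasterStub, QuotaTracker, RetryLaterException) is an illustrative stand-in rather than an HBase API, and the idea that a quota-like component is the null culprit is an assumption, not a confirmed root cause.
{code:java}
// Hedged, self-contained sketch of a defensive guard for a master-side RPC
// handler that can be reached before all of its dependencies are wired up.
// All names here are illustrative stand-ins, not HBase classes.
import java.util.List;

public class MasterStub {
  /** Set late during startup; may still be null right after an upgrade restart. */
  private volatile QuotaTracker quotaTracker;

  static class QuotaTracker {
    void recordArchivedFiles(List<String> files) {
      // bookkeeping for archived store files would go here
    }
  }

  static class RetryLaterException extends RuntimeException {
    RetryLaterException(String msg) { super(msg); }
  }

  /** RPC-style entry point: fail softly instead of NPE-ing on a null dependency. */
  public void reportFileArchival(List<String> archivedFiles) {
    QuotaTracker tracker = quotaTracker;
    if (tracker == null) {
      // The caller (a region server) can retry once the master finishes init.
      throw new RetryLaterException("master still initializing; retry the archival report");
    }
    tracker.recordArchivedFiles(archivedFiles);
  }

  public static void main(String[] args) {
    MasterStub master = new MasterStub();
    try {
      master.reportFileArchival(List.of("archived-hfile-1"));
    } catch (RetryLaterException e) {
      System.out.println("caller would retry later: " + e.getMessage());
    }
  }
}
{code}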

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28590) NPE after upgrade from 2.5.8 to 3.0.0

2024-05-11 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28590:
---
Description: 
When upgrading the HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I hit 
the following NPE in the master log.
{code:java}
2024-05-11T02:17:47,293 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,326 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
2024-05-11T02:17:47,337 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) 
~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]{code}
h1. Reproduce

This bug cannot be reproduced deterministically, but it happens fairly 
frequently (roughly 10% of runs) with the following steps.

1. Start up a 2.5.8 cluster with the default configuration (1 HM, 2 RS, 1 HDFS).

2. Execute the commands in commands.txt.

3. Stop the 2.5.8 cluster and upgrade to a 3.0.0 cluster with the default 
configuration (commit: 516c89e8597fb6, 1 HM, 2 RS, 1 HDFS).

The error messages above will then appear in the master log.

I attached (1) the commands to reproduce it, (2) the master log, and (3) the full 
error logs of all nodes.

  was:
When upgrading the HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I hit 
the following NPE in the master log.
{code:java}
2024-05-11T02:17:47,293 ERROR 
[RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: 
Unexpected throwable object 
java.lang.NullPointerException: null
        at 
org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578)
 ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
        at 
org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463)
 

[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema

2024-05-11 Thread Ke Han (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Han updated HBASE-28583:
---
Summary: Upgrade from 2.5.8 to 3.0.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema  (was: Upgrade from 2.5.8 to 3.0 crash with 
InvalidProtocolBufferException: Message missing required fields: 
old_table_schema)

> Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException: 
> Message missing required fields: old_table_schema
> 
>
> Key: HBASE-28583
> URL: https://issues.apache.org/jira/browse/HBASE-28583
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 3.0.0, 2.5.8
>Reporter: Ke Han
>Priority: Major
> Attachments: hbase--master-033a47be7d1d.log, persistent.tar.gz
>
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 
> 2 RS, 2 HDFS), I hit the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] 
> master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException:
>  Message missing required fields: old_table_schema
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25)
>  ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) 
> ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666)
>  ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>         at 
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524)
>  ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>    
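
The stack above stops inside procedure-state deserialization: Any.unpack refuses to return a message whose proto2 required field (old_table_schema) apparently was not present in the state persisted by 2.5.8, and that single record aborts the whole procedure load. Below is a hedged sketch of a more tolerant read path; the state class and its parser are hypothetical placeholders, only the shaded protobuf calls are real API, and this is not presented as the actual HBASE-28583 fix.
{code:java}
// Hedged sketch of tolerant deserialization for persisted procedure state.
// The generated state class and its parser are hypothetical; the shaded
// protobuf calls (parsePartialFrom, isInitialized, getInitializationErrorString)
// are the standard protobuf Java API.
import org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException;
import org.apache.hbase.thirdparty.com.google.protobuf.Message;
import org.apache.hbase.thirdparty.com.google.protobuf.Parser;

final class ProcedureStateReader {
  static <M extends Message> M readState(Parser<M> stateParser, byte[] persistedBytes)
      throws InvalidProtocolBufferException {
    // parsePartialFrom skips the "all required fields present" check that
    // parseFrom performs, so state written by an older release that never set
    // a now-required field (e.g. old_table_schema) can still be examined
    // instead of aborting master startup outright.
    M partial = stateParser.parsePartialFrom(persistedBytes);
    if (!partial.isInitialized()) {
      // A real migration would fill in a default or rewrite the record; this
      // sketch only makes the failure explicit and attributable.
      throw new InvalidProtocolBufferException(
          "persisted procedure state is missing required fields: "
              + partial.getInitializationErrorString());
    }
    return partial;
  }
}
{code}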

[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException

2024-05-11 Thread ZhenyuLi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhenyuLi updated HBASE-28589:
-
Description: 
I recently discovered that the fix for HBASE-14598 does not completely resolve 
the issue. Their fix addressed two aspects: first, when the Scan/Get RPC 
attempts to allocate a very large array that could potentially lead to an 
out-of-memory (OOM) error, it will check the size of the array before 
allocation and directly throw an exception to prevent the region server from 
crashing and avoid possible cascading failures. Second, the developer intends 
for the client to stop retrying after such a failure, as retrying will not 
resolve the issue.

However, their fix involved throwing a DoNotRetryException. After 
ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack 
(ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
DoNotRetryException is ultimately caught in the CallRunner.run function, with 
only a log printed. Consequently, the DoNotRetryException is not sent back to 
the client side. Instead, the client receives a generic exception for the 
failed RPC request and continues retrying, which is not the desired behavior.

In the code of CallRunner, it is obvious that the DoNotRetryException in 
call.setResponse will be swallowed in the error handler with just a LOG printed.

  was:
I recently discovered that the fix for HBASE-14598 does not completely resolve 
the issue. Their fix addressed two aspects: first, when the Scan/Get RPC 
attempts to allocate a very large array that could potentially lead to an 
out-of-memory (OOM) error, it will check the size of the array before 
allocation and directly throw an exception to prevent the region server from 
crashing and avoid possible cascading failures. Second, the developer intends 
for the client to stop retrying after such a failure, as retrying will not 
resolve the issue.

However, their fix involved throwing a DoNotRetryException. After 
ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack 
(ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
DoNotRetryException is ultimately caught in the CallRunner.run function, with 
only a log printed. Consequently, the DoNotRetryException is not sent back to 
the client side. Instead, the client receives a generic exception for the 
failed RPC request and continues retrying, which is not the desired behavior.

After looking into the code of CallRunner, it is obvious that the 
DoNotRetryException in call.setResponse will be swallowed in the error handler 
with just a LOG printed.


> Client Does not Stop Retrying after DoNotRetryException
> ---
>
> Key: HBASE-28589
> URL: https://issues.apache.org/jira/browse/HBASE-28589
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0
>Reporter: ZhenyuLi
>Priority: Minor
>
> I recently discovered that the fix for HBASE-14598 does not completely 
> resolve the issue. Their fix addressed two aspects: first, when the Scan/Get 
> RPC attempts to allocate a very large array that could potentially lead to an 
> out-of-memory (OOM) error, it will check the size of the array before 
> allocation and directly throw an exception to prevent the region server from 
> crashing and avoid possible cascading failures. Second, the developer intends 
> for the client to stop retrying after such a failure, as retrying will not 
> resolve the issue.
> However, their fix involved throwing a DoNotRetryException. After 
> ByteBufferOutputStream.write throws the DoNotRetryException, in the call 
> stack (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
> DoNotRetryException is ultimately caught in the CallRunner.run function, with 
> only a log printed. Consequently, the DoNotRetryException is not sent back to 
> the client side. Instead, the client receives a generic exception for the 
> failed RPC request and continues retrying, which is not the desired behavior.
> In the code of CallRunner, it is obvious that the DoNotRetryException in 
> call.setResponse will be swallowed in the error handler with just a LOG 
> printed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException

2024-05-11 Thread ZhenyuLi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhenyuLi updated HBASE-28589:
-
External issue URL:   (was: 
https://issues.apache.org/jira/browse/HBASE-14598)

> Client Does not Stop Retrying after DoNotRetryException
> ---
>
> Key: HBASE-28589
> URL: https://issues.apache.org/jira/browse/HBASE-28589
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0
>Reporter: ZhenyuLi
>Priority: Minor
>
> I recently discovered that the fix for HBASE-14598 does not completely 
> resolve the issue. Their fix addressed two aspects: first, when the Scan/Get 
> RPC attempts to allocate a very large array that could potentially lead to an 
> out-of-memory (OOM) error, it will check the size of the array before 
> allocation and directly throw an exception to prevent the region server from 
> crashing and avoid possible cascading failures. Second, the developer intends 
> for the client to stop retrying after such a failure, as retrying will not 
> resolve the issue.
> However, their fix involved throwing a DoNotRetryException. After 
> ByteBufferOutputStream.write throws the DoNotRetryException, in the call 
> stack (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
> DoNotRetryException is ultimately caught in the CallRunner.run function, with 
> only a log printed. Consequently, the DoNotRetryException is not sent back to 
> the client side. Instead, the client receives a generic exception for the 
> failed RPC request and continues retrying, which is not the desired behavior.
> After looking into the code of CallRunner, it is obvious that the 
> DoNotRetryException in call.setResponse will be swallowed in the error 
> handler with just a LOG printed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException

2024-05-11 Thread ZhenyuLi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhenyuLi updated HBASE-28589:
-
External issue ID:   (was: HBase-14598)

> Client Does not Stop Retrying after DoNotRetryException
> ---
>
> Key: HBASE-28589
> URL: https://issues.apache.org/jira/browse/HBASE-28589
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0
>Reporter: ZhenyuLi
>Priority: Minor
>
> I recently discovered that the fix for HBASE-14598 does not completely 
> resolve the issue. Their fix addressed two aspects: first, when the Scan/Get 
> RPC attempts to allocate a very large array that could potentially lead to an 
> out-of-memory (OOM) error, it will check the size of the array before 
> allocation and directly throw an exception to prevent the region server from 
> crashing and avoid possible cascading failures. Second, the developer intends 
> for the client to stop retrying after such a failure, as retrying will not 
> resolve the issue.
> However, their fix involved throwing a DoNotRetryException. After 
> ByteBufferOutputStream.write throws the DoNotRetryException, in the call 
> stack (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
> DoNotRetryException is ultimately caught in the CallRunner.run function, with 
> only a log printed. Consequently, the DoNotRetryException is not sent back to 
> the client side. Instead, the client receives a generic exception for the 
> failed RPC request and continues retrying, which is not the desired behavior.
> After looking into the code of CallRunner, it is obvious that the 
> DoNotRetryException in call.setResponse will be swallowed in the error 
> handler with just a LOG printed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException

2024-05-11 Thread ZhenyuLi (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhenyuLi updated HBASE-28589:
-
Description: 
I recently discovered that the fix for HBASE-14598 does not completely resolve 
the issue. Their fix addressed two aspects: first, when the Scan/Get RPC 
attempts to allocate a very large array that could potentially lead to an 
out-of-memory (OOM) error, it will check the size of the array before 
allocation and directly throw an exception to prevent the region server from 
crashing and avoid possible cascading failures. Second, the developer intends 
for the client to stop retrying after such a failure, as retrying will not 
resolve the issue.

However, their fix involved throwing a DoNotRetryException. After 
ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack 
(ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
DoNotRetryException is ultimately caught in the CallRunner.run function, with 
only a log printed. Consequently, the DoNotRetryException is not sent back to 
the client side. Instead, the client receives a generic exception for the 
failed RPC request and continues retrying, which is not the desired behavior.

After looking into the code of CallRunner, it is obvious that the 
DoNotRetryException in call.setResponse will be swallowed in the error handler 
with just a LOG printed.

  was:
I recently discovered that the fix for HBASE-14598 does not completely resolve 
the issue. Their fix addressed two aspects: first, when the Scan/Get RPC 
attempts to allocate a very large array that could potentially lead to an 
out-of-memory (OOM) error, it will check the size of the array before 
allocation and directly throw an exception to prevent the region server from 
crashing and avoid possible cascading failures. Second, the developer intends 
for the client to stop retrying after such a failure, as retrying will not 
resolve the issue.

However, their fix involved throwing a DoNotRetryException. After 
ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack 
(ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
DoNotRetryException is ultimately caught in the CallRunner.run function, with 
only a log printed. Consequently, the DoNotRetryException is not sent back to 
the client side. Instead, the client receives a generic exception for the 
failed RPC request and continues retrying, which is not the desired behavior.


> Client Does not Stop Retrying after DoNotRetryException
> ---
>
> Key: HBASE-28589
> URL: https://issues.apache.org/jira/browse/HBASE-28589
> Project: HBase
>  Issue Type: Bug
>  Components: IPC/RPC
>Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0, 2.0.0
>Reporter: ZhenyuLi
>Priority: Minor
>
> I recently discovered that the fix for HBASE-14598 does not completely 
> resolve the issue. Their fix addressed two aspects: first, when the Scan/Get 
> RPC attempts to allocate a very large array that could potentially lead to an 
> out-of-memory (OOM) error, it will check the size of the array before 
> allocation and directly throw an exception to prevent the region server from 
> crashing and avoid possible cascading failures. Second, the developer intends 
> for the client to stop retrying after such a failure, as retrying will not 
> resolve the issue.
> However, their fix involved throwing a DoNotRetryException. After 
> ByteBufferOutputStream.write throws the DoNotRetryException, in the call 
> stack (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
> DoNotRetryException is ultimately caught in the CallRunner.run function, with 
> only a log printed. Consequently, the DoNotRetryException is not sent back to 
> the client side. Instead, the client receives a generic exception for the 
> failed RPC request and continues retrying, which is not the desired behavior.
> After looking into the code of CallRunner, it is obvious that the 
> DoNotRetryException in call.setResponse will be swallowed in the error 
> handler with just a LOG printed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException

2024-05-11 Thread ZhenyuLi (Jira)
ZhenyuLi created HBASE-28589:


 Summary: Client Does not Stop Retrying after DoNotRetryException
 Key: HBASE-28589
 URL: https://issues.apache.org/jira/browse/HBASE-28589
 Project: HBase
  Issue Type: Bug
  Components: IPC/RPC
Affects Versions: 2.0.0, 1.5.0, 1.4.0, 1.3.0, 1.2.0
Reporter: ZhenyuLi


I recently discovered that the fix for HBASE-14598 does not completely resolve 
the issue. Their fix addressed two aspects: first, when the Scan/Get RPC 
attempts to allocate a very large array that could potentially lead to an 
out-of-memory (OOM) error, it will check the size of the array before 
allocation and directly throw an exception to prevent the region server from 
crashing and avoid possible cascading failures. Second, the developer intends 
for the client to stop retrying after such a failure, as retrying will not 
resolve the issue.

However, their fix involved throwing a DoNotRetryException. After 
ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack 
(ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> 
this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the 
DoNotRetryException is ultimately caught in the CallRunner.run function, with 
only a log printed. Consequently, the DoNotRetryException is not sent back to 
the client side. Instead, the client receives a generic exception for the 
failed RPC request and continues retrying, which is not the desired behavior.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] HBASE-28576 Remove FirstKeyValueMatchingQualifiersFilter [hbase]

2024-05-11 Thread via GitHub


2005hithlj commented on PR #5891:
URL: https://github.com/apache/hbase/pull/5891#issuecomment-2105649073

   This time a different UT, TestNamespaceReplication, failed, and it is not 
related to the changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Commented] (HBASE-28448) CompressionTest hangs when run over a Ozone ofs path

2024-05-11 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845540#comment-17845540
 ] 

Hudson commented on HBASE-28448:


Results for branch branch-2
[build #1053 on 
builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/]: 
(/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/General_20Nightly_20Build_20Report/]


(/) {color:green}+1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 jdk11 hadoop3 checks{color}
-- For more information [see jdk11 
report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> CompressionTest hangs when run over a Ozone ofs path
> 
>
> Key: HBASE-28448
> URL: https://issues.apache.org/jira/browse/HBASE-28448
> Project: HBase
>  Issue Type: Bug
>Reporter: Pratyush Bhatt
>Assignee: Wei-Chiu Chuang
>Priority: Major
>  Labels: ozone, pull-request-available
> Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1
>
> Attachments: hbase_ozone_compression.jstack
>
>
> If we run the CompressionTest over an HDFS path, it works fine:
> {code:java}
> hbase org.apache.hadoop.hbase.util.CompressionTest 
> hdfs://ns1/tmp/dir1/dir2/test_file.txt snappy
> 24/03/20 06:08:43 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 24/03/20 06:08:43 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 24/03/20 06:08:43 INFO impl.MetricsSystemImpl: HBase metrics system started
> 24/03/20 06:08:43 INFO metrics.MetricRegistries: Loaded MetricRegistries 
> class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> 24/03/20 06:08:43 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:08:43 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:08:44 INFO compress.CodecPool: Got brand-new decompressor 
> [.snappy]
> SUCCESS {code}
> The command exits, but when the same is run over an ofs path, the command 
> hangs.
> {code:java}
> hbase org.apache.hadoop.hbase.util.CompressionTest 
> ofs://ozone1710862004/test-222compression-vol/compression-buck2/test_file.txt 
> snappy
> 24/03/20 06:05:19 INFO protocolPB.OmTransportFactory: Loading OM transport 
> implementation 
> org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory as specified 
> by configuration.
> 24/03/20 06:05:20 INFO client.ClientTrustManager: Loading certificates for 
> client.
> 24/03/20 06:05:20 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 24/03/20 06:05:20 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 24/03/20 06:05:20 INFO impl.MetricsSystemImpl: HBase metrics system started
> 24/03/20 06:05:20 INFO metrics.MetricRegistries: Loaded MetricRegistries 
> class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> 24/03/20 06:05:20 INFO rpc.RpcClient: Creating Volume: 
> test-222compression-vol, with om as owner and space quota set to -1 bytes, 
> counts quota set to -1
> 24/03/20 06:05:20 INFO rpc.RpcClient: Creating Bucket: 
> test-222compression-vol/compression-buck2, with bucket layout 
> FILE_SYSTEM_OPTIMIZED, om as owner, Versioning false, Storage Type set to 
> DISK and Encryption set to false, Replication Type set to server-side default 
> replication type, Namespace Quota set to -1, Space Quota set to -1
> 24/03/20 06:05:21 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:05:21 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:05:21 WARN impl.MetricsSystemImpl: HBase metrics system already 
> initialized!
> 24/03/20 06:05:21 INFO metrics.MetricRegistries: Loaded MetricRegistries 
> class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl
> 24/03/20 06:05:22 INFO compress.CodecPool: Got brand-new decompressor 
> [.snappy]
> SUCCESS 
> .
> .
> .{code}
> The command doesn't exit.
> Attaching the jstack of the process below:
> [^hbase_ozone_compression.jstack]
>
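
A common reason a command-line tool prints SUCCESS but never exits is that a client library (here, plausibly the Ozone client) leaves non-daemon threads running after main() returns. The sketch below shows the usual mitigation of closing the FileSystem and exiting with an explicit status; it is general Hadoop-client practice and an assumption about this hang, not the committed HBASE-28448 fix, and the driver class name is hypothetical.
{code:java}
// Hedged sketch (not the committed fix): make a one-shot tool terminate even
// if the underlying FileSystem client spawned non-daemon threads.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class CompressionCheckDriver { // hypothetical driver, not CompressionTest itself
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int exitCode = 0;
    Path target = new Path(args[0]); // e.g. an ofs:// or hdfs:// path
    try (FileSystem fs = target.getFileSystem(conf)) {
      // ... run the actual compression round-trip against `fs` here ...
      System.out.println("SUCCESS");
    } catch (Exception e) {
      e.printStackTrace();
      exitCode = 1;
    }
    // Closing the FileSystem releases client resources; System.exit guarantees
    // the JVM stops even if some library thread was left non-daemon.
    System.exit(exitCode);
  }
}
{code}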