[jira] [Created] (HBASE-28590) NPE after upgrade from 2.5.8 to 3.0.0
Ke Han created HBASE-28590:
-------------------------------

             Summary: NPE after upgrade from 2.5.8 to 3.0.0
                 Key: HBASE-28590
                 URL: https://issues.apache.org/jira/browse/HBASE-28590
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 3.0.0
            Reporter: Ke Han
         Attachments: commands.txt, hbase--master-fc906f1808de.log, persistent.tar.gz

When upgrading an HBase cluster from 2.5.8 to 3.0.0 (commit: 516c89e8597fb6), I met the following NPE in the master log (the identical stack trace repeats at 02:17:47,293, 02:17:47,326, and 02:17:47,337):

{code:java}
2024-05-11T02:17:47,293 ERROR [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16000] ipc.RpcServer: Unexpected throwable object
java.lang.NullPointerException: null
  at org.apache.hadoop.hbase.master.MasterRpcServices.reportFileArchival(MasterRpcServices.java:2578) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
  at org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:16463) ~[hbase-protocol-shaded-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:443) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
  at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
{code}

h1. Reproduce

This bug cannot be reproduced deterministically, but it happens fairly frequently (roughly a 10% chance to trigger) with the following steps:

1. Start up a 2.5.8 cluster with the default configuration (1 HM, 2 RS, 1 HDFS).
2. Execute the commands in commands.txt.
3. Stop the 2.5.8 cluster and upgrade to a 3.0.0 cluster with the default configuration (commit: 516c89e8597fb6, 1 HM, 2 RS, 1 HDFS).

The error message will occur in the master log. I attached (1) the commands to reproduce it, (2) the master log, and (3) the full error logs of all nodes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
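The failure mode above can be sketched in a few lines. This is a hedged, simplified illustration with hypothetical names, not the actual HBase code: during an upgrade, the master can receive `reportFileArchival` RPCs before some piece of master-side state is initialized, and dereferencing a still-null field surfaces as the raw "Unexpected throwable object java.lang.NullPointerException" in RpcServer. A null guard that converts the condition into a service-level error is the usual defensive shape.

```java
// Hedged sketch, not the actual MasterRpcServices code; names are hypothetical.
public class ReportFileArchivalSketch {
    static class ServiceException extends Exception {
        ServiceException(String message) { super(message); }
    }

    // Stand-in for whatever master-side manager the handler consults;
    // still null while the freshly upgraded master is initializing.
    static Object archivalManager = null;

    static String reportFileArchival() throws ServiceException {
        if (archivalManager == null) {
            // Without this guard, the dereference below would throw a raw NPE
            // straight into the RPC layer, as seen in the attached master log.
            throw new ServiceException("Master is initializing; retry later");
        }
        return archivalManager.toString();
    }

    public static void main(String[] args) {
        try {
            reportFileArchival();
        } catch (ServiceException e) {
            System.out.println("rejected cleanly: " + e.getMessage());
        }
    }
}
```

With the guard, the region server gets a retryable error instead of the handler thread dying with an NPE.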
[jira] [Updated] (HBASE-28590) NPE after upgrade from 2.5.8 to 3.0.0
[ https://issues.apache.org/jira/browse/HBASE-28590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Han updated HBASE-28590:
---------------------------
    Description: (the description and reproduction steps are as in the creation message above; the notification repeated them verbatim)
[jira] [Updated] (HBASE-28583) Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema
[ https://issues.apache.org/jira/browse/HBASE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ke Han updated HBASE-28583:
---------------------------
    Summary: Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema  (was: Upgrade from 2.5.8 to 3.0 crash with InvalidProtocolBufferException: Message missing required fields: old_table_schema)

> Upgrade from 2.5.8 to 3.0.0 crash with InvalidProtocolBufferException:
> Message missing required fields: old_table_schema
> ----------------------------------------------------------------------
>
>                 Key: HBASE-28583
>                 URL: https://issues.apache.org/jira/browse/HBASE-28583
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 3.0.0, 2.5.8
>            Reporter: Ke Han
>            Priority: Major
>         Attachments: hbase--master-033a47be7d1d.log, persistent.tar.gz
>
> When migrating data from a 2.5.8 cluster (1 HM, 2 RS, 1 HDFS) to 3.0.0 (1 HM, 2 RS, 2 HDFS), I met the following exception and the upgrade failed.
> {code:java}
> 2024-05-10T00:54:45,936 ERROR [master/hmaster:16000:becomeActiveMaster] master.HMaster: Failed to become active master
> org.apache.hbase.thirdparty.com.google.protobuf.InvalidProtocolBufferException: Message missing required fields: old_table_schema
>   at org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException.asInvalidProtocolBufferException(UninitializedMessageException.java:56) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>   at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.checkMessageInitialized(AbstractParser.java:45) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>   at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:97) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>   at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:102) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>   at org.apache.hbase.thirdparty.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:25) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>   at org.apache.hbase.thirdparty.com.google.protobuf.Any.unpack(Any.java:118) ~[hbase-shaded-protobuf-4.1.7.jar:4.1.7]
>   at org.apache.hadoop.hbase.procedure2.ProcedureUtil$StateSerializer.deserialize(ProcedureUtil.java:125) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.master.procedure.RestoreSnapshotProcedure.deserializeStateData(RestoreSnapshotProcedure.java:303) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.ProcedureUtil.convertToProcedure(ProcedureUtil.java:295) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.store.ProtoAndProcedure.getProcedure(ProtoAndProcedure.java:43) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.store.InMemoryProcedureIterator.next(InMemoryProcedureIterator.java:90) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.loadProcedures(ProcedureExecutor.java:517) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$200(ProcedureExecutor.java:80) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$1.load(ProcedureExecutor.java:344) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.store.region.RegionProcedureStore.load(RegionProcedureStore.java:287) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.load(ProcedureExecutor.java:335) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.init(ProcedureExecutor.java:666) ~[hbase-procedure-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.master.HMaster.createProcedureExecutor(HMaster.java:1860) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1019) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>   at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2524) ~[hbase-server-3.0.0-beta-2-SNAPSHOT.jar:3.0.0-beta-2-SNAPSHOT]
>
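The stack trace above shows the 3.0.0 master failing while deserializing a RestoreSnapshotProcedure persisted by 2.5.8. The mechanism can be sketched with simplified stand-ins (this is not protobuf code, and the field handling is an assumption for illustration): a proto2 `required` field makes the parser reject any serialized message that lacks it, so state written by a version that never set `old_table_schema` cannot be parsed by a version that requires it.

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of proto2 required-field semantics with hand-rolled stand-ins.
public class RequiredFieldSketch {
    static class InvalidProtocolBufferException extends Exception {
        InvalidProtocolBufferException(String message) { super(message); }
    }

    // Stand-in for parsing RestoreSnapshotProcedure state data: the parser
    // checks message initialization and rejects missing required fields.
    static Map<String, String> parseStateData(Map<String, String> wire)
            throws InvalidProtocolBufferException {
        if (!wire.containsKey("old_table_schema")) {
            throw new InvalidProtocolBufferException(
                "Message missing required fields: old_table_schema");
        }
        return wire;
    }

    public static void main(String[] args) {
        // A procedure persisted by the old version never wrote the field.
        Map<String, String> fromOldMaster = new HashMap<>();
        fromOldMaster.put("snapshot", "snap1");
        try {
            parseStateData(fromOldMaster);
        } catch (InvalidProtocolBufferException e) {
            System.out.println(e.getMessage()); // same message the master logs
        }
    }
}
```

This is why such upgrade crashes are typically fixed by relaxing the field to optional or tolerating its absence during deserialization, rather than by changing the old writer.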
[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException
[ https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhenyuLi updated HBASE-28589:
-----------------------------
    Description:
I recently discovered that the fix for HBASE-14598 does not completely resolve the issue. That fix addressed two aspects: first, when a Scan/Get RPC attempts to allocate a very large array that could lead to an out-of-memory (OOM) error, it checks the size of the array before allocation and throws an exception directly, to keep the region server from crashing and to avoid possible cascading failures. Second, the developers intended the client to stop retrying after such a failure, since retrying will not resolve the issue.

However, the fix throws a DoNotRetryException. After ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the DoNotRetryException is ultimately caught in CallRunner.run, where only a log message is printed. Consequently, the DoNotRetryException is not sent back to the client side. Instead, the client receives a generic exception for the failed RPC request and continues retrying, which is not the desired behavior.

In the code of CallRunner, it is clear that the DoNotRetryException from call.setResponse is swallowed in the error handler, with just a LOG printed.
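The behavior described above can be sketched in miniature. This is a hedged, simplified model with hypothetical names (the real path runs through CallRunner.run and RpcServer.Call.setResponse): an exception thrown while the response is being encoded is caught by the runner's error handler and only logged, so the do-not-retry signal never reaches the client.

```java
// Hedged sketch of the reported behavior, not the actual CallRunner code.
public class SwallowedExceptionSketch {
    static class DoNotRetryException extends RuntimeException {
        DoNotRetryException(String message) { super(message); }
    }

    // Stand-in for setResponse(): encoding the cell block trips the size check.
    static void setResponse() {
        throw new DoNotRetryException("Buffer size too large to allocate");
    }

    // Stand-in for CallRunner.run's error handler. Returns what the client
    // would observe as the call's failure cause.
    static Throwable runCall() {
        Throwable sentToClient = null;
        try {
            setResponse();
        } catch (Exception e) {
            // The handler logs and moves on; the exception is swallowed and
            // sentToClient is never populated, so the client only ever sees a
            // generic RPC failure and keeps retrying.
            System.out.println("LOG only: " + e.getMessage());
        }
        return sentToClient;
    }

    public static void main(String[] args) {
        System.out.println("client saw DoNotRetry: " + (runCall() != null));
    }
}
```

A fix in this shape would capture the caught exception and set it on the response sent back to the client, so the client-side retry logic can recognize the do-not-retry marker.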
[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException
[ https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhenyuLi updated HBASE-28589:
-----------------------------
    External issue URL: (was: https://issues.apache.org/jira/browse/HBASE-14598)
[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException
[ https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhenyuLi updated HBASE-28589:
-----------------------------
    External issue ID: (was: HBase-14598)
[jira] [Updated] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException
[ https://issues.apache.org/jira/browse/HBASE-28589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhenyuLi updated HBASE-28589:
-----------------------------
    Description: (a minor wording revision of the final paragraph; the notification otherwise repeated the description verbatim)
[jira] [Created] (HBASE-28589) Client Does not Stop Retrying after DoNotRetryException
ZhenyuLi created HBASE-28589:
--------------------------------

             Summary: Client Does not Stop Retrying after DoNotRetryException
                 Key: HBASE-28589
                 URL: https://issues.apache.org/jira/browse/HBASE-28589
             Project: HBase
          Issue Type: Bug
          Components: IPC/RPC
    Affects Versions: 2.0.0, 1.5.0, 1.4.0, 1.3.0, 1.2.0
            Reporter: ZhenyuLi

I recently discovered that the fix for HBASE-14598 does not completely resolve the issue. That fix addressed two aspects: first, when a Scan/Get RPC attempts to allocate a very large array that could lead to an out-of-memory (OOM) error, it checks the size of the array before allocation and throws an exception directly, to keep the region server from crashing and to avoid possible cascading failures. Second, the developers intended the client to stop retrying after such a failure, since retrying will not resolve the issue.

However, the fix throws a DoNotRetryException. After ByteBufferOutputStream.write throws the DoNotRetryException, in the call stack (ByteBufferOutputStream.write --> encoder.write --> encodeCellsTo --> this.cellBlockBuilder.buildCellBlockStream --> call.setResponse), the DoNotRetryException is ultimately caught in CallRunner.run, where only a log message is printed. Consequently, the DoNotRetryException is not sent back to the client side. Instead, the client receives a generic exception for the failed RPC request and continues retrying, which is not the desired behavior.
Re: [PR] HBASE-28576 Remove FirstKeyValueMatchingQualifiersFilter [hbase]
2005hithlj commented on PR #5891:
URL: https://github.com/apache/hbase/pull/5891#issuecomment-2105649073

This time it was a different UT, TestNamespaceReplication, that failed, and it is not related to the changes.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (HBASE-28448) CompressionTest hangs when run over a Ozone ofs path
[ https://issues.apache.org/jira/browse/HBASE-28448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845540#comment-17845540 ]

Hudson commented on HBASE-28448:
--------------------------------

Results for branch branch-2 [build #1053 on builds.a.o|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/]: (/) *{color:green}+1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color} -- For more information [see general report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/General_20Nightly_20Build_20Report/]
(/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/JDK8_20Nightly_20Build_20Report_20_28Hadoop2_29/]
(/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/JDK8_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 jdk11 hadoop3 checks{color} -- For more information [see jdk11 report|https://ci-hbase.apache.org/job/HBase%20Nightly/job/branch-2/1053/JDK11_20Nightly_20Build_20Report_20_28Hadoop3_29/]
(/) {color:green}+1 source release artifact{color} -- See build output for details.
(/) {color:green}+1 client integration test{color}

> CompressionTest hangs when run over a Ozone ofs path
> ----------------------------------------------------
>
>                 Key: HBASE-28448
>                 URL: https://issues.apache.org/jira/browse/HBASE-28448
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Pratyush Bhatt
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>              Labels: ozone, pull-request-available
>             Fix For: 4.0.0-alpha-1, 2.7.0, 3.0.0-beta-2, 2.6.1
>         Attachments: hbase_ozone_compression.jstack
>
> If we run the CompressionTest over an HDFS path, it works fine:
> {code:java}
> hbase org.apache.hadoop.hbase.util.CompressionTest hdfs://ns1/tmp/dir1/dir2/test_file.txt snappy
> 24/03/20 06:08:43 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 24/03/20 06:08:43 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 24/03/20 06:08:43 INFO impl.MetricsSystemImpl: HBase metrics system started
> 24/03/20 06:08:43 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> 24/03/20 06:08:43 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:08:43 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:08:44 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
> SUCCESS {code}
> The command exits, but when the same is tried over an ofs path, the command hangs.
> {code:java}
> hbase org.apache.hadoop.hbase.util.CompressionTest ofs://ozone1710862004/test-222compression-vol/compression-buck2/test_file.txt snappy
> 24/03/20 06:05:19 INFO protocolPB.OmTransportFactory: Loading OM transport implementation org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory as specified by configuration.
> 24/03/20 06:05:20 INFO client.ClientTrustManager: Loading certificates for client.
> 24/03/20 06:05:20 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> 24/03/20 06:05:20 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 24/03/20 06:05:20 INFO impl.MetricsSystemImpl: HBase metrics system started
> 24/03/20 06:05:20 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> 24/03/20 06:05:20 INFO rpc.RpcClient: Creating Volume: test-222compression-vol, with om as owner and space quota set to -1 bytes, counts quota set to -1
> 24/03/20 06:05:20 INFO rpc.RpcClient: Creating Bucket: test-222compression-vol/compression-buck2, with bucket layout FILE_SYSTEM_OPTIMIZED, om as owner, Versioning false, Storage Type set to DISK and Encryption set to false, Replication Type set to server-side default replication type, Namespace Quota set to -1, Space Quota set to -1
> 24/03/20 06:05:21 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:05:21 INFO compress.CodecPool: Got brand-new compressor [.snappy]
> 24/03/20 06:05:21 WARN impl.MetricsSystemImpl: HBase metrics system already initialized!
> 24/03/20 06:05:21 INFO metrics.MetricRegistries: Loaded MetricRegistries class org.apache.ratis.metrics.dropwizard3.Dm3MetricRegistriesImpl
> 24/03/20 06:05:22 INFO compress.CodecPool: Got brand-new decompressor [.snappy]
> SUCCESS
> .
> .
> .{code}
> The command doesn't exit. Attaching the jstack of the process below:
> [^hbase_ozone_compression.jstack]
>
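A "SUCCESS is printed but the command never exits" symptom usually means non-daemon threads are still alive after main() returns; the JVM only exits once every non-daemon thread has finished. The sketch below illustrates that mechanism under the assumption (based on the symptom, not a confirmed reading of the attached jstack) that a filesystem client spins up a non-daemon thread pool that is never shut down:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hedged sketch of the hang mechanism; names and the diagnosis are assumptions.
public class NonDaemonHangSketch {
    static String runCheck() throws Exception {
        // Executors.newFixedThreadPool creates plain (non-daemon) threads,
        // like many client libraries do internally.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            return pool.submit(() -> "SUCCESS").get();
        } finally {
            // Without this shutdown (or closing the client that owns the
            // pool), main() would return but the JVM would stay alive --
            // the CompressionTest symptom over ofs://.
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runCheck());
    }
}
```

The common fixes for this class of bug are closing the FileSystem/client before returning from the tool, or calling the exit path explicitly (e.g. via ToolRunner/System.exit) after the result is printed.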