[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 7/7/22 1:09 AM:

bq. In general, I prefer we just remove the SimpleRpcServer implementation and rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I presume others) with 2.2 when the Netty implementation has issues, and I would also want it as a fallback option for our production. In any case, this is the kind of major operational change that should have a deprecation period before removal. I know it is not the default, but it still represents an important fallback option, and we should be accommodating to users here. Deprecation can be done now; that seems ok. Removal can be done in 3.0. So it should be fixed first and removed later. SimpleRpcServer-specific work can land on HBASE-27097 after this issue is done.
> Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 2.5.0, 2.4.6
> Reporter: Viraj Jasani
> Assignee: Duo Zhang
> Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
> Under constant data ingestion, using the default Netty based RpcServer and RpcClient implementations results in OutOfDirectMemoryError, supposedly caused by leaks detected by Netty's LeakDetector.

{code:java}
2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] util.ResourceLeakDetector - java:115)
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:748)
{code}

{code:java}
2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] util.ResourceLeakDetector -
apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
{code}
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:15 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but now netty may be able to queue up a lot more, in comparison, because netty has been designed for concurrency. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_in_flight_max x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer I could be looking at something that isn't the real issue but it is notable. was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? 
Those limits will act as a bottleneck, as intended, and before would also have applied backpressure through RPC, because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but now netty may be able to queue up a lot more in comparison, because netty has been designed for concurrency. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in-flight requests or their network-level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. It may also explain why security makes it worse: when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling completed quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default.
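The suggested default above (availableProcessors() instead of 0/unbounded) could be sketched like this. This is a hypothetical helper, not actual HBase code; the method name and clamping behavior are assumptions for illustration only:

```java
// Hypothetical sketch (not HBase code): resolve the event loop thread count
// so that 0 or a negative configured value falls back to the core count,
// rather than being passed through to Netty as "use Netty's own default".
// Threads beyond the core count give diminishing returns for event loop work.
public class EventLoopThreadDefault {

    public static int resolveThreadCount(int configured) {
        if (configured > 0) {
            // An explicit positive setting always wins.
            return configured;
        }
        // Proposed default: one event loop thread per available core.
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println(resolveThreadCount(16));  // explicit setting: 16
        System.out.println(resolveThreadCount(0));   // falls back to core count
    }
}
```

The point of the sketch is only that the fallback is a bounded, machine-derived number instead of an effectively unbounded default.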
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:14 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but now netty may be able to queue up a lot more, in comparison, because netty has been designed for concurrency. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer I could be looking at something that isn't the real issue but it is notable. was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? 
Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but now netty may be able to queue up a lot more, in comparison*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:13 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but now netty may be able to queue up a lot more, in comparison*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer I could be looking at something that isn't the real issue but it is notable. was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? 
Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was not async, but now netty is able to queue up a lot of work asynchronously*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:12 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was not async, but now netty is able to queue up a lot of work asynchronously*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer I could be looking at something that isn't the real issue but it is notable. was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? 
Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was not async, but now netty is able to queue up a lot of work asynchronously*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:09 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits (hbase.ipc.server.read.threadpool.size", default 10) and was not async, but now netty is able to queue up a lot of work asynchronously*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
And this may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. This is going to be somewhat application dependent too. 
If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. And this may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:09 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was not async, but now netty is able to queue up a lot of work asynchronously*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
It may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? 
Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, *because SimpleRpcServer had thread limits (hbase.ipc.server.read.threadpool.size", default 10) and was not async, but now netty is able to queue up a lot of work asynchronously*. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. And this may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:06 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
And this may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks probably should not be INT_MAX, but that may matter less. The goal would be to limit concurrency at the netty layer in such a way that: 1. Performance is still good 2. Under load, we don't balloon resource usage at the netty layer was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. This is going to be somewhat application dependent too. 
If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. And this may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks should not be INT_MAX, that's not a sane default. > Netty "leak detected" and
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:04 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. 
And this may also explain why security makes it worse, because when security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond netty, and that takes additional time there, which would back things up at the netty layer more than if call handling would complete more quickly without encryption. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks should not be INT_MAX, that's not a sane default. was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. This is going to be somewhat application dependent too. 
If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks should not be INT_MAX, that's not a sane default. > Netty "leak detected" and OutOfDirectMemoryError due to direct memory > buffering > --- > > Key: HBASE-26708 > URL: https://issues.apache.org/jira/browse/HBASE-26708 > Project: HBase > Issue Type: Bug > Components: rpc >Affects Versions: 2.5.0, 2.4.6 >Reporter: Viraj Jasani >Priority: Critical > > Under constant data ingestion, using default Netty based RpcServer and >
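The suggested bounded default for hbase.netty.eventloop.rpcserver.thread.count can be sketched in plain JDK Java. This is only an illustration of the proposal (bound event loop threads at the core count rather than leaving the setting 0), not HBase's actual wiring; the class and variable names are made up for the example:

```java
public class EventLoopDefaultSketch {
    public static void main(String[] args) {
        // Proposed bounded default: one event loop thread per available core,
        // since threads > cores hit diminishing returns. Math.max guards the
        // degenerate case where availableProcessors() would ever report < 1.
        int cores = Runtime.getRuntime().availableProcessors();
        int eventLoopThreads = Math.max(1, cores);
        System.out.println("event loop threads: " + eventLoopThreads);
    }
}
```

A similar finite cap, rather than INT_MAX, would apply to maxPendingTasks.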
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:01 PM: -- On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of call data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up an excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. This is going to be somewhat application dependent too. If the application interacts synchronously with calls and has its own bound, then in flight requests or their network level handling will be bounded by the aggregate (client_limit x number_of_clients). If the application is highly async, write-mostly, or a load test client – which is typically write-mostly, async, and configured with large bounds :) – then this can explain the findings reported here. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). 
I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks should not be INT_MAX, that's not a sane default. was (Author: apurtell): On the subject of configuration and NettyRpcServer, we leave netty level resource limits unbounded. The number of threads to use for the event loop is default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum handler pool size, with a default of 30, typically raised in production by the user. We constrain the depth of the request queue in multiple ways... limits on number of queued calls, limits on total size of calls data that can be queued (to avoid memory usage overrun, just like this case), CoDel conditioning of the call queues if it is enabled, and so on. Under load can we pile up a excess of pending request state, such as direct buffers containing request bytes, at the netty layer because of downstream resource limits? Those limits will act as a bottleneck, as intended, and before would have also applied backpressure through RPC too, but now netty is able to queue up a lot of work asynchronously. Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 (unbounded). I don't know what it can actually get up to in production, because we lack the metric, but there are diminishing returns when threads > cores so a reasonable default here could be Runtime.getRuntime().availableProcessors() instead of unbounded? maxPendingTasks should not be INT_MAX, that's not a sane default. 
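The handler-side bounding contrasted with Netty's unbounded defaults in the comments above (a fixed handler pool plus a capped request queue) can be sketched with a plain JDK executor. The sizes and the rejection policy here are illustrative only, not HBase's actual RPC scheduler:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedHandlersSketch {
    public static void main(String[] args) {
        // Fixed handler pool (30, matching the default handler count mentioned
        // above) and a bounded call queue. When the queue fills, CallerRunsPolicy
        // pushes work back on the submitter, a crude stand-in for backpressure.
        ThreadPoolExecutor handlers = new ThreadPoolExecutor(
            30, 30, 60, TimeUnit.SECONDS,
            new ArrayBlockingQueue<>(250),            // capped queued-call count
            new ThreadPoolExecutor.CallerRunsPolicy() // overflow throttles the caller
        );
        handlers.submit(() -> System.out.println("handled"));
        handlers.shutdown();
    }
}
```

The point of the sketch is the contrast: every resource above is finite, whereas an event loop with maxPendingTasks=INT_MAX can accumulate work (and the direct buffers backing it) without any such bound.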
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552340#comment-17552340 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:30 PM: - [~bbeaudreault] I was/am confused by that because HBASE-2 is a child of HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin for SASL". Can you or someone clean this up so we can clearly see what is going on? Is it really a full TLS RPC stack? Because it looks to me like some TLS fiddling to get a token that then sets up the usual wrapped SASL connection, possibly why I am confused. That would not be native TLS support in the sense I mean and the sense that is really required, possibly why it has not gotten enough attention. Oh, the PR itself describes the work as "HBASE-2 Add native TLS encryption support to RPC server/client ". That is much different. Let's clean up the situation with HBASE-2 and HBASE-26553 and take the conversation there so as not to distract from this JIRA. was (Author: apurtell): [~bbeaudreault] I was/am confused by that because HBASE-2 is a child of HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin for SASL". Can you or someone clean this up so we can clearly see what is going on? Is it really a full TLS RPC stack? Because it looks to me like some TLS fiddling to get a token that then sets up the usual wrapped SASL connection, possibly why I am confused. That would not be native TLS support in the sense I mean and the sense that is really required, possibly why it has not gotten enough attention. 
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552340#comment-17552340 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:28 PM: - [~bbeaudreault] I was/am confused by that because HBASE-2 is a child of HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin for SASL". Can you or someone clean this up so we can clearly see what is going on? Is it really a full TLS RPC stack? Because it looks to me like some TLS fiddling to get a token that then sets up the usual wrapped SASL connection, possibly why I am confused. That would not be native TLS support in the sense I mean and the sense that is really required, possibly why it has not gotten enough attention. was (Author: apurtell): [~bbeaudreault] I was/am confused by that because HBASE-2 is a child of HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin for SASL". Can you or someone clean this up so we can clearly see what is going on? Is it really a full TLS RPC stack? Because it looks to me like some TLS fiddling to get a token that then sets up the usual wrapped SASL connection. It is not native TLS support in the sense I mean and the sense that is really required, which is TLS and only TLS end to end, possibly why it has not gotten enough attention. 
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552322#comment-17552322 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:15 PM: - [~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have been testing with {_}auth{_}, which was the previous setting. [~vjasani] I am curious if you apply my patch and set hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still gets up to > 50 GB. My guess is yes, that it is the concurrent demand for buffers at load driving the usage, and not excessive cache retention in the pooled allocator. Let's see if experimental results confirm the hypothesis. If it helps then I am wrong and pooling configuration tweaks – read on below – should be considered. If I am correct then we should investigate how to get direct IO buffers freed faster and/or limits or pacing applied to their allocation; using a custom allocator, possibly. Like [~zhangduo] mentioned we set up a certain number of buffers, depending, more when sasl is used. This should be tunable? People with large RAM servers/instances can tune it up? People with more memory constrained options can tune it down? Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue people may be facing is confusion about system property names. I can see in the sources, via my IDE, that the shader rewrote the string constants containing the property keys too. Various resources on the Internet will offer documentation and suggestions, but because we relocated Netty into thirdparty, the names have changed, and so naively following the advice on StackOverflow and other places will have no effect. Key here is recommendations when you want to prefer heap instead of direct memory. Let me list them in terms of relevancy for addressing this issue. 
Highly relevant:
- io.netty.allocator.cacheTrimInterval -> org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval -- This is the threshold number of allocations after which cached entries will be freed if not frequently used. Lowering it from the default of 8192 may reduce the overall amount of direct memory retained in steady state, because the evaluation will be performed more often, as often as you specify.
- io.netty.noPreferDirect -> org.apache.hbase.thirdparty.io.netty.noPreferDirect -- This will prefer heap arena allocations regardless of PlatformDependent ideas on preference if set to 'true'.
- io.netty.allocator.numDirectArenas -> org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas -- Various advice on the Internet suggests setting numDirectArenas=0 and noPreferDirect=true as the way to prefer heap based buffers.
Less relevant:
- io.netty.allocator.maxCachedBufferCapacity -> org.apache.hbase.thirdparty.io.netty.allocator.maxCachedBufferCapacity -- This is the size-based retention policy for buffers; individual buffers larger than this will not be cached.
- io.netty.allocator.numHeapArenas -> org.apache.hbase.thirdparty.io.netty.allocator.numHeapArenas
- io.netty.allocator.pageSize -> org.apache.hbase.thirdparty.io.netty.allocator.pageSize
- io.netty.allocator.maxOrder -> org.apache.hbase.thirdparty.io.netty.allocator.maxOrder
On [https://github.com/apache/hbase/pull/4505] I have a draft PR that allows the user to tweak the Netty bytebuf allocation policy. This may be a good idea to do in general. We may want to provide support for some of the above Netty tunables in HBase site configuration as well, as a way to eliminate confusion about them... Our documentation on it would describe the HBase site config property names. On a side note, we might spike on an alternative to SASL RPC that is a TLS based implementation instead. 
I know this has been discussed and even partially attempted, repeatedly, over our history but nonetheless the operational and performance issues with SASL remain. We were here once before on HBASE-17721. [~bbeaudreault] posted HBASE-26548 more recently. was (Author: apurtell): [~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have been testing with {_}auth{_}, which was the previous setting. [~vjasani] I am curious if you apply my patch and set hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still gets up to > 50 GB. My guess is yes, that it is the concurrent demand for buffers at load driving the usage, and not excessive cache retention in the pooled allocator. Let's see if experimental results confirm the hypothesis. If it helps then I am wrong and pooling configuration tweaks – read on below – should be considered. If I am correct then we should investigate how to get direct IO buffers freed faster and/or limits or pacing applied to their allocation; using a custom allocator, possibly. Looking at our
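To make the property-name confusion described above concrete: since hbase-thirdparty relocates Netty (and the shader rewrites the property-key string constants as well), the stock io.netty.* names are silently ignored and the relocated names must be used. A minimal sketch of setting the relocated keys from the list above, before any Netty class loads; the specific values shown are illustrative, not recommendations:

```java
public class ShadedNettyProps {
    public static void main(String[] args) {
        // Stock name io.netty.noPreferDirect would have no effect here;
        // the relocated name is the one the shaded classes actually read.
        System.setProperty("org.apache.hbase.thirdparty.io.netty.noPreferDirect", "true");
        System.setProperty("org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas", "0");
        // Trim cached allocator entries more often than the default of 8192 allocations:
        System.setProperty("org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval", "1024");
        System.out.println(System.getProperty("org.apache.hbase.thirdparty.io.netty.noPreferDirect"));
    }
}
```

The same values can be passed as -D flags on the server command line; the key point is only the org.apache.hbase.thirdparty prefix.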
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552322#comment-17552322 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:07 PM: - [~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have been testing with {_}auth{_}, which was the previous setting. [~vjasani] I am curious if you apply my patch and set hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still gets up to > 50 GB. My guess is yes, that it is the concurrent demand for buffers at load driving the usage, and not excessive cache retention in the pooled allocator. Let's see if experimental results confirm the hypothesis. If it helps then I am wrong and pooling configuration tweaks – read on below – should be considered. If I am correct then we should investigate how to get direct IO buffers freed faster and/or limits or pacing applied to their allocation; using a custom allocator, possibly. Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue people may be facing is confusion about system property names. I can see in the sources, via my IDE, that the shader rewrote the string constants containing the property keys too. Various resources on the Internet will offer documentation and suggestions, but because we relocated Netty into thirdparty, the names have changed, and so naively following the advice on StackOverflow and other places will have no effect. Key here is recommendations when you want to prefer heap instead of direct memory. Let me list them in terms of relevancy for addressing this issue. Highly relevant: - io.netty.allocator.cacheTrimInterval -> org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval -- This is the number of threshold of allocations when cached entries will be freed up if not frequently used. 
Lowering it from the default of 8192 may reduce the overall amount of direct memory retained in steady state, because the evaluation will be performed more often, as often as you specify. - io.netty.noPreferDirect -> org.apache.hbase.thirdparty.io.netty.noPreferDirect -- This will prefer heap arena allocations regardless of PlatformDependent ideas on preference if set to 'true'. - io.netty.allocator.numDirectArenas -> org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas -- Various advice on the Internet suggests setting numDirectArenas=0 and noPreferDirect=true as the way to prefer heap based buffers. Less relevant: - io.netty.allocator.maxCachedBufferCapacity -> org.apache.hbase.thirdparty.io.netty.allocator.maxCachedBufferCapacity -- This is the sized based retention policy for buffers; individual buffers larger than this will not be cached. - io.netty.allocator.numHeapArenas -> org.apache.hbase.thirdparty.io.netty.allocator.numHeapArenas - io.netty.allocator.pageSize -> org.apache.hbase.thirdparty.io.netty.allocator.pageSize - io.netty.allocator.maxOrder -> org.apache.hbase.thirdparty.io.netty.allocator.maxOrder On [https://github.com/apache/hbase/pull/4505] I have a draft PR that allows the user to tweak the Netty bytebuf allocation policy. This may be a good idea to do in general. We may want to provide support for some of the above Netty tunables in HBase site configuration as well, as a way to eliminate confusion about them... Our documentation on it would describe the HBase site config property names. On a side note, we might spike on an alternative to SASL RPC that is a TLS based implementation instead. I know this has been discussed and even partially attempted, repeatedly, over our history but nonetheless the operational and performance issues with SASL remain. We were here once before on HBASE-17721. [~bbeaudreault] posted HBASE-26548 more recently. 
was (Author: apurtell): [~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have been testing with {_}auth{_}, which was the previous setting. [~vjasani] I am curious if you apply my patch and set hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still gets up to > 50 GB. My guess is yes, that it is the concurrent demand for buffers at load driving the usage, and not excessive cache retention in the pooled allocator. Let's see if experimental results confirm the hypothesis. If it helps then I am wrong and pooling configuration tweaks – read on below – should be considered. If I am correct then we should investigate how to get direct IO buffers freed faster and/or limits or pacing applied to their allocation; using a custom allocator, possibly. Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue people may be facing is confusion about system property names. I can see in the sources, via my IDE, that the shader rewrote the string constants containing the property keys too. Various
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552322#comment-17552322 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 4:54 PM: - [~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have been testing with {_}auth{_}, which was the previous setting. [~vjasani] I am curious if you apply my patch and set hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still gets up to > 50 GB. My guess is yes, that it is the concurrent demand for buffers at load driving the usage, and not excessive cache retention in the pooled allocator. Let's see if experimental results confirm the hypothesis. If it helps then I am wrong and pooling configuration tweaks – read on below – should be considered. If I am correct then we should investigate how to get direct IO buffers freed faster and/or limits or pacing applied to their allocation; using a custom allocator, possibly. Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue people may be facing is confusion about system property names. I can see in the sources, via my IDE, that the shader rewrote the string constants containing the property keys too. Various resources on the Internet will offer documentation and suggestions, but because we relocated Netty into thirdparty, the names have changed, and so naively following the advice on StackOverflow and other places will have no effect. Key here is recommendations when you want to prefer heap instead of direct memory. Let me list them in terms of relevancy for addressing this issue. Highly relevant: - io.netty.allocator.cacheTrimInterval -> org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval -- This is the number of threshold of allocations when cached entries will be freed up if not frequently used. 
Lowering it from the default of 8192 may reduce the overall amount of direct memory retained in steady state, because the evaluation will be performed more often, as often as you specify. - io.netty.noPreferDirect -> org.apache.hbase.thirdparty.io.netty.noPreferDirect -- This will prefer heap arena allocations regardless of PlatformDependent ideas on preference if set to 'true'. - io.netty.allocator.numDirectArenas -> org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas -- Various advice on the Internet suggests setting numDirectArenas=0 and noPreferDirect=true as the way to prefer heap based buffers. Less relevant: - io.netty.allocator.maxCachedBufferCapacity -> org.apache.hbase.thirdparty.io.netty.allocator.maxCachedBufferCapacity -- This is the sized based retention policy for buffers; individual buffers larger than this will not be cached. - io.netty.allocator.numHeapArenas -> org.apache.hbase.thirdparty.io.netty.allocator.numHeapArenas - io.netty.allocator.pageSize -> org.apache.hbase.thirdparty.io.netty.allocator.pageSize - io.netty.allocator.maxOrder -> org.apache.hbase.thirdparty.io.netty.allocator.maxOrder On [https://github.com/apache/hbase/pull/4505] I have a draft PR that allows the user to tweak the Netty bytebuf allocation policy. This may be a good idea to do in general. We may want to provide support for some of the above Netty tunables in HBase site configuration as well, as a way to eliminate confusion about them... Our documentation on it would describe the HBase site config property names. On a side note, we might spike on an alternative to SASL RPC that is a TLS based implementation instead. I know this has been discussed and even partially attempted, repeatedly, over our history but nonetheless the operational and performance issues with SASL remain. was (Author: apurtell): [~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have been testing with _auth_, which was the previous setting. 
[~vjasani] I am curious if you apply my patch and set hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still gets up to > 50 GB. My guess is yes, that it is the concurrent demand for buffers at load driving the usage, and not excessive cache retention in the pooled allocator. Let's see if experimental results confirm the hypothesis. If it helps then I am wrong and pooling configuration tweaks -- read on below -- should be considered. If I am correct then we should investigate how to get direct IO buffers freed faster and/or limits or pacing applied to their allocation; using a custom allocator, possibly. Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue people may be facing is confusion about system property names. Various resources on the Internet will offer documentation and suggestions, but because we relocated Netty into thirdparty, the names have changed, and so naively following the advice on StackOverflow and other places
[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551700#comment-17551700 ] Andrew Kyle Purtell edited comment on HBASE-26708 at 6/8/22 4:47 PM:

[~zhangduo]
bq. Does increase MaxDirectMemorySize can solve the problem?
Yes, this avoids the failures, but this remains a cost-to-serve problem as it requires upselection of e.g. AWS instance type for the larger RAM allocation. But it is an effective workaround for us. I get your point... We should update this issue, because this maybe isn't a _leak_. It is an excessive buffer retention issue, certainly.

was (Author: apurtell):
[~zhangduo]
bq. Does increase MaxDirectMemorySize can solve the problem?
Yes, this avoids the failures, but this remains a cost-to-serve problem as it requires upselection of e.g. AWS instance type for the larger RAM allocation. But it is an effective workaround, for sure. I get your point... We should update this issue, because this maybe isn't a _leak_. It is an excessive buffer retention issue.

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
> Issue Type: Bug
> Components: rpc
> Affects Versions: 2.5.0, 2.4.6
> Reporter: Viraj Jasani
> Priority: Major
>
> Under constant data ingestion, using the default Netty based RpcServer and RpcClient implementation results in OutOfDirectMemoryError, supposedly caused by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] util.ResourceLeakDetector - java:115)
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
> java.lang.Thread.run(Thread.java:748)
> {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] util.ResourceLeakDetector -
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
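The frames in the traces above implicate ByteToMessageDecoder's cumulator, which grows and copies a single cumulation buffer whenever an incoming chunk does not fit. A rough plain-java.nio sketch of that grow-and-copy behavior (hypothetical names, not Netty's actual code) shows why a stream of partial reads keeps demanding ever larger buffers:

```java
import java.nio.ByteBuffer;

public class CumulationSketch {
    // Grow-and-copy cumulation, analogous in spirit to ByteToMessageDecoder.expandCumulation:
    // when the cumulation cannot hold the new chunk, allocate a larger buffer and copy.
    static ByteBuffer cumulate(ByteBuffer cumulation, ByteBuffer in) {
        if (cumulation.remaining() < in.remaining()) {
            ByteBuffer bigger = ByteBuffer.allocate(
                Math.max(cumulation.capacity() * 2, cumulation.position() + in.remaining()));
            cumulation.flip();          // switch the old buffer to read mode
            bigger.put(cumulation);     // copy accumulated bytes into the larger buffer
            cumulation = bigger;
        }
        return cumulation.put(in);      // append the new chunk
    }

    public static void main(String[] args) {
        ByteBuffer cum = ByteBuffer.wrap(new byte[4]).position(0); // small initial cumulation
        cum = cumulate(cum, ByteBuffer.wrap(new byte[] {1, 2, 3}));
        cum = cumulate(cum, ByteBuffer.wrap(new byte[] {4, 5, 6})); // forces an expand + copy
        System.out.println("cumulated bytes: " + cum.position());
    }
}
```

With a pooled direct allocator, each such expansion requests a bigger buffer from the arena, which is part of why concurrent decoding load drives direct memory demand.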