[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation

2022-07-06 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/7/22 1:09 AM:
-

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0.  So it should be fixed first, removed later. 
Can land SimpleRpcServer specific things on HBASE-27097 after this issue is 
done.
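
For illustration, a minimal sketch of how such a fallback could be selected through configuration. It assumes the hbase.rpc.server.impl key read by RpcServerFactory and the fully qualified SimpleRpcServer class name; verify both against the HBase release in use before relying on them.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcServerFallbackSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Override the default NettyRpcServer with the SimpleRpcServer fallback.
    // The key and class name are assumptions; check RpcServerFactory in the
    // deployed HBase version before relying on them.
    conf.set("hbase.rpc.server.impl",
        "org.apache.hadoop.hbase.ipc.SimpleRpcServer");
    System.out.println("rpc server impl = " + conf.get("hbase.rpc.server.impl"));
  }
}
{code}

In a real deployment this would normally be set in hbase-site.xml on the servers rather than programmatically.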


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0.  -So it should be fixed first, removed later. 
Can land SimpleRpcServer specific things on HBASE-27097 after this issue is 
done.-

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> 
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> 
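
The access records quoted above come from Netty's ResourceLeakDetector. As a rough illustration only, assuming plain (non-shaded) Netty rather than the org.apache.hbase.thirdparty relocation, the sketch below shows the sampling level that produces such reports and the release() discipline that avoids them.

{code:java}
import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.util.ResourceLeakDetector;

public class LeakDetectionSketch {
  public static void main(String[] args) {
    // Track every allocation (equivalent to -Dio.netty.leakDetection.level=paranoid),
    // so a ByteBuf that is garbage-collected without release() is reported.
    ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

    ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(256);
    try {
      buf.writeBytes(new byte[128]); // stand-in for buffered request bytes
    } finally {
      // Dropping the reference without this release() is the kind of mistake
      // the "leak detected" records above point at.
      buf.release();
    }
  }
}
{code}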

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation

2022-07-06 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/7/22 1:08 AM:
-

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0.  -So it should be fixed first, removed later. 
Can land SimpleRpcServer specific things on HBASE-27097 after this issue is 
done.-


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0.  So it should be fixed first, removed later. 
Can land SimpleRpcServer specific things on HBASE-27097 after this issue is 
done.

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> 
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation

2022-07-06 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/6/22 11:53 PM:
--

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0.  So it should be fixed first, removed later. 
Can land SimpleRpcServer specific things on HBASE-27097 after this issue is 
done.


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> 
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation

2022-07-06 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/6/22 11:43 PM:
--

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. I know 
it is not the default but still represents an important fallback option. We 
should be accommodating to users here. Deprecation can be done now, that seems 
ok. Removal can be done in 3.0. 


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. 
Deprecation can be done now, that seems ok. Removal can be done in 3.0. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> 
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation

2022-07-06 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/6/22 11:41 PM:
--

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.

SimpleRpcServer is currently used as a fallback by Cloudera customers (and I 
presume others) with 2.2 when the Netty implementation has issues. I would also 
want it as a fallback option for our production. Anyway this is the kind of 
major operational change which should have a deprecation before removal. 
Deprecation can be done now, that seems ok. Removal can be done in 3.0. 


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.


This will not be possible in 2.x. We need a fix. SimpleRpcServer is currently 
used as a fallback by Cloudera customers (and I presume others) with 2.2 when 
the Netty implementation has issues. I would also want it as a fallback option 
for our production. Anyway this is the kind of major operational change which 
should have a deprecation before removal. Deprecation can be done now, that 
seems ok. Removal can be done in 3.0. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> 
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering with SASL implementation

2022-07-06 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563499#comment-17563499 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 7/6/22 11:41 PM:
--

bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.


This will not be possible in 2.x. We need a fix. SimpleRpcServer is currently 
used as a fallback by Cloudera customers (and I presume others) with 2.2 when 
the Netty implementation has issues. I would also want it as a fallback option 
for our production. Anyway this is the kind of major operational change which 
should have a deprecation before removal. Deprecation can be done now, that 
seems ok. Removal can be done in 3.0. 


was (Author: apurtell):
bq. In general, I prefer we just remove the SimpleRpcServer implementation and 
rewrite the decode and encode part with netty, to make the code more clear.


This will not be possible in 2.x. We need a fix. SimpleRpcServer is currently 
used as a fallback by Cloudera customers (and I presume others) with 2.2 when 
the Netty implementation has issues, and anyway this is the kind of major 
operational change which should have a deprecation before removal. Deprecation 
can be done now, that seems ok. Removal can be done in 3.0. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering with SASL implementation
> 
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Assignee: Duo Zhang
>Priority: Blocker
> Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:15 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.
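
For concreteness, a small sketch of the server-side bounds described above. hbase.regionserver.handler.count is the standard handler pool knob; the two call queue keys are written from memory and should be treated as assumptions to verify against the release in use.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcQueueLimitsSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Handler pool bound (the "default of 30" mentioned above).
    conf.setInt("hbase.regionserver.handler.count", 30);
    // Assumed keys: cap on queued call count and on total queued call data size.
    conf.setInt("hbase.ipc.server.max.callqueue.length", 300);
    conf.setLong("hbase.ipc.server.max.callqueue.size", 1024L * 1024 * 1024);
    System.out.println("handlers = " + conf.get("hbase.regionserver.handler.count"));
  }
}
{code}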

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but 
now netty may be able to queue up a lot more, in comparison, because netty has 
been designed for concurrency. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_in_flight_max x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.
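
A minimal sketch, using plain (non-shaded) Netty, of what bounding those two knobs could look like. The availableProcessors() sizing is the suggestion above, not current HBase behavior, and the property name is Netty's io.netty.eventLoop.maxPendingTasks.

{code:java}
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;

public class BoundedEventLoopSketch {
  public static void main(String[] args) throws Exception {
    // Cap pending tasks per event loop before any event loop is created;
    // Netty reads this property and otherwise defaults it to Integer.MAX_VALUE.
    System.setProperty("io.netty.eventLoop.maxPendingTasks", "65536");

    // One event loop thread per core instead of the config default of 0 discussed above.
    int threads = Runtime.getRuntime().availableProcessors();
    EventLoopGroup group = new NioEventLoopGroup(threads);
    System.out.println("event loop threads = " + threads);
    group.shutdownGracefully().sync();
  }
}
{code}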

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer

I could be looking at something that isn't the real issue but it is notable.


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but 
now netty may be able to queue up a lot more, in comparison, because netty has 
been designed for concurrency. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:14 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but 
now netty may be able to queue up a lot more, in comparison, because netty has 
been designed for concurrency. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer

I could be looking at something that isn't the real issue but it is notable.


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but 
now netty may be able to queue up a lot more, in comparison*. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:13 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10), but 
now netty may be able to queue up a lot more, in comparison*. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer

I could be looking at something that isn't the real issue but it is notable.


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was 
not async, but now netty is able to queue up a lot of work asynchronously*. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:12 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was 
not async, but now netty is able to queue up a lot of work asynchronously*. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer

I could be looking at something that isn't the real issue but it is notable.


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was 
not async, but now netty is able to queue up a lot of work asynchronously*. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:09 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits (hbase.ipc.server.read.threadpool.size", default 10) and was 
not async, but now netty is able to queue up a lot of work asynchronously*. 
This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here.

And this may also explain why security makes it worse, because when security is 
active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond 
netty, and that takes additional time there, which would back things up at the 
netty layer more than if call handling would complete more quickly without 
encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. This is going to be somewhat application 
dependent too. If the application interacts synchronously with calls and has 
its own bound, then in flight requests or their network level handling will be 
bounded by the aggregate (client_limit x number_of_clients). If the application 
is highly async, write-mostly, or a load test client – which is typically 
write-mostly, async, and configured with large bounds :) – then this can 
explain the findings reported here.

And this may also explain why security makes it worse, because when security is 
active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond 
netty, and that takes additional time there, which would back things up at the 
netty layer more than if call handling would complete more quickly without 
encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464 ]

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:09 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was 
not async, but now netty is able to queue up a lot of work asynchronously*. 

This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here. It may also explain why security makes it worse, because when 
security is active we wrap (encrypt) and unwrap (decrypt) up in the call layer, 
beyond netty, and that takes additional time there, which would back things up 
at the netty layer more than if call handling would complete more quickly 
without encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer
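
A rough sketch of that direction, written against plain (unshaded) netty for 
readability; the 64k task cap is only an example value, and with the relocated 
netty in hbase-thirdparty the package and property names gain the 
org.apache.hbase.thirdparty prefix:

{code:java}
// Illustrative only: bound event loop threads to the core count and cap the
// per-event-loop pending task queue via netty's own system property. The
// property must be set before the first event loop is constructed.
import io.netty.channel.nio.NioEventLoopGroup;

public final class BoundedEventLoopSketch {
  public static NioEventLoopGroup newBoundedGroup() {
    // Example cap; the point is just "not Integer.MAX_VALUE".
    System.setProperty("io.netty.eventLoop.maxPendingTasks", String.valueOf(64 * 1024));
    int threads = Runtime.getRuntime().availableProcessors();
    return new NioEventLoopGroup(threads);
  }

  private BoundedEventLoopSketch() {}
}
{code}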


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, *because SimpleRpcServer 
had thread limits ("hbase.ipc.server.read.threadpool.size", default 10) and was 
not async, but now netty is able to queue up a lot of work asynchronously*. 
This is going to be somewhat application dependent too. If the application 
interacts synchronously with calls and has its own bound, then in flight 
requests or their network level handling will be bounded by the aggregate 
(client_limit x number_of_clients). If the application is highly async, 
write-mostly, or a load test client – which is typically write-mostly, async, 
and configured with large bounds :) – then this can explain the findings 
reported here.

And this may also explain why security makes it worse, because when security is 
active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond 
netty, and that takes additional time there, which would back things up at the 
netty layer more than if call handling would complete more quickly without 
encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:06 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. This is going to be somewhat application 
dependent too. If the application interacts synchronously with calls and has 
its own bound, then in flight requests or their network level handling will be 
bounded by the aggregate (client_limit x number_of_clients). If the application 
is highly async, write-mostly, or a load test client – which is typically 
write-mostly, async, and configured with large bounds :) – then this can 
explain the findings reported here.

And this may also explain why security makes it worse, because when security is 
active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond 
netty, and that takes additional time there, which would back things up at the 
netty layer more than if call handling would complete more quickly without 
encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks probably should not be INT_MAX, but that may matter less.

The goal would be to limit concurrency at the netty layer in such a way that:
1. Performance is still good
2. Under load, we don't balloon resource usage at the netty layer


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. This is going to be somewhat application 
dependent too. If the application interacts synchronously with calls and has 
its own bound, then in flight requests or their network level handling will be 
bounded by the aggregate (client_limit x number_of_clients). If the application 
is highly async, write-mostly, or a load test client – which is typically 
write-mostly, async, and configured with large bounds :) – then this can 
explain the findings reported here.

And this may also explain why security makes it worse, because when security is 
active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond 
netty, and that takes additional time there, which would back things up at the 
netty layer more than if call handling would complete more quickly without 
encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks should not be INT_MAX, that's not a sane default.

> Netty "leak detected" and 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:04 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. This is going to be somewhat application 
dependent too. If the application interacts synchronously with calls and has 
its own bound, then in flight requests or their network level handling will be 
bounded by the aggregate (client_limit x number_of_clients). If the application 
is highly async, write-mostly, or a load test client – which is typically 
write-mostly, async, and configured with large bounds :) – then this can 
explain the findings reported here.

And this may also explain why security makes it worse, because when security is 
active we wrap (encrypt) and unwrap (decrypt) up in the call layer, beyond 
netty, and that takes additional time there, which would back things up at the 
netty layer more than if call handling would complete more quickly without 
encryption.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks should not be INT_MAX, that's not a sane default.


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. This is going to be somewhat application 
dependent too. If the application interacts synchronously with calls and has 
its own bound, then in flight requests or their network level handling will be 
bounded by the aggregate (client_limit x number_of_clients). If the application 
is highly async, write-mostly, or a load test client – which is typically 
write-mostly, async, and configured with large bounds :) – then this can 
explain the findings reported here.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks should not be INT_MAX, that's not a sane default.

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Priority: Critical
>
> Under constant data ingestion, using default Netty based RpcServer and 
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552464#comment-17552464
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 11:01 PM:
--

On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. This is going to be somewhat application 
dependent too. If the application interacts synchronously with calls and has 
its own bound, then in flight requests or their network level handling will be 
bounded by the aggregate (client_limit x number_of_clients). If the application 
is highly async, write-mostly, or a load test client – which is typically 
write-mostly, async, and configured with large bounds :) – then this can 
explain the findings reported here.

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks should not be INT_MAX, that's not a sane default.


was (Author: apurtell):
On the subject of configuration and NettyRpcServer, we leave netty level 
resource limits unbounded. The number of threads to use for the event loop is 
default 0 (unbounded). The default for io.netty.eventLoop.maxPendingTasks is 
INT_MAX. We don't do this for our own RPC handlers. We have a notion of maximum 
handler pool size, with a default of 30, typically raised in production by the 
user. We constrain the depth of the request queue in multiple ways... limits on 
number of queued calls, limits on total size of calls data that can be queued 
(to avoid memory usage overrun, just like this case), CoDel conditioning of the 
call queues if it is enabled, and so on.

Under load, can we pile up an excess of pending request state, such as direct 
buffers containing request bytes, at the netty layer because of downstream 
resource limits? Those limits will act as a bottleneck, as intended, and before 
would have also applied backpressure through RPC too, but now netty is able to 
queue up a lot of work asynchronously. 

Consider the hbase.netty.eventloop.rpcserver.thread.count default. It is 0 
(unbounded). I don't know what it can actually get up to in production, because 
we lack the metric, but there are diminishing returns when threads > cores so a 
reasonable default here could be Runtime.getRuntime().availableProcessors() 
instead of unbounded?

maxPendingTasks should not be INT_MAX, that's not a sane default.

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Priority: Critical
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552340#comment-17552340
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:30 PM:
-

[~bbeaudreault]  I was/am confused by that because HBASE-2 is a child of 
HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin 
for SASL". Can you or someone clean this up so we can clearly see what is going 
on? Is it really a full TLS RPC stack? Because it looks to me like some TLS 
fiddling to get a token that then sets up the usual wrapped SASL connection, 
possibly why I am confused. That would not be native TLS support in the sense I 
mean and the sense that is really required, possibly why it has not gotten 
enough attention. 

Oh, the PR itself describes the work as "HBASE-2 Add native TLS encryption 
support to RPC server/client ". That is much different. 

Let's clean up the situation with HBASE-2 and HBASE-26553 and take the 
conversation there so as not to distract from this JIRA.


was (Author: apurtell):
[~bbeaudreault]  I was/am confused by that because HBASE-2 is a child of 
HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin 
for SASL". Can you or someone clean this up so we can clearly see what is going 
on? Is it really a full TLS RPC stack? Because it looks to me like some TLS 
fiddling to get a token that then sets up the usual wrapped SASL connection, 
possibly why I am confused. That would not be native TLS support in the sense I 
mean and the sense that is really required, possibly why it has not gotten 
enough attention. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Priority: Critical
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552340#comment-17552340
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:28 PM:
-

[~bbeaudreault]  I was/am confused by that because HBASE-2 is a child of 
HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin 
for SASL". Can you or someone clean this up so we can clearly see what is going 
on? Is it really a full TLS RPC stack? Because it looks to me like some TLS 
fiddling to get a token that then sets up the usual wrapped SASL connection, 
possibly why I am confused. That would not be native TLS support in the sense I 
mean and the sense that is really required, possibly why it has not gotten 
enough attention. 


was (Author: apurtell):
[~bbeaudreault]  I was/am confused by that because HBASE-2 is a child of 
HBASE-26553 which describes itself as "OAuth Bearer authentication mech plugin 
for SASL". Can you or someone clean this up so we can clearly see what is going 
on? Is it really a full TLS RPC stack? Because it looks to me like some TLS 
fiddling to get a token that then sets up the usual wrapped SASL connection. It 
is not native TLS support in the sense I mean and the sense that is really 
required, which is TLS and only TLS end to end, possibly why it has not gotten 
enough attention. 

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Priority: Critical
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552322#comment-17552322
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:15 PM:
-

[~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have 
been testing with {_}auth{_}, which was the previous setting.

[~vjasani] I am curious if you apply my patch and set 
hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still 
gets up to > 50 GB. My guess is yes, that it is the concurrent demand for 
buffers at load driving the usage, and not excessive cache retention in the 
pooled allocator. Let's see if experimental results confirm the hypothesis. If 
it helps then I am wrong and pooling configuration tweaks – read on below – 
should be considered. 

If I am correct then we should investigate how to get direct IO buffers freed 
faster and/or limits or pacing applied to their allocation; using a custom 
allocator, possibly. As [~zhangduo] mentioned, we set up a certain number of 
buffers, and more when SASL is used. This should be tunable: people with large 
RAM servers/instances can tune it up, and people with more memory-constrained 
options can tune it down.

Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue 
people may be facing is confusion about system property names. I can see in the 
sources, via my IDE, that the shader rewrote the string constants containing 
the property keys too. Various resources on the Internet will offer 
documentation and suggestions, but because we relocated Netty into thirdparty, 
the names have changed, and so naively following the advice on StackOverflow 
and other places will have no effect. The key ones here are the recommendations 
for when you want to prefer heap instead of direct memory.

Let me list them in terms of relevancy for addressing this issue.

Highly relevant:
 - io.netty.allocator.cacheTrimInterval -> 
org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval
 -- This is the threshold number of allocations after which cached entries will 
be freed up if not frequently used. Lowering it from the default of 8192 may 
reduce the overall amount of direct memory retained in steady state, because 
the evaluation will be performed more often, as often as you specify.
 - io.netty.noPreferDirect -> 
org.apache.hbase.thirdparty.io.netty.noPreferDirect
 -- This will prefer heap arena allocations regardless of PlatformDependent 
ideas on preference if set to 'true'.
 - io.netty.allocator.numDirectArenas -> 
org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas
 -- Various advice on the Internet suggests setting numDirectArenas=0 and 
noPreferDirect=true as the way to prefer heap based buffers.

Less relevant:
 - io.netty.allocator.maxCachedBufferCapacity -> 
org.apache.hbase.thirdparty.io.netty.allocator.maxCachedBufferCapacity
 -- This is the size-based retention policy for buffers; individual buffers 
larger than this will not be cached.
 - io.netty.allocator.numHeapArenas -> 
org.apache.hbase.thirdparty.io.netty.allocator.numHeapArenas
 - io.netty.allocator.pageSize -> 
org.apache.hbase.thirdparty.io.netty.allocator.pageSize
 - io.netty.allocator.maxOrder -> 
org.apache.hbase.thirdparty.io.netty.allocator.maxOrder
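
To spell out the translation, a sketch (example values only) of applying the 
heap-preferring combination above with the relocated names; the same keys can be 
passed as -D JVM options instead, as long as they are set before the shaded 
allocator classes initialize:

{code:java}
// Sketch only: the tunables above, using the relocated property names that the
// shaded netty in hbase-thirdparty reads. Values shown are examples.
public final class ShadedNettyAllocatorTuning {
  public static void preferHeapAndTrimAggressively() {
    // Prefer heap arenas over direct memory.
    System.setProperty("org.apache.hbase.thirdparty.io.netty.noPreferDirect", "true");
    System.setProperty("org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas", "0");
    // Trim thread-local caches more often than the default of 8192 allocations.
    System.setProperty("org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval", "1024");
  }

  private ShadedNettyAllocatorTuning() {}
}
{code}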

On [https://github.com/apache/hbase/pull/4505] I have a draft PR that allows 
the user to tweak the Netty bytebuf allocation policy. This may be a good idea 
to do in general. We may want to provide support for some of the above Netty 
tunables in HBase site configuration as well, as a way to eliminate confusion 
about them... Our documentation on it would describe the HBase site config 
property names.
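
As a rough sketch of what such a switch could look like -- the key 
hbase.netty.rpcserver.allocator is from this discussion, the rest is illustrative 
and not the PR's actual code, written against plain netty plus Hadoop 
Configuration -- the server would pick its ByteBufAllocator from site 
configuration and apply it via childOption(ChannelOption.ALLOCATOR, ...):

{code:java}
// Illustrative allocator-policy switch; not the draft PR's actual implementation.
import org.apache.hadoop.conf.Configuration;
import io.netty.buffer.ByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.UnpooledByteBufAllocator;

final class RpcServerAllocatorSketch {
  static ByteBufAllocator select(Configuration conf) {
    String policy = conf.get("hbase.netty.rpcserver.allocator", "pooled");
    switch (policy) {
      case "unpooled":
        return UnpooledByteBufAllocator.DEFAULT;
      case "heap":
        // Pooled, but preferring heap buffers over direct memory.
        return new PooledByteBufAllocator(false /* preferDirect */);
      case "pooled":
      default:
        return PooledByteBufAllocator.DEFAULT;
    }
  }
}
{code}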

On a side note, we might spike on an alternative to SASL RPC that is a TLS 
based implementation instead. I know this has been discussed and even partially 
attempted, repeatedly, over our history but nonetheless the operational and 
performance issues with SASL remain. We were here once before on HBASE-17721. 
[~bbeaudreault]  posted HBASE-26548 more recently.


was (Author: apurtell):
[~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have 
been testing with {_}auth{_}, which was the previous setting.

[~vjasani] I am curious if you apply my patch and set 
hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still 
gets up to > 50 GB. My guess is yes, that it is the concurrent demand for 
buffers at load driving the usage, and not excessive cache retention in the 
pooled allocator. Let's see if experimental results confirm the hypothesis. If 
it helps then I am wrong and pooling configuration tweaks – read on below – 
should be considered. If I am correct then we should investigate how to get 
direct IO buffers freed faster and/or limits or pacing applied to their 
allocation; using a custom allocator, possibly.

Looking at our 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552322#comment-17552322
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 5:07 PM:
-

[~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have 
been testing with {_}auth{_}, which was the previous setting.

[~vjasani] I am curious if you apply my patch and set 
hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still 
gets up to > 50 GB. My guess is yes, that it is the concurrent demand for 
buffers at load driving the usage, and not excessive cache retention in the 
pooled allocator. Let's see if experimental results confirm the hypothesis. If 
it helps then I am wrong and pooling configuration tweaks – read on below – 
should be considered. If I am correct then we should investigate how to get 
direct IO buffers freed faster and/or limits or pacing applied to their 
allocation; using a custom allocator, possibly.

Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue 
people may be facing is confusion about system property names. I can see in the 
sources, via my IDE, that the shader rewrote the string constants containing 
the property keys too. Various resources on the Internet will offer 
documentation and suggestions, but because we relocated Netty into thirdparty, 
the names have changed, and so naively following the advice on StackOverflow 
and other places will have no effect. The key ones here are the recommendations 
for when you want to prefer heap instead of direct memory.

Let me list them in terms of relevancy for addressing this issue.

Highly relevant:
 - io.netty.allocator.cacheTrimInterval -> 
org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval
 -- This is the threshold number of allocations after which cached entries will 
be freed up if not frequently used. Lowering it from the default of 8192 may 
reduce the overall amount of direct memory retained in steady state, because 
the evaluation will be performed more often, as often as you specify.
 - io.netty.noPreferDirect -> 
org.apache.hbase.thirdparty.io.netty.noPreferDirect
 -- This will prefer heap arena allocations regardless of PlatformDependent 
ideas on preference if set to 'true'.
 - io.netty.allocator.numDirectArenas -> 
org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas
 -- Various advice on the Internet suggests setting numDirectArenas=0 and 
noPreferDirect=true as the way to prefer heap based buffers.

Less relevant:
 - io.netty.allocator.maxCachedBufferCapacity -> 
org.apache.hbase.thirdparty.io.netty.allocator.maxCachedBufferCapacity
 -- This is the size-based retention policy for buffers; individual buffers 
larger than this will not be cached.
 - io.netty.allocator.numHeapArenas -> 
org.apache.hbase.thirdparty.io.netty.allocator.numHeapArenas
 - io.netty.allocator.pageSize -> 
org.apache.hbase.thirdparty.io.netty.allocator.pageSize
 - io.netty.allocator.maxOrder -> 
org.apache.hbase.thirdparty.io.netty.allocator.maxOrder

On [https://github.com/apache/hbase/pull/4505] I have a draft PR that allows 
the user to tweak the Netty bytebuf allocation policy. This may be a good idea 
to do in general. We may want to provide support for some of the above Netty 
tunables in HBase site configuration as well, as a way to eliminate confusion 
about them... Our documentation on it would describe the HBase site config 
property names.

On a side note, we might spike on an alternative to SASL RPC that is a TLS 
based implementation instead. I know this has been discussed and even partially 
attempted, repeatedly, over our history but nonetheless the operational and 
performance issues with SASL remain. We were here once before on HBASE-17721. 
[~bbeaudreault]  posted HBASE-26548 more recently.


was (Author: apurtell):
[~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have 
been testing with {_}auth{_}, which was the previous setting.

[~vjasani] I am curious if you apply my patch and set 
hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still 
gets up to > 50 GB. My guess is yes, that it is the concurrent demand for 
buffers at load driving the usage, and not excessive cache retention in the 
pooled allocator. Let's see if experimental results confirm the hypothesis. If 
it helps then I am wrong and pooling configuration tweaks – read on below – 
should be considered. If I am correct then we should investigate how to get 
direct IO buffers freed faster and/or limits or pacing applied to their 
allocation; using a custom allocator, possibly.

Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue 
people may be facing is confusion about system property names. I can see in the 
sources, via my IDE, that the shader rewrote the string constants containing 
the property keys too. Various 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-09 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552322#comment-17552322
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/9/22 4:54 PM:
-

[~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have 
been testing with {_}auth{_}, which was the previous setting.

[~vjasani] I am curious if you apply my patch and set 
hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still 
gets up to > 50 GB. My guess is yes, that it is the concurrent demand for 
buffers at load driving the usage, and not excessive cache retention in the 
pooled allocator. Let's see if experimental results confirm the hypothesis. If 
it helps then I am wrong and pooling configuration tweaks – read on below – 
should be considered. If I am correct then we should investigate how to get 
direct IO buffers freed faster and/or limits or pacing applied to their 
allocation; using a custom allocator, possibly.

Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue 
people may be facing is confusion about system property names. I can see in the 
sources, via my IDE, that the shader rewrote the string constants containing 
the property keys too. Various resources on the Internet will offer 
documentation and suggestions, but because we relocated Netty into thirdparty, 
the names have changed, and so naively following the advice on StackOverflow 
and other places will have no effect. The key ones here are the recommendations 
for when you want to prefer heap instead of direct memory.

Let me list them in terms of relevancy for addressing this issue.

Highly relevant:
 - io.netty.allocator.cacheTrimInterval -> 
org.apache.hbase.thirdparty.io.netty.allocator.cacheTrimInterval
 -- This is the threshold number of allocations after which cached entries will 
be freed up if not frequently used. Lowering it from the default of 8192 may 
reduce the overall amount of direct memory retained in steady state, because 
the evaluation will be performed more often, as often as you specify.
 - io.netty.noPreferDirect -> 
org.apache.hbase.thirdparty.io.netty.noPreferDirect
 -- This will prefer heap arena allocations regardless of PlatformDependent 
ideas on preference if set to 'true'.
 - io.netty.allocator.numDirectArenas -> 
org.apache.hbase.thirdparty.io.netty.allocator.numDirectArenas
 -- Various advice on the Internet suggests setting numDirectArenas=0 and 
noPreferDirect=true as the way to prefer heap based buffers.

Less relevant:
 - io.netty.allocator.maxCachedBufferCapacity -> 
org.apache.hbase.thirdparty.io.netty.allocator.maxCachedBufferCapacity
 -- This is the size-based retention policy for buffers; individual buffers 
larger than this will not be cached.
 - io.netty.allocator.numHeapArenas -> 
org.apache.hbase.thirdparty.io.netty.allocator.numHeapArenas
 - io.netty.allocator.pageSize -> 
org.apache.hbase.thirdparty.io.netty.allocator.pageSize
 - io.netty.allocator.maxOrder -> 
org.apache.hbase.thirdparty.io.netty.allocator.maxOrder

On [https://github.com/apache/hbase/pull/4505] I have a draft PR that allows 
the user to tweak the Netty bytebuf allocation policy. This may be a good idea 
to do in general. We may want to provide support for some of the above Netty 
tunables in HBase site configuration as well, as a way to eliminate confusion 
about them... Our documentation on it would describe the HBase site config 
property names.

On a side note, we might spike on an alternative to SASL RPC that is a TLS 
based implementation instead. I know this has been discussed and even partially 
attempted, repeatedly, over our history but nonetheless the operational and 
performance issues with SASL remain.


was (Author: apurtell):
[~zhangduo] Our current requirements would be _auth-conf_ but Viraj may have 
been testing with _auth_, which was the previous setting. 

[~vjasani] I am curious if you apply my patch and set 
hbase.netty.rpcserver.allocator=unpooled if the direct memory allocation still 
gets up to > 50 GB. My guess is yes, that it is the concurrent demand for 
buffers at load driving the usage, and not excessive cache retention in the 
pooled allocator. Let's see if experimental results confirm the hypothesis. If 
it helps then I am wrong and pooling configuration tweaks -- read on below -- 
should be considered. If I am correct then we should investigate how to get 
direct IO buffers freed faster and/or limits or pacing applied to their 
allocation; using a custom allocator, possibly. 

Looking at our PooledByteBufAllocator in hbase-thirdparty it is clear an issue 
people may be facing is confusion about system property names. Various 
resources on the Internet will offer documentation and suggestions, but because 
we relocated Netty into thirdparty, the names have changed, and so naively 
following the advice on StackOverflow and other places 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-08 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551700#comment-17551700
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/8/22 4:47 PM:
-

[~zhangduo] 

bq. Does increase MaxDirectMemorySize can solve the problem?

Yes, this avoids the failures, but it remains a cost-to-serve problem because it 
requires selecting a larger (e.g. AWS) instance type to get the additional RAM. 
Still, it is an effective workaround for us.
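
(A trivial check, assuming a netty version that exposes these PlatformDependent 
accessors, to confirm what ceiling the shaded netty actually sees after raising 
-XX:MaxDirectMemorySize:)

{code:java}
// Quick check of the direct memory ceiling and current usage as seen by the
// shaded netty; useful when experimenting with -XX:MaxDirectMemorySize.
import org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent;

public class DirectMemoryCheck {
  public static void main(String[] args) {
    System.out.println("max direct memory:  " + PlatformDependent.maxDirectMemory());
    System.out.println("used direct memory: " + PlatformDependent.usedDirectMemory());
  }
}
{code}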

I get your point... We should update this issue, because this maybe isn't a 
_leak_. It is an excessive buffer retention issue, certainly.


was (Author: apurtell):
[~zhangduo] 

bq. Does increase MaxDirectMemorySize can solve the problem?

Yes, this avoids the failures, but it remains a cost-to-serve problem because it 
requires selecting a larger (e.g. AWS) instance type to get the additional RAM. 
But it is an effective workaround, for sure.

I get your point... We should update this issue, because this maybe isn't a 
_leak_. It is an excessive buffer retention issue.

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Priority: Major
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> 

[jira] [Comment Edited] (HBASE-26708) Netty "leak detected" and OutOfDirectMemoryError due to direct memory buffering

2022-06-08 Thread Andrew Kyle Purtell (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17551700#comment-17551700
 ] 

Andrew Kyle Purtell edited comment on HBASE-26708 at 6/8/22 4:46 PM:
-

[~zhangduo] 

bq. Does increase MaxDirectMemorySize can solve the problem?

Yes, this avoids the failures, but it remains a cost-to-serve problem because it 
requires selecting a larger (e.g. AWS) instance type to get the additional RAM. 
But it is an effective workaround, for sure.

I get your point... We should update this issue, because this maybe isn't a 
_leak_. It is an excessive buffer retention issue.


was (Author: apurtell):
[~zhangduo] 

bq. Does increase MaxDirectMemorySize can solve the problem?

Yes, this avoids the failures, but it remains a cost-to-serve problem because it 
requires selecting a larger (e.g. AWS) instance type to get the additional RAM. 
But it is an effective workaround, for sure.

I get your point... We should update this issue, because this isn't a _leak_. 
It is an excessive (IMHO) buffer retention issue.

> Netty "leak detected" and OutOfDirectMemoryError due to direct memory 
> buffering
> ---
>
> Key: HBASE-26708
> URL: https://issues.apache.org/jira/browse/HBASE-26708
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.5.0, 2.4.6
>Reporter: Viraj Jasani
>Priority: Major
>
> Under constant data ingestion, using default Netty based RpcServer and 
> RpcClient implementation results in OutOfDirectMemoryError, supposedly caused 
> by leaks detected by Netty's LeakDetector.
> {code:java}
> 2022-01-25 17:03:10,084 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - java:115)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(ByteToMessageDecoder.java:538)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:97)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:274)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
>   
> org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
>   
> org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
>   
> org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   java.lang.Thread.run(Thread.java:748)
>  {code}
> {code:java}
> 2022-01-25 17:03:14,014 ERROR [S-EventLoopGroup-1-3] 
> util.ResourceLeakDetector - 
> apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:507)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:446)
>   
> org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
>   
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
>   
>