RE: Direct buffer memory in job with hbase client

2021-12-17 Thread Anton
Looks like I set the wrong parameter. It should have been 
taskmanager.memory.task.off-heap.size.
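[Editorial note: for anyone finding this thread later, a minimal flink-conf.yaml sketch with that parameter set. The 512m figure is purely illustrative and not from this thread; the right value depends on how much direct memory the hbase client actually needs.]

```yaml
# flink-conf.yaml (sketch) — reserve direct memory for user code / connectors.
# 512m is an illustrative placeholder, not a recommendation from this thread.
taskmanager.memory.task.off-heap.size: 512m
```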

 

From: Anton [mailto:anton...@yandex.ru] 
Sent: Friday, December 17, 2021 10:12 PM
To: 'Xintong Song' 
Cc: 'user' 
Subject: RE: Direct buffer memory in job with hbase client

 

Hi Xintong,

 

After a recent job failure I set taskmanager.memory.task.heap.size to 128m, 
but the cluster was unable to start, with the following output:

 

Starting cluster.
Starting standalonesession daemon on host ***.
Password:
[ERROR] The execution result is empty.
[ERROR] Could not get JVM parameters and dynamic configurations properly.
[ERROR] Raw output from BashJavaUtils:
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
INFO  [] - Loading configuration property: jobmanager.rpc.address, ***
INFO  [] - Loading configuration property: jobmanager.rpc.port, 6123
INFO  [] - Loading configuration property: jobmanager.memory.process.size, 16m
INFO  [] - Loading configuration property: taskmanager.memory.process.size, 172800m
INFO  [] - Loading configuration property: taskmanager.numberOfTaskSlots, 31
INFO  [] - Loading configuration property: parallelism.default, 1
INFO  [] - Loading configuration property: jobmanager.execution.failover-strategy, region
INFO  [] - Loading configuration property: taskmanager.memory.task.heap.size, 128m
INFO  [] - The derived from fraction jvm overhead memory (16.875gb (18119393550 bytes)) is greater than its max value 1024.000mb (1073741824 bytes), max value will be used instead

Exception in thread "main" org.apache.flink.configuration.IllegalConfigurationException: TaskManager memory configuration failed: If Total Flink, Task Heap and (or) Managed Memory sizes are explicitly configured then the Network Memory size is the rest of the Total Flink memory after subtracting all other configured types of memory, but the derived Network Memory is inconsistent with its configuration.
    at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:163)
    at org.apache.flink.runtime.util.bash.BashJavaUtils.getTmResourceParams(BashJavaUtils.java:85)
    at org.apache.flink.runtime.util.bash.BashJavaUtils.runCommand(BashJavaUtils.java:67)
    at org.apache.flink.runtime.util.bash.BashJavaUtils.main(BashJavaUtils.java:56)
Caused by: org.apache.flink.configuration.IllegalConfigurationException: If Total Flink, Task Heap and (or) Managed Memory sizes are explicitly configured then the Network Memory size is the rest of the Total Flink memory after subtracting all other configured types of memory, but the derived Network Memory is inconsistent with its configuration.
    at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.sanityCheckNetworkMemoryWithExplicitlySetTotalFlinkAndHeapMemory(TaskExecutorFlinkMemoryUtils.java:344)
    at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:147)
    at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.deriveFromTotalFlinkMemory(TaskExecutorFlinkMemoryUtils.java:42)
    at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.deriveProcessSpecWithTotalProcessMemory(ProcessMemoryUtils.java:119)
    at org.apache.flink.runtime.util.config.memory.ProcessMemoryUtils.memoryProcessSpecFromConfig(ProcessMemoryUtils.java:84)
    at org.apache.flink.runtime.clusterframework.TaskExecutorProcessUtils.processSpecFromConfig(TaskExecutorProcessUtils.java:160)
    ... 3 more
Caused by: org.apache.flink.configuration.IllegalConfigurationException: Derived Network Memory size (100.125gb (107508399056 bytes)) is not in configured Network Memory range [64.000mb (67108864 bytes), 1024.000mb (1073741824 bytes)].
    at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.sanityCheckNetworkMemory(TaskExecutorFlinkMemoryUtils.java:378)
    at org.apache.flink.runtime.util.config.memory.taskmanager.TaskExecutorFlinkMemoryUtils.sanityCheckNetworkMemoryWithExplicitlySetTotalFlinkAndHeapMemory(TaskExecutorFlinkMemoryUtils.java:342)
    ... 8 more
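[Editorial note: the 100.125 GB figure in the last cause can be reconstructed, to within byte-level rounding, from the settings in the log above, assuming Flink 1.13 defaults for everything not explicitly configured — managed fraction 0.4, JVM metaspace 256 MiB, framework heap and framework off-heap 128 MiB each. Those defaults are an assumption; they are not shown in the log.]

```python
# All sizes in MiB. Explicit settings come from the log above; the rest are
# assumed Flink 1.13 defaults.
process = 172800        # taskmanager.memory.process.size
jvm_overhead = 1024     # derived 16.875 GB, capped at its 1 GiB max (see log)
jvm_metaspace = 256     # default taskmanager.memory.jvm-metaspace.size

total_flink = process - jvm_overhead - jvm_metaspace   # 171520 MiB
managed = int(total_flink * 0.4)                       # default managed fraction 0.4
framework_heap = 128    # default
framework_off_heap = 128  # default
task_heap = 128         # explicitly set taskmanager.memory.task.heap.size
task_off_heap = 0       # default

# Network memory is whatever is left of total Flink memory after the
# other components are subtracted.
network = (total_flink - managed - framework_heap
           - framework_off_heap - task_heap - task_off_heap)
print(network, "MiB =", network / 1024, "GiB")  # 102528 MiB = 100.125 GiB
```

In other words: with task heap pinned at 128m but process size left at ~169 GB, everything not claimed by the other components (about 100 GiB) falls to network memory, whose configured range tops out at 1 GiB — hence the sanity-check failure. Lowering taskmanager.memory.process.size, raising taskmanager.memory.network.max, or making the explicitly configured components add up consistently would each resolve it.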

 

How do I choose the memory parameters properly?

 

 

From: Xintong Song [mailto:tonysong...@gmail.com] 
Sent: Wednesday, December 15, 2021 12:17 PM
To: Anton <anton...@yandex.ru>
Cc: user <user@flink.apache.org>
Subject: Re: Direct buffer memory in job with hbase client

 

Hi Anton,

 

You may want to try increasing the task off-heap memory, as your tasks are 
using hbase client which needs off-heap (direct) memory. The default task 
off-heap memory is 0 because most tasks do not use off-heap memory.

 

Unfortunately, I cannot advise on how much task off-heap memory your job needs, 
which probably depends on your hbase client configurations.


Re: Direct buffer memory in job with hbase client

2021-12-15 Thread Xintong Song
Hi Anton,

You may want to try increasing the task off-heap memory, as your tasks are
using hbase client which needs off-heap (direct) memory. The default task
off-heap memory is 0 because most tasks do not use off-heap memory.

Unfortunately, I cannot advise on how much task off-heap memory your job
needs, which probably depends on your hbase client configurations.

Thank you~

Xintong Song
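[Editorial note: to connect this advice to the original "Direct buffer memory" error — Flink sizes the task manager JVM's -XX:MaxDirectMemorySize as the sum of framework off-heap, task off-heap, and network memory, so with task off-heap at its default of 0 the hbase client's netty buffers have no budget of their own. A sketch under assumed Flink 1.13 defaults; the network figure is its 1 GiB cap, also an assumption for this cluster.]

```python
# Flink (1.13) sets the TaskManager JVM flag:
#   -XX:MaxDirectMemorySize = framework off-heap + task off-heap + network
# All values in MiB; these are assumed defaults, not read from the thread.
framework_off_heap = 128  # taskmanager.memory.framework.off-heap.size (default)
task_off_heap = 0         # taskmanager.memory.task.off-heap.size (default)
network = 1024            # taskmanager.memory.network.max (here: the 1 GiB cap)

max_direct = framework_off_heap + task_off_heap + network
print(f"-XX:MaxDirectMemorySize={max_direct}m")  # -XX:MaxDirectMemorySize=1152m
```

Raising taskmanager.memory.task.off-heap.size (as Anton eventually did) grows this cap without eating into the framework's own direct-memory budget.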




Direct buffer memory in job with hbase client

2021-12-14 Thread Anton
Hi, from time to time my job stops processing messages with the warning
message listed below. I tried to increase jobmanager.memory.process.size and
taskmanager.memory.process.size but it didn't help.

What else can I try? "Framework Off-heap" is 128mb now, as seen in the task
manager dashboard, and Task Off-heap is 0b. The documentation says "You
should only change this value if you are sure that the Flink framework needs
more memory." And I'm not sure about it.

Flink version is 1.13.2.

 

2021-11-29 14:06:53,659 WARN org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline [] - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
org.apache.hbase.thirdparty.io.netty.channel.ChannelPipelineException: org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.handlerAdded() has thrown an exception; removed.
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:624) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.addFirst(DefaultChannelPipeline.java:181) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.addFirst(DefaultChannelPipeline.java:358) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.addFirst(DefaultChannelPipeline.java:339) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.saslNegotiate(NettyRpcConnection.java:215) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$600(NettyRpcConnection.java:76) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.operationComplete(NettyRpcConnection.java:289) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hadoop.hbase.ipc.NettyRpcConnection$2.operationComplete(NettyRpcConnection.java:277) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:605) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:300) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:335) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707) [blob_p-6eb282e9e614ab47d8c0b446632a1a9cba8a3955-6e6e09bc9b5fae2679cbbb261caa9da2:?]
    at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)