Hi Xintong,

Thanks for the info. Is there any way to access these metrics outside of the 
UI? I suppose Flink’s reporters might provide them, but will they also be 
available through the REST interface (or another interface)?

Regards,
Alexis.

From: Xintong Song <tonysong...@gmail.com>
Sent: Tuesday, 13 April 2021 14:30
To: Alexis Sarda-Espinosa <alexis.sarda-espin...@microfocus.com>
Cc: user@flink.apache.org
Subject: Re: Clarification about Flink's managed memory and metric monitoring

Hi Alexis,

First of all, I strongly recommend against relying on the JVM metrics. These 
metrics are fetched directly from the JVM and do not correspond well to Flink's 
memory configurations. They were introduced a long time ago and are preserved 
mostly for compatibility. IMO, they bring more confusion than convenience. In 
Flink 1.12, there is a newly designed TM metrics page in the web UI, which 
clearly shows how the metrics correspond to Flink's memory configurations (if 
any).

Concerning your questions:
1. Yes, increasing framework/task off-heap memory sizes should increase the 
direct memory capacity. Increasing the network memory size should also do that.
2. When 'state.backend.rocksdb.memory.managed' is true, RocksDB uses managed 
memory. Managed memory is not measured by any JVM metric. It is not managed by 
the JVM, meaning that it is not limited by '-XX:MaxDirectMemorySize' and is not 
controlled by the garbage collector.
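
A minimal sketch of the arithmetic behind point 1, assuming the JVM direct memory limit is derived as the sum of framework off-heap, task off-heap and network memory (my understanding of the Flink 1.12 memory model). The function name and all sizes below are hypothetical, chosen only for illustration. Managed memory does not appear in the sum, which matches point 2.

```python
MIB = 1024 * 1024


def direct_memory_limit_bytes(framework_off_heap_mib, task_off_heap_mib, network_mib):
    """Sum the three components assumed to count toward the JVM direct memory limit."""
    return (framework_off_heap_mib + task_off_heap_mib + network_mib) * MIB


# Hypothetical sizes in MiB; none of these values come from the thread.
before = direct_memory_limit_bytes(128, 0, 256)
# Raising the task off-heap size by 64 MiB raises the limit by the same amount.
after = direct_memory_limit_bytes(128, 64, 256)

print(before // MIB, after // MIB)
```

Changing taskmanager.memory.managed.size would leave both results unchanged, since it is not an input to the calculation.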


Thank you~

Xintong Song


On Tue, Apr 13, 2021 at 7:53 PM Alexis Sarda-Espinosa 
<alexis.sarda-espin...@microfocus.com> wrote:
Hello,

I have a Flink TM configured with taskmanager.memory.managed.size: 1372m. There 
is a streaming job using RocksDB for checkpoints, so I assume some of this 
memory will indeed be used.

I was looking at the metrics exposed through the REST interface, and I queried 
some of them:

/taskmanagers/c3c960d79c1eb2341806bfa2b2d66328/metrics?get=Status.JVM.Memory.Heap.Committed,Status.JVM.Memory.NonHeap.Committed,Status.JVM.Memory.Direct.MemoryUsed | jq
[
  {
    "id": "Status.JVM.Memory.Heap.Committed",
    "value": "1652031488"
  },
  {
    "id": "Status.JVM.Memory.NonHeap.Committed",
    "value": "234291200"    (223 MiB)
  },
  {
    "id": "Status.JVM.Memory.Direct.MemoryUsed",
    "value": "375015427"    (358 MiB)
  },
  {
    "id": "Status.JVM.Memory.Direct.TotalCapacity",
    "value": "375063552"    (358 MiB)
  }
]
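
For reference, the byte-to-MiB conversion used in the annotations can be sketched in Python. This is only an illustration: the function names are hypothetical, the base URL passed to fetch_metrics is an assumption about where the REST endpoint is reachable, and the sample payload is copied from the response shown in this message.

```python
import json
import urllib.request

MIB = 1024 * 1024


def metrics_to_mib(payload):
    """Map each metric id in a REST metrics response to its value in MiB."""
    return {m["id"]: int(m["value"]) / MIB for m in json.loads(payload)}


def fetch_metrics(base_url, tm_id, names):
    """Query /taskmanagers/<id>/metrics; base_url is an assumed address."""
    url = f"{base_url}/taskmanagers/{tm_id}/metrics?get={','.join(names)}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


# Sample payload with the values reported in this message.
sample = """[
  {"id": "Status.JVM.Memory.Heap.Committed", "value": "1652031488"},
  {"id": "Status.JVM.Memory.NonHeap.Committed", "value": "234291200"},
  {"id": "Status.JVM.Memory.Direct.MemoryUsed", "value": "375015427"},
  {"id": "Status.JVM.Memory.Direct.TotalCapacity", "value": "375063552"}
]"""

converted = metrics_to_mib(sample)
for metric_id, mib in converted.items():
    print(f"{metric_id}: {mib:.1f} MiB")
```

fetch_metrics is not called here, so the snippet runs without a cluster; its output can be passed to metrics_to_mib in the same way as the sample.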

I presume direct memory is being used by Flink and its networking stack, as 
well as by the JVM itself. To be sure:


  1.  Increasing "taskmanager.memory.framework.off-heap.size" or 
"taskmanager.memory.task.off-heap.size" should increase 
Status.JVM.Memory.Direct.TotalCapacity, right?
  2.  I presume the native memory used by RocksDB cannot be tracked with these 
JVM metrics even if "state.backend.rocksdb.memory.managed" is true, right?

Based on this question: 
https://stackoverflow.com/questions/30622818/off-heap-native-heap-direct-memory-and-native-memory,
 I imagine Flink/RocksDB either allocates memory completely independently of 
the JVM, or it uses unsafe. Since the documentation 
(https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup_tm.html#managed-memory)
 states that "Managed memory is managed by Flink and is allocated as native 
memory (off-heap)", I thought this native memory might show up as part of 
direct memory tracking, but I guess it doesn’t.

Regards,
Alexis.
