Re: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Xintong Song
These metrics should also be available via REST.

You can check the original design doc [1] for which metrics the UI is using.

Thank you~

Xintong Song


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager

On Tue, Apr 13, 2021 at 9:08 PM Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> wrote:

> Hi Xintong,
>
>
>
> Thanks for the info. Is there any way to access these metrics outside of
> the UI? I suppose Flink’s reporters might provide them, but will they also
> be available through the REST interface (or another interface)?
>
>
>
> Regards,
>
> Alexis.
>
>
>
> *From:* Xintong Song 
> *Sent:* Tuesday, 13 April 2021 14:30
> *To:* Alexis Sarda-Espinosa 
> *Cc:* user@flink.apache.org
> *Subject:* Re: Clarification about Flink's managed memory and metric
> monitoring
>
>
>
> Hi Alexis,
>
>
>
> First of all, I strongly recommend not to look into the JVM metrics. These
> metrics are fetched directly from JVM and do not well correspond to Flink's
> memory configurations. They were introduced a long time ago and are
> preserved mostly for compatibility. IMO, they bring more confusion than
> convenience. In Flink-1.12, there is a newly designed TM metrics page in
> the web ui, which clearly shows how the metrics correspond to Flink's
> memory configurations (if any).
>
>
>
> Concerning your questions.
>
> 1. Yes, increasing framework/task off-heap memory sizes should increase
> the direct memory capacity. Increasing the network memory size should also
> do that.
>
> 2. When 'state.backend.rocksdb.memory.managed' is true, RocksDB uses
> managed memory. Managed memory is not measured by any JVM metrics. It's not
> managed by JVM, meaning that it's not limited by '-XX:MaxDirectMemorySize'
> and is not controlled by the garbage collectors.
>
>
> Thank you~
>
> Xintong Song
>
>
>
>
>
> On Tue, Apr 13, 2021 at 7:53 PM Alexis Sarda-Espinosa <
> alexis.sarda-espin...@microfocus.com> wrote:
>
> Hello,
>
>
>
> I have a Flink TM configured with taskmanager.memory.managed.size: 1372m.
> There is a streaming job using RocksDB for checkpoints, so I assume some of
> this memory will indeed be used.
>
>
>
> I was looking at the metrics exposed through the REST interface, and I
> queried some of them:
>
>
>
> /taskmanagers/c3c960d79c1eb2341806bfa2b2d66328/metrics?get=Status.JVM.Memory.Heap.Committed,Status.JVM.Memory.NonHeap.Committed,Status.JVM.Memory.Direct.MemoryUsed
> | jq
>
> [
>
>   {
>
> "id": "Status.JVM.Memory.Heap.Committed",
>
> "value": "1652031488"
>
>   },
>
>   {
>
> "id": "Status.JVM.Memory.NonHeap.Committed",
>
> "value": "234291200"
>  223 MiB
>
>   },
>
>   {
>
> "id": "Status.JVM.Memory.Direct.MemoryUsed",
>
> "value": "375015427"
> 358 MiB
>
>   },
>
>   {
>
> "id": "Status.JVM.Memory.Direct.TotalCapacity",
>
> "value": "375063552"
> 358 MiB
>
>   }
>
> ]
>
>
>
> I presume direct memory is being used by Flink and its networking stack,
> as well as by the JVM itself. To be sure:
>
>
>
>1. Increasing "taskmanager.memory.framework.off-heap.size" or
>"taskmanager.memory.task.off-heap.size" should increase
>Status.JVM.Memory.Direct.TotalCapacity, right?
>2. I presume the native memory used by RocksDB cannot be tracked with
>these JVM metrics even if "state.backend.rocksdb.memory.managed" is true,
>right?
>
>
>
> Based on this question:
> https://stackoverflow.com/questions/30622818/off-heap-native-heap-direct-memory-and-native-memory,
> I imagine Flink/RocksDB either allocates memory completely independently of
> the JVM, or it uses unsafe. Since the documentation (
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup_tm.html#managed-memory)
> states that "Managed memory is managed by Flink and is allocated as native
> memory (off-heap)", I thought this native memory might show up as part of
> direct memory tracking, but I guess it doesn’t.
>
>
>
> Regards,
>
> Alexis.
>
>
>
>


RE: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Alexis Sarda-Espinosa
Hi Xintong,

Thanks for the info. Is there any way to access these metrics outside of the 
UI? I suppose Flink’s reporters might provide them, but will they also be 
available through the REST interface (or another interface)?

Regards,
Alexis.

From: Xintong Song 
Sent: Tuesday, 13 April 2021 14:30
To: Alexis Sarda-Espinosa 
Cc: user@flink.apache.org
Subject: Re: Clarification about Flink's managed memory and metric monitoring

Hi Alexis,

First of all, I strongly recommend not to look into the JVM metrics. These 
metrics are fetched directly from JVM and do not well correspond to Flink's 
memory configurations. They were introduced a long time ago and are preserved 
mostly for compatibility. IMO, they bring more confusion than convenience. In 
Flink-1.12, there is a newly designed TM metrics page in the web ui, which 
clearly shows how the metrics correspond to Flink's memory configurations (if 
any).

Concerning your questions.
1. Yes, increasing framework/task off-heap memory sizes should increase the 
direct memory capacity. Increasing the network memory size should also do that.
2. When 'state.backend.rocksdb.memory.managed' is true, RocksDB uses managed 
memory. Managed memory is not measured by any JVM metrics. It's not managed by 
JVM, meaning that it's not limited by '-XX:MaxDirectMemorySize' and is not 
controlled by the garbage collectors.


Thank you~

Xintong Song


On Tue, Apr 13, 2021 at 7:53 PM Alexis Sarda-Espinosa 
mailto:alexis.sarda-espin...@microfocus.com>>
 wrote:
Hello,

I have a Flink TM configured with taskmanager.memory.managed.size: 1372m. There 
is a streaming job using RocksDB for checkpoints, so I assume some of this 
memory will indeed be used.

I was looking at the metrics exposed through the REST interface, and I queried 
some of them:

/taskmanagers/c3c960d79c1eb2341806bfa2b2d66328/metrics?get=Status.JVM.Memory.Heap.Committed,Status.JVM.Memory.NonHeap.Committed,Status.JVM.Memory.Direct.MemoryUsed
 | jq
[
  {
"id": "Status.JVM.Memory.Heap.Committed",
"value": "1652031488"
  },
  {
"id": "Status.JVM.Memory.NonHeap.Committed",
"value": "234291200"
 223 MiB
  },
  {
"id": "Status.JVM.Memory.Direct.MemoryUsed",
"value": "375015427"
358 MiB
  },
  {
"id": "Status.JVM.Memory.Direct.TotalCapacity",
"value": "375063552"
358 MiB
  }
]

I presume direct memory is being used by Flink and its networking stack, as 
well as by the JVM itself. To be sure:


  1.  Increasing "taskmanager.memory.framework.off-heap.size" or 
"taskmanager.memory.task.off-heap.size" should increase 
Status.JVM.Memory.Direct.TotalCapacity, right?
  2.  I presume the native memory used by RocksDB cannot be tracked with these 
JVM metrics even if "state.backend.rocksdb.memory.managed" is true, right?

Based on this question: 
https://stackoverflow.com/questions/30622818/off-heap-native-heap-direct-memory-and-native-memory,
 I imagine Flink/RocksDB either allocates memory completely independently of 
the JVM, or it uses unsafe. Since the documentation 
(https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup_tm.html#managed-memory)
 states that "Managed memory is managed by Flink and is allocated as native 
memory (off-heap)", I thought this native memory might show up as part of 
direct memory tracking, but I guess it doesn’t.

Regards,
Alexis.



Re: Clarification about Flink's managed memory and metric monitoring

2021-04-13 Thread Xintong Song
Hi Alexis,

First of all, I strongly recommend not to look into the JVM metrics. These
metrics are fetched directly from JVM and do not well correspond to Flink's
memory configurations. They were introduced a long time ago and are
preserved mostly for compatibility. IMO, they bring more confusion than
convenience. In Flink-1.12, there is a newly designed TM metrics page in
the web ui, which clearly shows how the metrics correspond to Flink's
memory configurations (if any).

Concerning your questions.
1. Yes, increasing framework/task off-heap memory sizes should increase the
direct memory capacity. Increasing the network memory size should also do
that.
2. When 'state.backend.rocksdb.memory.managed' is true, RocksDB uses
managed memory. Managed memory is not measured by any JVM metrics. It's not
managed by JVM, meaning that it's not limited by '-XX:MaxDirectMemorySize'
and is not controlled by the garbage collectors.

Thank you~

Xintong Song



On Tue, Apr 13, 2021 at 7:53 PM Alexis Sarda-Espinosa <
alexis.sarda-espin...@microfocus.com> wrote:

> Hello,
>
>
>
> I have a Flink TM configured with taskmanager.memory.managed.size: 1372m.
> There is a streaming job using RocksDB for checkpoints, so I assume some of
> this memory will indeed be used.
>
>
>
> I was looking at the metrics exposed through the REST interface, and I
> queried some of them:
>
>
>
> /taskmanagers/c3c960d79c1eb2341806bfa2b2d66328/metrics?get=Status.JVM.Memory.Heap.Committed,Status.JVM.Memory.NonHeap.Committed,Status.JVM.Memory.Direct.MemoryUsed
> | jq
>
> [
>
>   {
>
> "id": "Status.JVM.Memory.Heap.Committed",
>
> "value": "1652031488"
>
>   },
>
>   {
>
> "id": "Status.JVM.Memory.NonHeap.Committed",
>
> "value": "234291200"
>  223 MiB
>
>   },
>
>   {
>
> "id": "Status.JVM.Memory.Direct.MemoryUsed",
>
> "value": "375015427"
> 358 MiB
>
>   },
>
>   {
>
> "id": "Status.JVM.Memory.Direct.TotalCapacity",
>
> "value": "375063552"
> 358 MiB
>
>   }
>
> ]
>
>
>
> I presume direct memory is being used by Flink and its networking stack,
> as well as by the JVM itself. To be sure:
>
>
>
>1. Increasing "taskmanager.memory.framework.off-heap.size" or
>"taskmanager.memory.task.off-heap.size" should increase
>Status.JVM.Memory.Direct.TotalCapacity, right?
>2. I presume the native memory used by RocksDB cannot be tracked with
>these JVM metrics even if "state.backend.rocksdb.memory.managed" is true,
>right?
>
>
>
> Based on this question:
> https://stackoverflow.com/questions/30622818/off-heap-native-heap-direct-memory-and-native-memory,
> I imagine Flink/RocksDB either allocates memory completely independently of
> the JVM, or it uses unsafe. Since the documentation (
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup_tm.html#managed-memory)
> states that "Managed memory is managed by Flink and is allocated as native
> memory (off-heap)", I thought this native memory might show up as part of
> direct memory tracking, but I guess it doesn’t.
>
>
>
> Regards,
>
> Alexis.
>
>
>