Hi,

I found that our Flink TaskManager is prone to crashing because the Python
harness becomes unreachable.

However, the Beam SDK harness does not seem to be a child process of the
Flink TaskManager process, so the Flink metrics monitor cannot report the
memory used by the harness (either Java or Python). This makes it hard for
me to debug the issue further. Is there a way to monitor Beam memory usage,
especially for the harness processes?
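
For context, this is roughly the kind of check I have in mind: a minimal
sketch, not something Beam or Flink provides. It assumes a Linux host where
the harness processes are visible under /proc, and it assumes they can be
recognized by the substrings "sdk_worker_main" (Python) or "FnHarness"
(Java) in their command line. The function names and the marker list are
my own.

```python
import os
import re

# Assumption: substrings that identify SDK harness processes by command line.
HARNESS_MARKERS = ("sdk_worker_main", "FnHarness")


def parse_vmrss_kb(status_text):
    """Return the resident set size in kB from /proc/<pid>/status text.

    Returns None when the VmRSS field is absent (e.g. kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def harness_memory_kb(proc_root="/proc", markers=HARNESS_MARKERS):
    """Map pid -> RSS in kB for processes whose cmdline contains a marker."""
    usage = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            if not any(marker in cmdline for marker in markers):
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                rss_kb = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited, or we lack permission to read it
        if rss_kb is not None:
            usage[int(entry)] = rss_kb
    return usage
```

Calling harness_memory_kb() periodically on each TaskManager host and
logging the result would at least show whether harness memory grows before
the disconnect, but it does not integrate with Flink's own metrics.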

Also, it is quite possible that the disconnect results from an OOM. If that
is the case, what is the best way to limit the harness's resource usage? I
noticed there are resource hints
<https://beam.apache.org/documentation/runtime/resource-hints/>, but that
page also mentions that not all runners honor them, and I couldn't find
anything in the Flink runner documentation about resource hints. Are
resource hints the best way for us to limit memory usage, or is there
another approach to avoid OOMs when Python tasks run on the Flink runner?
Thanks
Sincerely,
Lydian Lee
