Valentyn Tymofieiev created BEAM-10200:
------------------------------------------

             Summary: Improve memory profiling for users of Portable Beam Python
                 Key: BEAM-10200
                 URL: https://issues.apache.org/jira/browse/BEAM-10200
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-harness
            Reporter: Valentyn Tymofieiev


We have a Profiler[1] that is integrated with the SDK worker[1a], but it only 
saves CPU metrics[1b].
We have a MemoryReporter util[2] that can log heap dumps, but it is not 
documented on the Beam website and does not respect the --profile_memory and 
--profile_location options[3]. The profile_memory flag currently works only for 
Dataflow Runner users who run non-portable batch pipelines; profiles are 
saved only if memory usage between samples exceeds 1000G. 
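
To make the threshold-gated behavior described above concrete, here is a 
minimal sketch that uses Python's standard tracemalloc module as a stand-in. 
It is not the Beam Profiler or MemoryReporter implementation; the class name 
ThresholdMemorySampler, its threshold_bytes parameter, and the choice to 
measure growth since the last saved report are all assumptions made for 
illustration.

```python
import tracemalloc


class ThresholdMemorySampler:
    """Hypothetical helper (not a Beam API): records a report only when
    traced memory has grown by at least threshold_bytes since the last
    saved report."""

    def __init__(self, threshold_bytes):
        self.threshold_bytes = threshold_bytes
        self.last_bytes = 0
        self.reports = []

    def sample(self, top_n=3):
        # get_traced_memory() returns (current, peak) in bytes.
        current, _peak = tracemalloc.get_traced_memory()
        if current - self.last_bytes < self.threshold_bytes:
            return False  # growth below threshold: nothing is saved
        self.last_bytes = current
        # Keep the top allocation sites, grouped by source line.
        stats = tracemalloc.take_snapshot().statistics("lineno")[:top_n]
        self.reports.append([str(stat) for stat in stats])
        return True


tracemalloc.start()
sampler = ThresholdMemorySampler(threshold_bytes=1_000_000)
first = sampler.sample()  # almost nothing allocated yet
payload = [bytearray(1024) for _ in range(4096)]  # roughly 4 MiB
second = sampler.sample()  # growth now exceeds the threshold
tracemalloc.stop()
```

With a very high threshold, a sampler like this would rarely or never save a 
report, which is the usability concern raised above.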

We should improve the memory profiling experience for Portable Python users and 
consider writing a guide on the Beam website that explains how users can 
investigate OOMing pipelines.
 
[1] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
[1a] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
[1b] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
[2] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
[3] 
https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
