Valentyn Tymofieiev created BEAM-10200:
------------------------------------------
             Summary: Improve memory profiling for users of Portable Beam Python
                 Key: BEAM-10200
                 URL: https://issues.apache.org/jira/browse/BEAM-10200
             Project: Beam
          Issue Type: Bug
          Components: sdk-py-harness
            Reporter: Valentyn Tymofieiev

We have a Profiler[1] that is integrated with the SDK worker[1a]; however, it only saves CPU metrics[1b]. We have a MemoryReporter util[2] that can log heap dumps; however, it is not documented on the Beam website and does not respect the --profile_memory and --profile_location options[3]. The --profile_memory flag currently works only for Dataflow Runner users who run non-portable batch pipelines; profiles are saved only if memory usage between samples exceeds 1000G.

We should improve the memory profiling experience for Portable Python users and consider writing a guide on the Beam website on how users can investigate OOMing pipelines.

[1] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L46
[1a] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L157
[1b] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L112
[2] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/utils/profiler.py#L124
[3] https://github.com/apache/beam/blob/095589c28f5c427bf99fc0330af91c859bb2ad6b/sdks/python/apache_beam/options/pipeline_options.py#L846

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
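
For illustration only: the issue mentions that profiles are saved only when memory usage between samples exceeds a threshold. A minimal sketch of that idea, using just the standard-library tracemalloc module, might look like the following. This is not Beam's Profiler or MemoryReporter; GrowthSampler, threshold_bytes, top_n and demo are hypothetical names chosen for this example, and the 1 MB threshold is arbitrary.

```python
import tracemalloc


class GrowthSampler:
    """Hypothetical helper (not part of Beam): records a heap summary when
    traced memory grows by more than threshold_bytes between two samples."""

    def __init__(self, threshold_bytes, top_n=5):
        self.threshold_bytes = threshold_bytes
        self.top_n = top_n
        self.last_bytes = None  # traced size seen at the previous sample
        self.reports = []       # one list of summary lines per saved profile

    def sample(self):
        """Take one sample; return True if a report was recorded."""
        current, _peak = tracemalloc.get_traced_memory()
        grew = (
            self.last_bytes is not None
            and current - self.last_bytes > self.threshold_bytes
        )
        if grew:
            # Summarize the largest allocation sites by source line.
            snapshot = tracemalloc.take_snapshot()
            top = snapshot.statistics("lineno")[: self.top_n]
            self.reports.append([str(stat) for stat in top])
        self.last_bytes = current
        return grew


def demo():
    """Simulate a sudden allocation between two samples."""
    tracemalloc.start()
    try:
        sampler = GrowthSampler(threshold_bytes=1_000_000)
        first = sampler.sample()    # baseline sample, nothing to compare with
        hog = bytearray(8_000_000)  # allocate roughly 8 MB between samples
        second = sampler.sample()   # growth exceeds the threshold
        del hog
        return first, second, len(sampler.reports)
    finally:
        tracemalloc.stop()
```

In a worker, sample() would presumably be called from a periodic background thread, with the reports written to whatever location the profiling options specify; both of those are assumptions rather than a description of existing Beam behavior.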