Also, note that these counters are only available if Beam has been compiled with Cython (e.g. installed from a wheel). Of course, if you care about performance, you'd want that anyway.
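To check whether the compiled build is actually in use, you can test whether a Cython-only module imports from a compiled extension file (.so/.pyd). Below is a minimal sketch. It assumes that Beam 2.x releases of this era ship the fast state sampler only as the compiled module `apache_beam.runners.worker.statesampler_fast`, with no pure-Python fallback under that name. Verify that module name against your installed version. The helper `is_compiled_module` is hypothetical and is not part of Beam.

```python
import importlib
import importlib.machinery


def is_compiled_module(module_name):
    """Return True if module_name imports and was loaded from a compiled
    extension file (.so/.pyd), which is how Cython-built modules are shipped."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Not importable at all, e.g. a Cython-only module in a pure-Python install.
        return False
    origin = getattr(module, "__file__", None) or ""
    return origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


if __name__ == "__main__":
    # Assumption: this module exists only in Cython-compiled Beam builds.
    print(is_compiled_module("apache_beam.runners.worker.statesampler_fast"))
```

If this prints False in an environment where Beam is installed, the install is probably pure Python, and the per-transform timing counters will be missing.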
On Wed, Jul 24, 2019, 5:15 PM Robert Bradshaw <[email protected]> wrote:

> Beam tracks the amount of time spent in each transform in profile
> counters. There is ongoing work to expose these in a uniform way for all
> runners (e.g. in Dataflow they're displayed in the UI), but for the direct
> runner you can see an example at:
> https://github.com/apache/beam/blob/release-2.14.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L1046
>
> For a raw dump you could do something like:
>
>     import pprint
>     import apache_beam as beam
>
>     p = beam.Pipeline(...)
>     p | beam.Read...
>     results = p.run()
>     results.wait_until_finish()
>     pprint.pprint(results._metrics_by_stage)
>
> On Wed, Jul 24, 2019 at 4:07 PM Yu Watanabe <[email protected]> wrote:
>
>> Hello,
>>
>> I have a pipeline built on Apache Beam 2.13.0 using Python 3.7.3.
>> My pipeline takes about 5 hours to ingest 2 sets of approximately 70000
>> JSON objects using the Direct Runner.
>>
>> I want to diagnose which transforms are taking time and improve the code
>> for better performance. I found the module below for profiling, but it
>> does not seem to report the speed of each transform.
>>
>> https://beam.apache.org/releases/pydoc/2.13.0/apache_beam.utils.profiler.html
>>
>> Is there a module I could use to monitor the speed of each transform?
>> If not, I would appreciate some help on how to monitor the speed of each
>> transform.
>>
>> Best Regards,
>> Yu Watanabe
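If the Cython-backed counters are not available, a crude fallback for Yu's question is to time each step's per-record work by hand. One example would be running the step functions over a sample of the JSON input outside the pipeline to see which one dominates. The sketch below is plain Python and is not a Beam API. `StepTimer`, the step names `parse`/`enrich`, and the sample records are all hypothetical. The totals only cover code running in the current process, so for pipeline-wide numbers, Beam's own profile counters described above remain the better source.

```python
import json
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock seconds and call counts per named step.

    A hand-rolled stand-in for per-transform timing, not Beam's profile counters.
    """

    def __init__(self):
        self.totals = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def step(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the step raised.
            self.totals[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        """Return (name, seconds) pairs, slowest step first."""
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample input standing in for the real JSON records.
    sample = ['{"id": %d, "name": "item-%d"}' % (i, i) for i in range(10000)]
    timer = StepTimer()
    for raw in sample:
        with timer.step("parse"):
            record = json.loads(raw)
        with timer.step("enrich"):
            record["name_upper"] = record["name"].upper()
    for name, seconds in timer.report():
        print(f"{name}: {seconds:.4f}s over {timer.calls[name]} calls")
```

Using a context manager keeps the timing code out of the step logic itself. The `finally` clause ensures that a failing record still counts toward its step's total.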
