Also, note that these counters will only be available if Beam has been
compiled with Cython (e.g. installed from a wheel). Of course, if you care
about performance, you'd want that anyway.
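To check which kind of install you have, you could try importing the compiled module directly. This is a minimal sketch that assumes, based on the Beam 2.x source layout, that the Cython-compiled state sampler behind these counters is `apache_beam.runners.worker.statesampler_fast`. Treat that module path as an assumption and verify it against your installed version. The helper name `beam_has_fast_sampler` is hypothetical.

```python
import importlib


def beam_has_fast_sampler():
    """Return True if the Cython state sampler imports, False if it does not,
    or None if apache_beam itself is not installed.

    Assumption: the compiled sampler lives in
    apache_beam.runners.worker.statesampler_fast (Beam 2.x layout).
    """
    try:
        importlib.import_module("apache_beam")
    except ImportError:
        return None  # Beam is not installed in this environment
    try:
        importlib.import_module("apache_beam.runners.worker.statesampler_fast")
    except ImportError:
        return False  # pure-Python install: expect no per-transform time counters
    return True


if __name__ == "__main__":
    print("Cython state sampler available:", beam_has_fast_sampler())
```

If this prints False, reinstalling Beam from a wheel (rather than from source without Cython) is the first thing to try.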

On Wed, Jul 24, 2019, 5:15 PM Robert Bradshaw <[email protected]> wrote:

> Beam tracks the amount of time spent in each transform in profile
> counters. There is ongoing work to expose these in a uniform way for all
> runners (e.g. in Dataflow they're displayed on the UI), but for the direct
> runner you can see an example at
> https://github.com/apache/beam/blob/release-2.14.0/sdks/python/apache_beam/runners/portability/fn_api_runner_test.py#L1046
>
> For a raw dump you could do something like:
>
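>     import pprint
>     import apache_beam as beam
>
>     p = beam.Pipeline(...)
>     p | beam.Read...
>     results = p.run()
>     results.wait_until_finish()
>     # _metrics_by_stage is a private attribute, so it may change between releases.
>     pprint.pprint(results._metrics_by_stage)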
>
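If the counters are not available, a rough alternative is to time the per-element functions outside Beam on a sample of your JSON records. This is a hand-rolled sketch, not a Beam API: it assumes your transforms are mostly `beam.Map`-style functions that take one element and return one element, and the names `time_steps`, `parse`, and `enrich` are hypothetical stand-ins for your own code. It only measures the Python work inside each function, not Beam overhead such as serialization or shuffles.

```python
import json
import time
from collections import defaultdict


def time_steps(records, steps):
    """Apply each (name, fn) step to every record in order.

    Returns (output, timings), where timings maps each step name to the
    total wall-clock seconds spent inside that step.
    """
    timings = defaultdict(float)
    current = list(records)
    for name, fn in steps:
        start = time.perf_counter()
        current = [fn(element) for element in current]
        timings[name] += time.perf_counter() - start
    return current, dict(timings)


# Hypothetical per-element functions standing in for your own transforms.
def parse(line):
    return json.loads(line)


def enrich(obj):
    return {**obj, "size": len(obj.get("payload", ""))}


if __name__ == "__main__":
    sample = [json.dumps({"id": i, "payload": "x" * i}) for i in range(1000)]
    out, timings = time_steps(sample, [("parse", parse), ("enrich", enrich)])
    # Print the slowest step first.
    for name, secs in sorted(timings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {secs * 1000:.2f} ms")
```

Running your real functions over a few thousand representative records this way usually shows which step dominates before you dig into runner-level metrics.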
> On Wed, Jul 24, 2019 at 4:07 PM Yu Watanabe <[email protected]> wrote:
>
>> Hello,
>>
>> I have a pipeline built on Apache Beam 2.13.0 using Python 3.7.3.
>> My pipeline takes about 5 hours to ingest 2 sets of approximately 70,000
>> JSON objects using the Direct Runner.
>>
>> I want to diagnose which transforms are taking the most time and improve
>> the code for better performance. I saw the profiling module below, but it
>> does not seem to report the speed of each transform.
>>
>>
>> https://beam.apache.org/releases/pydoc/2.13.0/apache_beam.utils.profiler.html
>>
>> Is there a module I could use to monitor the speed of each transform?
>> If not, I would appreciate some help on how to monitor the speed of each
>> transform.
>>
>> Best Regards,
>> Yu Watanabe
>>
>> --
>> Yu Watanabe
>> Weekend freelancer who loves the challenge of building data platforms
>> [email protected]
>> LinkedIn: <https://www.linkedin.com/in/yuwatanabe1>  Twitter:
>> <https://twitter.com/yuwtennis>
>>
>
