Hi folks,

Has anyone run into problems with module-level versus function-level
imports in an Apache Beam pipeline when executing it on Dataflow?

My pipeline uses a few components from other modules of the project, and I
have written a setup.py following the guidelines in [1]. What surprises me
is that four of the five modules work as expected, but for the fifth, Beam
complains that the name is not defined. Note that this only happens when I
run the pipeline with the DataflowRunner.

The error goes away when I move that module's import inside the function
that starts my Beam pipeline. This feels like a hacky workaround IMO, all
the more so because I don't understand why the other module-level imports
work without it.

Could anyone provide some hints?

[1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/

Sayak Paul | sayak.dev
