Just wanted to bump this up again.

Sayak Paul | sayak.dev
On Tue, Sep 14, 2021 at 7:41 AM Sayak Paul <[email protected]> wrote:

> Thank you! It worked.
>
> > you should be able to restructure the pipeline package, so that the
> > imports are not in the main module, similar to
> > https://stackoverflow.com/a/58845832/5153670
>
> Could you expand a bit more on what you mean by "restructure"? Let me
> provide more context on how my project is structured.
>
> This is the high-level structure:
>
> |- my_package/
>     |- __init__.py
>     |- utils/
>         |- __init__.py
>         |- html.py
> |- scripts/
>     |- dataflow_runner.py
> |- setup.py
>
> dataflow_runner.py is the script that contains my Apache Beam pipeline
> inside main(), and I execute dataflow_runner.py to start my pipeline run
> on Dataflow. html is the module that does not get identified when the
> pipeline is run. This is how the module is imported inside
> dataflow_runner:
>
> sys.path.append("..")
> from my_package.utils import html
>
> I also make sure to specify the path to setup.py as ../setup.py, relative
> to dataflow_runner.
>
> Please let me know if anything is unclear.
>
> Sayak Paul | sayak.dev
>
> On Tue, Sep 14, 2021 at 12:00 AM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> Hi,
>> Try to set --save_main_session=True when you launch the pipeline. If
>> that works, you should be able to restructure the pipeline package, so
>> that the imports are not in the main module, similar to
>> https://stackoverflow.com/a/58845832/5153670
>>
>> On Mon, Sep 13, 2021 at 2:45 AM Sayak Paul <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> Have you ever faced an issue with local and global dependencies inside
>>> an Apache Beam pipeline while executing it on Dataflow?
>>>
>>> My pipeline involves a few components from the other modules of the
>>> project, and I have set up a setup.py following the guidelines from [1].
>>> What is surprising to me is that four out of the five modules are working
>>> as expected, and for only one, Beam is complaining that it's not defined.
>>> Note that it only happens when I run it using the DataflowRunner.
>>>
>>> The error goes away when I include the module import inside the method
>>> that starts my Beam pipeline. This is a hacky workaround IMO. More so
>>> because I am not sure why the other module imports are working then.
>>>
>>> Could anyone provide some hints?
>>>
>>> [1]
>>> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>
>>> Sayak Paul | sayak.dev
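
For anyone following the thread, the two ideas discussed above (enabling save_main_session, and keeping an import out of the main module's global scope) can be sketched roughly as below. This is only a minimal sketch and not the pipeline from the thread: escape_line and run are hypothetical names, the standard-library html module stands in for my_package.utils.html, and the toy transforms are made up for illustration. Beam is imported inside run() so the file can be loaded even where Beam is not installed.

```python
def escape_line(line):
    # Function-level import: the name is resolved when the function runs
    # (on the worker), so it does not rely on globals of the main module.
    # Assumption: the stdlib `html` module stands in for my_package.utils.html.
    import html

    return html.escape(line)


def run(argv=None, save_main_session=True):
    # Imported lazily so this file loads even without apache_beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    # argv would carry the usual flags, e.g. the runner and --setup_file.
    options = PipelineOptions(argv)
    # Pickle the state of the main module (including its global imports)
    # so that it is available to functions executed on the workers.
    options.view_as(SetupOptions).save_main_session = save_main_session

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(["<b>hello</b>"])  # toy input
            | "Escape" >> beam.Map(escape_line)
            | "Print" >> beam.Map(print)
        )
```

Calling run() with the usual Dataflow flags would launch the toy pipeline. Moving functions such as escape_line into a module of the installed package (rather than the launcher script) is one way to read the "restructure" suggestion, since they would then be imported on the workers through setup.py instead of depending on the pickled main session.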
