Thank you! It worked.
> you should be able to restructure the pipeline package, so that the
> imports are not in the main module, similar to
> https://stackoverflow.com/a/58845832/5153670
Could you expand a bit more on what you mean by "restructure"? Let me
provide more context on how my project is structured.
This is the high-level structure:
|- my_package/
|  |- __init__.py
|  |- utils/
|     |- __init__.py
|     |- html.py
|- scripts/
|  |- dataflow_runner.py
|- setup.py
dataflow_runner.py contains my Apache Beam pipeline inside main(), and I run
dataflow_runner.py to launch the pipeline on Dataflow. html is the module
that is reported as not defined when the pipeline runs. This is how the
module is imported inside dataflow_runner.py:
    sys.path.append("..")
    from my_package.utils import html
I also make sure to pass the path to setup.py from dataflow_runner.py as
../setup.py.
Please let me know if anything is unclear.
Sayak Paul | sayak.dev
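A minimal, standard-library-only sketch of why the suggested restructuring can help, under stated assumptions: the function that uses html is moved out of dataflow_runner.py (which runs as __main__) into a module of my_package whose top-level imports include html. The module name transforms.py and the helpers strip_tags and clean are hypothetical, and the standard pickle module stands in for the serializer Beam actually uses (dill, in Beam releases of this era). Both generally serialize a module-level function of an importable module by reference, so a worker that can import my_package re-imports that module, and with it html, when it deserializes the function.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical files mirroring the layout in this thread; strip_tags,
# clean and transforms.py are invented for illustration only.
FILES = {
    "my_package/__init__.py": "",
    "my_package/utils/__init__.py": "",
    "my_package/utils/html.py": textwrap.dedent("""
        def strip_tags(text):
            return text.replace('<b>', '').replace('</b>', '')
    """),
    # The function that needs html lives inside the package, and html is
    # imported at the top of this module rather than in __main__.
    "my_package/transforms.py": textwrap.dedent("""
        from my_package.utils import html

        def clean(text):
            return html.strip_tags(text)
    """),
}

# Run by a fresh interpreter playing the role of a remote worker: it never
# imports my_package itself, it only unpickles and calls the function.
WORKER = (
    "import pickle, sys\n"
    "fn = pickle.loads(sys.stdin.buffer.read())\n"
    "print(fn('<b>hello</b> world'))\n"
)


def run_demo():
    with tempfile.TemporaryDirectory() as root:
        for rel_path, source in FILES.items():
            path = os.path.join(root, rel_path)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as f:
                f.write(source)

        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            transforms = importlib.import_module("my_package.transforms")
            # A module-level function is pickled by reference
            # (module name + qualified name), not by value.
            payload = pickle.dumps(transforms.clean)
        finally:
            sys.path.remove(root)

        # PYTHONPATH stands in for my_package being installed on the
        # workers (assumption: via the setup.py passed to the pipeline).
        env = dict(os.environ, PYTHONPATH=root)
        result = subprocess.run(
            [sys.executable, "-c", WORKER],
            input=payload,
            capture_output=True,
            env=env,
            check=True,
        )
        return result.stdout.decode().strip()


if __name__ == "__main__":
    print(run_demo())  # hello world
```

The worker process never imports html explicitly; unpickling transforms.clean imports my_package.transforms, whose top-level import pulls in html. Roughly speaking, a function defined in dataflow_runner.py itself is instead serialized by value and looks up names such as html in the worker's __main__ namespace, which is only restored from the launcher when --save_main_session is set.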
On Tue, Sep 14, 2021 at 12:00 AM Valentyn Tymofieiev wrote:
> Hi,
> Try to set --save_main_session=True
> when you launch the pipeline. If that works, you should be able to
> restructure the pipeline package, so that the imports are not in the main
> module, similar to https://stackoverflow.com/a/58845832/5153670
>
>
> On Mon, Sep 13, 2021 at 2:45 AM Sayak Paul wrote:
>
>> Hi folks,
>>
>> Have you ever faced an issue with local and global dependencies inside an
>> Apache Beam pipeline while executing it on Dataflow?
>>
>> My pipeline uses components from a few other modules of the project, and
>> I have written a setup.py following the guidelines in [1]. What surprises
>> me is that four of the five modules work as expected; only for one does
>> Beam complain that it is not defined. Note that this only happens when I
>> run the pipeline with the DataflowRunner.
>>
>> The error goes away when I move that module's import inside the method
>> that starts my Beam pipeline. This is a hacky workaround IMO, more so
>> because I don't understand why the other module imports work without it.
>>
>> Could anyone provide some hints?
>>
>> [1]
>> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>
>> Sayak Paul | sayak.dev
>>
>>