Just wanted to bump this up again.
Sayak Paul | sayak.dev


On Tue, Sep 14, 2021 at 7:41 AM Sayak Paul <[email protected]> wrote:

> Thank you! It worked.
>
> > you should be able to restructure the pipeline package, so that the
> > imports are not in the main module, similar to
> > https://stackoverflow.com/a/58845832/5153670
>
> Could you expand a bit more on what you mean by "restructure"? Let me
> provide more context on how my project is structured.
>
> This is the high-level structure:
>
> |- my_package/
>     |- __init__.py
>     |- utils/
>         |- __init__.py
>         |- html.py
> |- scripts/
>     |- dataflow_runner.py
> |- setup.py
>
> dataflow_runner.py is the script that contains my Apache Beam pipeline
> inside main(), and I execute it to start the pipeline run on Dataflow.
> html is the module that is not found when the pipeline runs. This is how
> it is imported inside dataflow_runner.py:
>
> sys.path.append("..")
> from my_package.utils import html
>
> I also specify the path to setup.py as ../setup.py from within
> dataflow_runner.py.
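One thing worth noting about the snippet above: both ".." and "../setup.py" are resolved against the current working directory, not against the location of dataflow_runner.py, so they only work when the script is launched from inside scripts/. A minimal sketch of anchoring both paths to the script itself follows. The helper names (project_root, add_root_to_sys_path, dataflow_args) are hypothetical and not part of Beam; --runner and --setup_file are the standard Beam pipeline options.

```python
import sys
from pathlib import Path


def project_root(script_path):
    """Return the directory one level above the script's folder.

    For <root>/scripts/dataflow_runner.py this is <root>.
    """
    return Path(script_path).resolve().parent.parent


def add_root_to_sys_path(script_path):
    """Make my_package importable locally, whatever the working directory is."""
    root = str(project_root(script_path))
    if root not in sys.path:
        sys.path.append(root)
    return root


def dataflow_args(script_path):
    """Build pipeline flags with an absolute path to setup.py (hypothetical helper)."""
    root = project_root(script_path)
    return [
        "--runner=DataflowRunner",
        f"--setup_file={root / 'setup.py'}",
    ]
```

Inside dataflow_runner.py one would call add_root_to_sys_path(__file__) before importing my_package, and pass dataflow_args(__file__) along with the other pipeline options.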
>
> Please let me know if anything is unclear.
>
> Sayak Paul | sayak.dev
>
>
>
> On Tue, Sep 14, 2021 at 12:00 AM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> Hi,
>> Try setting --save_main_session=True
>> when you launch the pipeline. If that works, you should be able to
>> restructure the pipeline package, so that the imports are not in the main
>> module, similar to https://stackoverflow.com/a/58845832/5153670
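A rough illustration of why moving code out of the main module tends to help. It uses only the standard-library pickle, with the stdlib html module standing in for an installed package module such as my_package.utils.html. Beam uses its own pickler, so this is a sketch of the general mechanism under that assumption, not of Beam's exact behavior. A function that lives in an importable module is pickled by reference, so the receiving side re-imports the module and that module's own top-level imports run there. Names that exist only in the main script do not get this treatment unless the main session is saved.

```python
import html  # stdlib stand-in for an installed module such as my_package.utils.html
import pickle

# A function defined in an importable module is pickled by reference:
# the payload records the module name and the function name, not the code.
payload = pickle.dumps(html.escape)
names_present = b"html" in payload and b"escape" in payload

# Unpickling imports the module again and looks the function up by name,
# so it resolves to the very same function object.
restored = pickle.loads(payload)
same_function = restored is html.escape

print(names_present, same_function)
```

Under this reading, keeping the transforms (and their imports) in modules under my_package/, with dataflow_runner.py reduced to a thin entry point, would let the workers resolve them through the package installed from setup.py.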
>>
>>
>> On Mon, Sep 13, 2021 at 2:45 AM Sayak Paul <[email protected]> wrote:
>>
>>> Hi folks,
>>>
>>> Have you ever faced an issue with local and global dependencies inside
>>> an Apache Beam Pipeline while executing it on Dataflow?
>>>
>>> My pipeline involves a few components from the other modules of the
>>> project and I have set up a setup.py following the guidelines from [1].
>>> What is surprising to me is that four out of the five modules are working
>>> as expected and for only one, Beam is complaining that it's not defined.
>>> Note that it only happens when I run it using the DataflowRunner.
>>>
>>> The error goes away when I include the module import inside the method
>>> that starts my Beam pipeline. This is a hacky workaround IMO. More so
>>> because I am not sure why the other module imports are working then.
>>>
>>> Could anyone provide some hints?
>>>
>>> [1]
>>> https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>
>>> Sayak Paul | sayak.dev
>>>
>>>
