Marco

To add to the other answers, there are two ways I add dependencies to my
jobs. In both cases, you need a setup.py like this:
from setuptools import setup, find_packages

setup(
    name="dependencies",
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        'pymssql==2.1.4',
        'google-cloud-storage==1.22.0',
    ],
)

A setup file with just this content is enough to declare your dependencies.

1) Add a setup file:
When you run your job, add the --setup_file option. It would look like
this:

python -m main_file --runner=dataflow --project=myproject \
  --template_location=gs://mybucket/my_template \
  --temp_location=gs://mybucket/temp \
  --staging_location=gs://mybucket/staging \
  --setup_file home/path/to/setup.py
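
If you would rather not retype the flags each time, a small launcher script can build the command for you. This is only a sketch: template_command is a hypothetical helper name, the project and bucket values are the placeholders from this thread, and resolving the setup.py path to an absolute one is my own assumption to avoid surprises with relative paths.

```python
import os
import sys


def template_command(setup_file):
    """Build the argv list that stages the template with a setup file.

    The project and bucket values are the placeholders used in this thread.
    """
    return [
        sys.executable, "-m", "main_file",
        "--runner=dataflow",
        "--project=myproject",
        "--template_location=gs://mybucket/my_template",
        "--temp_location=gs://mybucket/temp",
        "--staging_location=gs://mybucket/staging",
        # Resolve to an absolute path so the flag does not depend on the
        # directory the launcher happens to be started from.
        "--setup_file", os.path.abspath(setup_file),
    ]
```

The returned list can be handed to subprocess.run(template_command("setup.py"), check=True) from the directory that holds main_file.py.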

2) Extra package:
From your setup.py, you can build a package and add it to your job. To do
so, run:
python setup.py sdist
Then pass the file it creates to your job with the --extra_package
option:

python -m main_file --runner=dataflow --project=myproject \
  --template_location=gs://mybucket/my_template \
  --temp_location=gs://mybucket/temp \
  --staging_location=gs://mybucket/staging \
  --extra_package dist/dependencies-0.0.1.tar.gz
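
Before shipping the archive, it can be worth checking that utils.py actually ended up inside it. The sketch below uses only the standard library; sdist_modules and sdist_contains are hypothetical helper names of mine, and it assumes the usual sdist layout with a single top-level folder such as dependencies-0.0.1/. If utils.py is missing, one likely cause is that find_packages() only collects directories containing an __init__.py, so a standalone module would need to be listed via the py_modules argument of setup() or moved into a package.

```python
import tarfile


def sdist_modules(archive_path):
    """Return the .py paths inside an sdist, relative to its top-level folder."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    modules = set()
    for name in names:
        # Entries look like "dependencies-0.0.1/utils.py";
        # drop the first path component.
        _, _, relative = name.partition("/")
        if relative.endswith(".py"):
            modules.add(relative)
    return modules


def sdist_contains(archive_path, module_path):
    """True if module_path (for example "utils.py") was packaged into the sdist."""
    return module_path in sdist_modules(archive_path)
```

For example, sdist_contains("dist/dependencies-0.0.1.tar.gz", "utils.py") should be True before you pass the archive to --extra_package.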

Good luck!

André Rocha
Data Engineer

On Fri, Jan 17, 2020 at 8:35 AM Chris Swart wrote:

> Hey Marco, you will need to package your application in a module. The
> Juliaset example shows you how you could go about it:
> https://github.com/apache/beam/tree/master/sdks/python/apache_beam/examples/complete/juliaset
>
> Best wishes, Chris
>
> On Thu, Jan 16, 2020 at 10:00 PM Marco Mistroni wrote:
>
>> Hello all,
>> I have written an Apache Beam workflow which I have split across two
>> files:
>> - main_file.py, which contains the pipeline
>> - utils.py, which contains a few functions used in the pipeline
>>
>> I have created a template for this using the command below
>>
>> python -m main_file.py --runner=dataflow --project=myproject
>> --template_location=gs://mybucket/my_template
>> --temp_location=gs://mybucket/temp --staging_location=gs://mybucket/staging
>>
>> and I have attempted to create a job using this template.
>> However, when I kick off the job I am getting exceptions such as
>>
>>
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 261, in loads
>>     return dill.loads(s)
>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 317, in loads
>>     return load(file, ignore)
>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 305, in load
>>     obj = pik.load()
>>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in find_class
>>     return StockUnpickler.find_class(self, module, name)
>> ImportError: No module named 'utils'
>>
>> I am guessing I am missing some steps in packaging the application, or
>> perhaps some extra options to specify dependencies?
>> I would not imagine writing a whole workflow in one file, so this looks
>> like a standard use case?
>>
>> kind regards
>>
>>
>>
>>
>>
