Probably option 2 would be the cleanest approach in your case, e.g. run

git clone git://[email protected]/mycompany/mypackage
python mypackage/setup.py sdist

and then specify extra_packages=dist/mypackage.tar.gz in your pipeline options.
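Something along these lines should work (just a rough sketch, assuming a recent Beam Python SDK; the paths and the package name are placeholders):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions()  # plus your usual Dataflow flags (project, temp_location, ...)
# extra_packages takes local tarball paths; they get staged with the job and
# pip-installed on the workers.
options.view_as(SetupOptions).extra_packages = ['dist/mypackage.tar.gz']

with beam.Pipeline(options=options) as p:
    ...  # your transforms here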
On Mon, Jun 5, 2017 at 1:56 PM, Dmitry Demeshchuk <[email protected]> wrote:
> Hi list,
>
> Suppose you have a private Python package that contains some code people
> want to share when writing their pipelines.
>
> So, typically, the installation process of the package would be either
>
> pip install git+ssh://[email protected]/mycompany/mypackage#egg=mypackage
>
> or
>
> git clone git://[email protected]/mycompany/mypackage
> python mypackage/setup.py install
>
> Now, the problem starts when we want to get that package into Dataflow.
> Right now, to my understanding, DataflowRunner supports 3 approaches:
>
> 1. Specifying a requirements_file parameter in the pipeline options. This
> basically must be a requirements.txt file.
>
> 2. Specifying an extra_packages parameter in the pipeline options. This
> must be a list of tarballs, each of which contains a Python package
> packaged using distutils.
>
> 3. Specifying a setup_file parameter in the pipeline options. This will
> just run the python path/to/my/setup.py package command and then send the
> files over the wire.
>
> The best approach I could come up with was including an additional
> setup.py in the package itself, so that when we install that package, the
> setup.py file gets installed along with it. And then, I'd point the
> setup_file option to that file.
>
> This gist shows the basic approach in code. Both setup.py and options.py
> are supposed to be present in the installed package.
>
> It kind of works for me, with some caveats, but I just wanted to find out
> if there's a more decent way to handle my situation. I'm not keen on
> specifying that private package as a git dependency, because of having to
> worry about git credentials, but maybe there are other ways?
>
> Thanks!
>
> --
> Best regards,
> Dmitry Demeshchuk.
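For reference, the setup.py-in-the-package approach described above might be sketched roughly like this (the module layout and names are assumptions, going from the description rather than the gist itself):

# mypackage/options.py -- assumes a copy of setup.py is installed alongside
# this module as package data.
import os
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

def pipeline_options(argv=None):
    options = PipelineOptions(argv)
    # Point setup_file at the setup.py shipped inside the installed package,
    # so the Dataflow workers can build and install it.
    options.view_as(SetupOptions).setup_file = os.path.join(
        os.path.dirname(__file__), 'setup.py')
    return options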
