Probably option 2 would be the cleanest approach in your case, e.g. run

git clone git://[email protected]/mycompany/mypackage
python mypackage/setup.py sdist

and then specify extra_packages=dist/mypackage.tar.gz in your pipeline options.
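Something along these lines should work (just a rough sketch, assuming a recent Beam Python SDK; the paths and the package name are placeholders):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions()  # plus your usual Dataflow flags (project, temp_location, ...)
# extra_packages takes local tarball paths; they get staged with the job and
# pip-installed on the workers.
options.view_as(SetupOptions).extra_packages = ['dist/mypackage.tar.gz']

with beam.Pipeline(options=options) as p:
    ...  # your transforms here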
On Mon, Jun 5, 2017 at 1:56 PM, Dmitry Demeshchuk <[email protected]> wrote:
> Hi list,
>
> Suppose you have a private Python package that contains some code people
> want to share when writing their pipelines.
>
> So, typically, the installation process of the package would be either
>
> pip install git+ssh://[email protected]/mycompany/mypackage#egg=mypackage
>
> or
>
> git clone git://[email protected]/mycompany/mypackage
> python mypackage/setup.py install
>
> Now, the problem starts when we want to get that package into Dataflow.
> Right now, to my understanding, DataflowRunner supports 3 approaches:
>
> 1. Specifying a requirements_file parameter in the pipeline options. This
> basically must be a requirements.txt file.
>
> 2. Specifying an extra_packages parameter in the pipeline options. This
> must be a list of tarballs, each of which contains a Python package
> packaged using distutils.
>
> 3. Specifying a setup_file parameter in the pipeline options. This will
> just run the python path/to/my/setup.py package command and then send the
> files over the wire.
>
> The best approach I could come up with was including an additional
> setup.py in the package itself, so that when we install that package, the
> setup.py file gets installed along with it. And then, I'd point the
> setup_file option to that file.
>
> This gist shows the basic approach in code. Both setup.py and options.py
> are supposed to be present in the installed package.
>
> It kind of works for me, with some caveats, but I just wanted to find out
> if there's a more decent way to handle my situation. I'm not keen on
> specifying that private package as a git dependency, because of having to
> worry about git credentials, but maybe there are other ways?
>
> Thanks!
>
> --
> Best regards,
> Dmitry Demeshchuk.
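For reference, the setup.py-in-the-package approach described above might be sketched roughly like this (the module layout and names are assumptions, going from the description rather than the gist itself):

# mypackage/options.py -- assumes a copy of setup.py is installed alongside
# this module as package data.
import os
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

def pipeline_options(argv=None):
    options = PipelineOptions(argv)
    # Point setup_file at the setup.py shipped inside the installed package,
    # so the Dataflow workers can build and install it.
    options.view_as(SetupOptions).setup_file = os.path.join(
        os.path.dirname(__file__), 'setup.py')
    return options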
