Thank you for the update; some questions inline.

On Thu, Jun 8, 2017 at 6:21 PM, Dmitry Demeshchuk <[email protected]> wrote:

> FYI, I tried to install a psycopg2 wheel from a file using the
> "extra_packages" argument (although wheel installation is apparently
> still an experimental feature), but this led to a UCS-2 vs UCS-4
> compatibility problem (it looks like the Dataflow version of Python is
> using UCS-2, while wheels for Linux generally use UCS-4).
>

What is the UCS-2 vs UCS-4 problem, and what is the compatibility issue?

>
> What ended up working for me ultimately, though, is an approach similar to
> juliaset, with a few small differences:
> https://gist.github.com/doubleyou/27bf3abb0fc77a2bc9257e6adc5cfe8f
>
> Note two things here:
>
> 1. We import the "install" class from setuptools, not from distutils.
> This, in fact, has been the core problem for me. I haven't yet checked
> whether the juliaset example works for me at all, but I strongly suspect
> that it may not work precisely because of this issue.
>

Please let us know if juliaset does not work for you as is.

>
> 2. We handle commands in a simpler fashion, by just using one single class.
>
> I'll make a Jira ticket later today or tomorrow to reflect my findings,
> and maybe make a pull request if I confirm that juliaset is not universally
> working either, if that's fine.
>

It would be great if you can share this information in a JIRA issue.
Juliaset is only an example of running commands at setup time; it does not
globally solve all possible issues.

Ahmet

>
> On Tue, Jun 6, 2017 at 8:46 PM, Dmitry Demeshchuk <[email protected]>
> wrote:
>
>> Yeah, I wasn't really pinning it myself, it's one of the dependency
>> packages that depends on that specific version.
>>
>> Thanks for the information, I'll try to explicitly install 33.1.1 and see
>> if it changes anything.
>>
>> On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay <[email protected]> wrote:
>>
>>> Pinning setuptools is generally not a good practice. The reason is that
>>> at installation time it might cause removal of the setuptools that is
>>> being used to install packages.
>>>
>>> FWIW, dataflow workers should have setuptools 33.1.1, which was released
>>> on 2017/01/16.
>>>
>>> Ahmet
>>>
>>> On Tue, Jun 6, 2017 at 6:53 PM, Dmitry Demeshchuk <[email protected]>
>>> wrote:
>>>
>>>> Thanks, Ahmet, it really turned out that Stackdriver had more logs than
>>>> just the Dataflow logs section.
>>>>
>>>> So, I ended up seeing this command, which fails constantly:
>>>>
>>>> I Running setup.py install for dataflow: started
>>>> I Running setup.py install for dataflow: finished with status 'error'
>>>> I Complete output from command /usr/bin/python -u -c "import
>>>> setuptools,
>>>> tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize,
>>>> 'open', open)(__file__);code=f.read().replace('\r\n',
>>>> '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record
>>>> /tmp/pip-sHw6oI-record/install-record.txt
>>>> --single-version-externally-managed --compile:
>>>> I usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>>>> I or: -c --help [cmd1 cmd2 ...]
>>>> I or: -c --help-commands
>>>> I or: -c cmd --help
>>>> I
>>>> I error: option --single-version-externally-managed not recognized
>>>> I
>>>> I ----------------------------------------
>>>> I Command "/usr/bin/python -u -c "import setuptools,
>>>> tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize,
>>>> 'open', open)(__file__);code=f.read().replace('\r\n',
>>>> '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record
>>>> /tmp/pip-sHw6oI-record/install-record.txt
>>>> --single-version-externally-managed --compile" failed with error code 1 in
>>>> /tmp/pip-bXyST4-build/
>>>> I /usr/local/bin/pip failed with exit status 1
>>>>
>>>>
>>>> This seems to mean that the natively installed setuptools is too old,
>>>> and the new command has been generated with a newer version of setuptools
>>>> (specifically, my project has setuptools==36.0.1 as a dependency of some
>>>> package).
>>>> I'm still digging through the Stackdriver logs but so far
>>>> couldn't find out the exact reason for the failure.
>>>>
>>>> Also talking to the Dataflow folks, maybe they'll have a better idea.
>>>> I'll also try to compare this to the output of successful pipelines and see
>>>> if it gives me any ideas.
>>>>
>>>> Thank you.
>>>>
>>>> On Tue, Jun 6, 2017 at 4:40 PM, Ahmet Altay <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Ahmet,
>>>>>>
>>>>>> Thanks a lot for pointing out that doc, I somehow missed it on the
>>>>>> official Python SDK page!
>>>>>>
>>>>>> One thing that comes to my mind is that generally one should probably
>>>>>> use the 'install' command in setuptools, not 'build', like it's done in
>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py#L113.
>>>>>> The reason being, the 'build' step seems to be executed on the original
>>>>>> machine, not inside the runner's containers, while 'install' will be
>>>>>> triggered inside of them. If I run a pipeline that uses setup.py with a
>>>>>> "build" step, it fails due to being unable to "apt-get install
>>>>>> libpq-dev" on a mac.
>>>>>>
>>>>>
>>>>> Thank you. This example should similarly work in install commands I
>>>>> believe. Also, if possible please file a JIRA issue with your ideas and we
>>>>> can work on improving things.
>>>>>
>>>>>>
>>>>>> I'm still trying to make it work with either build or install steps,
>>>>>> talking to the Dataflow folks in parallel to get more understanding of
>>>>>> what I'm doing wrong (Dataflow doesn't send out installation failure
>>>>>> logs to Stackdriver, only runtime logs, so it seems).
>>>>>>
>>>>>
>>>>> Have you tried looking at the worker-startup logs? All of the logs
>>>>> should be in Stackdriver.
>>>>>
>>>>>>
>>>>>> On Tue, Jun 6, 2017 at 9:21 AM, Ahmet Altay <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please see Managing Python Pipeline Dependencies [1] for various
>>>>>>> ways of installing additional dependencies. The section on non-python
>>>>>>> dependencies is relevant to your question.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ahmet
>>>>>>>
>>>>>>> [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>>>>>
>>>>>>> On Mon, Jun 5, 2017 at 11:52 PM, Morand, Sebastien <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Interested too. For instance, it would be nice to add an sftp
>>>>>>>> BoundedSource, but that requires compiling paramiko against the ssl
>>>>>>>> library (and therefore installing ssl-dev).
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> *Sébastien MORAND*
>>>>>>>> Team Lead Solution Architect
>>>>>>>> Technology & Operations / Digital Factory
>>>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>>>>> Bureau 0144C (Ouest)
>>>>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>>>>> www.veolia.com <http://www.veolia.com>
>>>>>>>>
>>>>>>>> On 6 June 2017 at 08:01, Dmitry Demeshchuk <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi again, folks,
>>>>>>>>>
>>>>>>>>> How should I go about installing Python packages that need to
>>>>>>>>> be built and/or require native dependencies like shared libraries or
>>>>>>>>> such?
>>>>>>>>>
>>>>>>>>> I guess I could potentially build the C-based modules using the
>>>>>>>>> same version of kernel and glibc that Dataflow is running, but
>>>>>>>>> it doesn't seem like there's any way to install shared libraries
>>>>>>>>> on these boxes, right?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Dmitry Demeshchuk.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Dmitry Demeshchuk.
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Dmitry Demeshchuk.
>>>>
>>>
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk.
>>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
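
For anyone reading this thread in the archive, the approach described at the top (one custom command class derived from the setuptools "install" class, not the distutils one) could look roughly like the sketch below. This is not the content of the linked gist or of the juliaset example. The package name, the version, the CUSTOM_COMMANDS list and the helper names are assumptions made up for illustration, and the apt-get commands assume a Debian-based worker image where the installer runs as root.

```python
# setup.py (illustrative sketch; names and commands are assumptions)
import subprocess

import setuptools
# Import "install" from setuptools, not distutils: pip passes
# --single-version-externally-managed, which is a setuptools option.
from setuptools.command.install import install

# Hypothetical commands to run on the worker before the package is installed.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libpq-dev"],
]


def run_commands(commands):
    """Run each command in order; raise CalledProcessError if one fails."""
    for command in commands:
        subprocess.check_call(command)


class CustomInstall(install):
    """Single command class: run the custom commands, then the normal install."""

    def run(self):
        run_commands(CUSTOM_COMMANDS)
        install.run(self)


def setup_kwargs():
    """Keyword arguments for setuptools.setup(); all values are placeholders."""
    return {
        "name": "dataflow-deps-example",  # hypothetical package name
        "version": "0.0.1",
        "install_requires": ["psycopg2"],
        "cmdclass": {"install": CustomInstall},
    }


# In a real setup.py the file would end with:
#     setuptools.setup(**setup_kwargs())
```

Because the commands hang off "install" and not "build", they run where the package is installed (inside the worker) and not on the machine that submits the pipeline, which is the distinction discussed above.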

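
On the wheel compatibility question raised at the top of the thread (narrow vs wide Unicode builds, i.e. UCS-2 vs UCS-4): as far as I understand it, CPython 2.7 can be compiled with either 2-byte or 4-byte Unicode storage, and binary wheels are tagged for one or the other (ABI tag cp27m vs cp27mu), so a wheel built for one kind of interpreter is rejected by the other. A small sketch for checking which kind a given interpreter is follows. The helper names are made up for illustration. On Python 3.3 and later the distinction is gone and sys.maxunicode is always 0x10FFFF.

```python
import sys


def unicode_width(maxunicode=sys.maxunicode):
    """Return 'UCS-2' for a narrow build and 'UCS-4' for a wide one."""
    return "UCS-4" if maxunicode > 0xFFFF else "UCS-2"


def py27_abi_tag(maxunicode=sys.maxunicode):
    """ABI tag a CPython 2.7 wheel needs in order to match this build."""
    return "cp27mu" if maxunicode > 0xFFFF else "cp27m"


if __name__ == "__main__":
    # Report the Unicode width of the interpreter running this script.
    print("sys.maxunicode = %#x (%s)" % (sys.maxunicode, unicode_width()))
```

Running this on a worker and on the machine where the wheel was built should show whether the two interpreters disagree.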