Thank you for the update, some questions inline.

On Thu, Jun 8, 2017 at 6:21 PM, Dmitry Demeshchuk <[email protected]>
wrote:

> FYI, I tried to install a psycopg2 wheel from a file using the
> "extra_packages" argument (although wheel installation is apparently
> still an experimental feature), but this led to UCS-2 vs. UCS-4
> compatibility issues (it looks like the Dataflow version of Python is
> using UCS-2, while wheels for Linux generally use UCS-4).
>

What is the UCS-2 vs. UCS-4 problem, and what is the compatibility issue?
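For context: CPython 2 can be compiled to store Unicode strings as UCS-2 (a "narrow" build) or UCS-4 (a "wide" build). Python 2.7 binary wheels record this in their ABI tag (`cp27m` for narrow, `cp27mu` for wide), so pip will not install a wheel tagged for the other build, and a C extension compiled for one fails to import on the other. A minimal sketch to check which build an interpreter is; the helper names are hypothetical, and the ABI mapping assumes the default pymalloc-enabled build (the `m` flag):

```python
import sys


def unicode_build():
    """Return 'UCS-4' for a wide Unicode build, 'UCS-2' for a narrow one.

    Narrow Python 2 builds report sys.maxunicode == 0xFFFF; wide builds
    report 0x10FFFF. Python 3.3+ (PEP 393) always reports 0x10FFFF, so the
    distinction only matters on Python 2.
    """
    return "UCS-4" if sys.maxunicode > 0xFFFF else "UCS-2"


def py27_abi_tag():
    """Hypothetical helper: the Python 2.7 wheel ABI tag matching this build.

    Assumes pymalloc is enabled (the 'm' flag), which is the CPython default.
    """
    return "cp27mu" if unicode_build() == "UCS-4" else "cp27m"


if __name__ == "__main__":
    print("Unicode build: %s, matching 2.7 wheel ABI tag: %s"
          % (unicode_build(), py27_abi_tag()))
```

Running this with the worker's `/usr/bin/python` and comparing the printed tag with the one in the wheel filename (for example `...-cp27-cp27mu-manylinux1_x86_64.whl`) shows whether the two match.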



>
> What ultimately worked for me, though, is an approach similar to
> juliaset, with a few small differences:
> https://gist.github.com/doubleyou/27bf3abb0fc77a2bc9257e6adc5cfe8f
>
> Note two things here:
>
> 1. We import the "install" class from setuptools, not from distutils.
> This, in fact, was the core problem for me. I haven't yet tried whether the
> juliaset example works for me at all, but I strongly suspect it may
> not, precisely because of this issue.
>

Please let us know if juliaset does not work for you as is.


>
> 2. We handle commands in a simpler fashion, by just using one single class.
>
> I'll file a JIRA ticket later today or tomorrow to reflect my findings,
> and maybe open a pull request if I confirm that juliaset doesn't work
> universally either, if that's fine.
>

It would be great if you could share this information in a JIRA issue.
Juliaset is only an example of running commands at setup time; it does not
solve every possible issue.
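A minimal sketch of the approach described in points 1 and 2: a single `install` subclass, taken from setuptools rather than distutils, that runs system commands before the normal install. It is not the content of the linked gist; the command list, the class name `InstallWithNativeDeps`, and the helper `run_custom_commands` are hypothetical, and it assumes a Debian-based worker where `apt-get` is available and the install runs with enough privileges:

```python
import subprocess

# setuptools' install, not distutils': only this subclass adds the
# --single-version-externally-managed option that pip passes on the workers.
from setuptools.command.install import install

# Hypothetical command list; libpq-dev is the package mentioned in this
# thread for building psycopg2. Assumes a Debian-based worker with apt-get.
CUSTOM_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "--assume-yes", "install", "libpq-dev"],
]


def run_custom_commands(commands, runner=subprocess.check_call):
    """Run each command in order; check_call raises CalledProcessError on failure."""
    for command in commands:
        print("Running command: %s" % " ".join(command))
        runner(command)


class InstallWithNativeDeps(install):
    """One install command that sets up system packages, then installs as usual."""

    def run(self):
        run_custom_commands(CUSTOM_COMMANDS)
        install.run(self)
```

In `setup.py` the class would then be registered with `setup(..., cmdclass={"install": InstallWithNativeDeps})` and the file handed to the pipeline with `--setup_file`. Hooking `install` rather than `build` follows the reasoning later in this thread: the install step is what runs on the workers, not on the launching machine.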

Ahmet


>
> On Tue, Jun 6, 2017 at 8:46 PM, Dmitry Demeshchuk <[email protected]>
> wrote:
>
>> Yeah, I wasn't really pinning it myself, it's one of the dependency
>> packages that depends on that specific version.
>>
>> Thanks for the information, I'll try to explicitly install 33.1.1 and see
>> if it changes anything.
>>
>> On Tue, Jun 6, 2017 at 7:13 PM, Ahmet Altay <[email protected]> wrote:
>>
>>> Pinning setuptools is generally not a good practice. The reason is that at
>>> installation time it might cause removal of the setuptools that is
>>> being used to install packages.
>>>
>>> FWIW, Dataflow workers should have setuptools 33.1.1, which was released
>>> on 2017-01-16.
>>>
>>> Ahmet
>>>
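Since the pin here comes from a dependency rather than from the project itself, one way to spot it is to scan the resolved requirements (for example the output of `pip freeze`) for an exact setuptools pin and compare it with the worker's version. A minimal sketch; the 33.1.1 value is the worker version quoted above, and the helper names are hypothetical:

```python
import re

# Setuptools version quoted in this thread for Dataflow workers (June 2017).
WORKER_SETUPTOOLS = "33.1.1"

# Matches exact pins such as "setuptools==36.0.1", ignoring case and trailing comments.
_SETUPTOOLS_PIN = re.compile(
    r"^\s*setuptools\s*==\s*(\d+(?:\.\d+)*)\s*(?:#.*)?$", re.IGNORECASE)


def version_tuple(version):
    """Turn '36.0.1' into (36, 0, 1) for simple numeric comparisons."""
    return tuple(int(part) for part in version.split("."))


def setuptools_pins(requirement_lines):
    """Return the versions of all exact setuptools pins in requirement_lines."""
    pins = []
    for line in requirement_lines:
        match = _SETUPTOOLS_PIN.match(line)
        if match:
            pins.append(match.group(1))
    return pins


def pins_differing_from_worker(requirement_lines, worker=WORKER_SETUPTOOLS):
    """Pins that differ from the worker's version, i.e. ones that could make pip swap setuptools mid-install."""
    return [v for v in setuptools_pins(requirement_lines)
            if version_tuple(v) != version_tuple(worker)]


if __name__ == "__main__":
    frozen = ["apache-beam==2.0.0", "setuptools==36.0.1", "psycopg2"]
    print(pins_differing_from_worker(frozen))  # prints ['36.0.1']
```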
>>> On Tue, Jun 6, 2017 at 6:53 PM, Dmitry Demeshchuk <[email protected]>
>>> wrote:
>>>
>>>> Thanks, Ahmet, it really turned out that Stackdriver had more logs than
>>>> just the Dataflow logs section.
>>>>
>>>> So I ended up finding this error, which occurs every time:
>>>>
>>>> I    Running setup.py install for dataflow: started
>>>> I      Running setup.py install for dataflow: finished with status 'error'
>>>> I      Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile:
>>>> I      usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
>>>> I         or: -c --help [cmd1 cmd2 ...]
>>>> I         or: -c --help-commands
>>>> I         or: -c cmd --help
>>>> I
>>>> I      error: option --single-version-externally-managed not recognized
>>>> I
>>>> I      ----------------------------------------
>>>> I  Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-bXyST4-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-sHw6oI-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-bXyST4-build/
>>>> I  /usr/local/bin/pip failed with exit status 1
>>>>
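In the later message at the top of this thread, the fix turned out to be importing the `install` command class from setuptools instead of distutils. As a quick check for that pattern in a `setup.py`, a minimal sketch using the standard `ast` module; the helper name is hypothetical, and it only detects `from distutils.command... import ...` statements (a plain `import distutils.command.install` is not caught):

```python
import ast


def distutils_command_imports(source):
    """List (module, name) pairs imported from distutils.command[.*] in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.ImportFrom)
                and node.module is not None
                and (node.module == "distutils.command"
                     or node.module.startswith("distutils.command."))):
            found.extend((node.module, alias.name) for alias in node.names)
    return found


if __name__ == "__main__":
    setup_py = (
        "from distutils.command.install import install\n"
        "from setuptools import setup\n"
    )
    print(distutils_command_imports(setup_py))
    # prints [('distutils.command.install', 'install')]
```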
>>>>
>>>> This seems to mean that the natively installed setuptools is too old,
>>>> and that the command was generated by a newer version of setuptools
>>>> (specifically, my project has setuptools==36.0.1 as a dependency of some
>>>> package). I'm still digging through the Stackdriver logs but so far
>>>> haven't found the exact reason for the failure.
>>>>
>>>> I'm also talking to the Dataflow folks; maybe they'll have a better idea.
>>>> I'll also compare this to the output of successful pipelines and see
>>>> if that gives me any ideas.
>>>>
>>>> Thank you.
>>>>
>>>> On Tue, Jun 6, 2017 at 4:40 PM, Ahmet Altay <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 6, 2017 at 2:07 PM, Dmitry Demeshchuk <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Ahmet,
>>>>>>
>>>>>> Thanks a lot for pointing out that doc, I somehow missed it from the
>>>>>> official Python SDK page!
>>>>>>
>>>>>> One thing that comes to mind is that one should probably use the
>>>>>> 'install' command in setuptools, not 'build', as is done in
>>>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/complete/juliaset/setup.py#L113.
>>>>>> The reason is that the 'build' step seems to be executed on the original
>>>>>> machine, not inside the runner's containers, while 'install' is
>>>>>> triggered inside them. If I run a pipeline that uses setup.py with a
>>>>>> "build" step, it fails because it cannot "apt-get install libpq-dev" on
>>>>>> a Mac.
>>>>>>
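The failure described here comes from system commands running on the Mac that launches the pipeline. If a build-time hook is kept anyway, one defensive option (an assumption, not something the juliaset example does) is to skip system-package commands unless the process runs on Linux with `apt-get` on the PATH. A minimal sketch using only the standard library; the helper name is hypothetical, and `shutil.which` needs Python 3.3+:

```python
import shutil
import sys


def can_install_system_packages():
    """True only on Linux hosts that have apt-get available.

    On a macOS launch machine this returns False, so commands such as
    'apt-get install libpq-dev' can be skipped instead of failing.
    """
    return sys.platform.startswith("linux") and shutil.which("apt-get") is not None
```

Skipping silently can also hide a genuine problem on the workers, so logging whenever the commands are skipped is a sensible companion to this check.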
>>>>>
>>>>> Thank you. I believe this example should work similarly with install
>>>>> commands. Also, if possible, please file a JIRA issue with your ideas and
>>>>> we can work on improving things.
>>>>>
>>>>>
>>>>>>
>>>>>> I'm still trying to make it work with either the build or the install
>>>>>> step, and talking to the Dataflow folks in parallel to better understand
>>>>>> what I'm doing wrong (it seems Dataflow only sends runtime logs to
>>>>>> Stackdriver, not installation failure logs).
>>>>>>
>>>>>
>>>>> Have you tried looking at the worker-startup logs? All of the logs should
>>>>> be in Stackdriver.
>>>>>
>>>>>
>>>>>>
>>>>>> On Tue, Jun 6, 2017 at 9:21 AM, Ahmet Altay <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Please see Managing Python Pipeline Dependencies [1] for various
>>>>>>> ways of installing additional dependencies. The section on non-Python
>>>>>>> dependencies is relevant to your question.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ahmet
>>>>>>>
>>>>>>> [1] https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
>>>>>>>
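The dependencies page linked above covers shipping a `setup.py` (and extra packages) to the runner. A minimal sketch of the corresponding command-line flags, assuming the Beam Python SDK's `--setup_file` and `--extra_package` options; the helper `dataflow_args`, the project ID, and the bucket path are placeholders:

```python
def dataflow_args(project, temp_location, setup_file="./setup.py",
                  extra_packages=()):
    """Build argv for a Beam pipeline that ships a setup.py to Dataflow.

    Each extra package (wheel or sdist path) becomes its own --extra_package flag.
    """
    args = [
        "--runner=DataflowRunner",
        "--project=%s" % project,
        "--temp_location=%s" % temp_location,
        "--setup_file=%s" % setup_file,
    ]
    args.extend("--extra_package=%s" % path for path in extra_packages)
    return args
```

The resulting list can be passed as the flags argument of `PipelineOptions`, e.g. `PipelineOptions(dataflow_args("my-project", "gs://my-bucket/tmp"))`; keeping it a plain list makes it easy to log exactly what was sent to the runner.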
>>>>>>> On Mon, Jun 5, 2017 at 11:52 PM, Morand, Sebastien <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm interested too. For instance, it would be nice to add an SFTP
>>>>>>>> BoundedSource, but that requires compiling paramiko against the SSL
>>>>>>>> library (and therefore installing ssl-dev).
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> *Sébastien MORAND*
>>>>>>>> Team Lead Solution Architect
>>>>>>>> Technology & Operations / Digital Factory
>>>>>>>> Veolia - Group Information Systems & Technology (IS&T)
>>>>>>>> Cell.: +33 7 52 66 20 81 / Direct: +33 1 85 57 71 08
>>>>>>>> Bureau 0144C (Ouest)
>>>>>>>> 30, rue Madeleine-Vionnet - 93300 Aubervilliers, France
>>>>>>>> *www.veolia.com <http://www.veolia.com>*
>>>>>>>>
>>>>>>>> On 6 June 2017 at 08:01, Dmitry Demeshchuk <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi again, folks,
>>>>>>>>>
>>>>>>>>> How should I go about installing Python packages that need to be
>>>>>>>>> built and/or require native dependencies such as shared libraries?
>>>>>>>>>
>>>>>>>>> I guess I could build the C-based modules against the same kernel
>>>>>>>>> and glibc versions that Dataflow is running, but there doesn't seem
>>>>>>>>> to be any way to install shared libraries on these boxes, right?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> Dmitry Demeshchuk.
>>>>>>>>>
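The standard library can report the versions mentioned in the question above, which makes it easy to compare a build machine with a worker (for example by printing them from a setup.py command). A minimal sketch; the helper name is hypothetical, and `platform.libc_ver()` returns empty strings where it cannot detect glibc, such as on macOS:

```python
import platform


def build_environment():
    """Collect the details that matter when building C extensions elsewhere."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        "kernel": platform.release(),
        "machine": platform.machine(),
        "libc": ("%s %s" % (libc_name, libc_version)).strip(),
    }


if __name__ == "__main__":
    for key, value in sorted(build_environment().items()):
        print("%s: %s" % (key, value))
```

In practice the glibc version (together with any shared libraries an extension links against) usually matters far more for binary compatibility than the kernel version; that is the premise of the manylinux wheel tags (PEP 513).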
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>> Dmitry Demeshchuk.
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Dmitry Demeshchuk.
>>>>
>>>
>>>
>>
>>
>> --
>> Best regards,
>> Dmitry Demeshchuk.
>>
>
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
