Re: Portability framework: multiple environments in one pipeline

Chad Dombrova Tue, 23 Jul 2019 15:45:56 -0700

Our specific situation is pretty unique, but I think it fits a more general
pattern.  We use a number of media applications and each comes with its own
built-in python interpreter (Autodesk Maya and SideFX Houdini, for
example), and the core modules for each application can only be imported
within their respective interpreter.  We want to be able to create
pipelines where certain transforms are hosted within different application
interpreters, so that we can avoid the ugly workarounds that we have to do
now.


I can imagine a similar scenario where a user wants to use a number of
different libraries for different transforms, but the libraries’
requirements conflict with each other, or perhaps some require python3 and
others are stuck on python2.
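
A rough sketch of the idea, for illustration only. This is plain Python,
not Beam API: the Step class, the environment labels, and plan_stages are
all hypothetical names. It assumes each transform carries a label for the
environment it needs, and shows how consecutive transforms with the same
label could be grouped so that each group is hosted by one interpreter.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Step:
    """A pipeline step tagged with the environment it must run in."""
    name: str
    environment: str  # hypothetical label, e.g. a docker image name


def plan_stages(steps: List[Step]) -> List[Tuple[str, List[str]]]:
    """Group consecutive steps that share an environment into one stage."""
    return [
        (env, [step.name for step in group])
        for env, group in groupby(steps, key=lambda step: step.environment)
    ]


# Hypothetical pipeline mixing application interpreters.
steps = [
    Step("read_scene", "maya-py"),
    Step("export_geometry", "maya-py"),
    Step("simulate", "houdini-py"),
    Step("publish", "default-py3"),
]

stages = plan_stages(steps)
for env, names in stages:
    print(env, names)
```

Only adjacent steps are merged, so a pipeline that returns to an earlier
environment later on would get a separate stage for it.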

Where can I find documentation on the expansion service?  I found a design
doc which was helpful, but it seems to hew toward the hypothetical, so I
think there have been a number of concrete steps taken since it was
written:
https://docs.google.com/document/d/1veiDF2dVH_5y56YxCcri2elCtYJgJnqw9aKPWAgddH8/mobilebasic

-chad



On Tue, Jul 23, 2019 at 1:39 PM Chamikara Jayalath <[email protected]>
wrote:

> I think we have primarily focused on the ability to run transforms from
> multiple SDKs in the same pipeline (cross-language) so far, but as Robert
> mentioned, the framework currently in development should also be usable for
> running pipelines that use multiple environments that have the same SDK
> installed as well. I'd love to get more clarity on the exact use-case here
> (for example, details on why you couldn't run all Python transforms in a
> single environment) and to know if others have the same requirement.
>
> Thanks,
> Cham
>
>
> On Mon, Jul 22, 2019 at 12:31 AM Robert Bradshaw <[email protected]>
> wrote:
>
>> Yes, for sure. Support for this is available in some runners (like the
>> Python Universal Local Runner and Flink) and actively being added to
>> others (e.g. Dataflow). There are still some rough edges however--one
>> currently must run an expansion service to define a pipeline step in
>> an alternative environment (e.g. by registering your transforms and
>> running
>> https://github.com/apache/beam/blob/release-2.14.0/sdks/python/apache_beam/runners/portability/expansion_service_test.py
>> ).
>> We'd like to make this process a lot smoother (and feedback would be
>> appreciated).
>>
>> On Sat, Jul 20, 2019 at 7:57 PM Chad Dombrova <[email protected]> wrote:
>> >
>> > Hi all,
>> > I'm interested to know if others on the list would find value in the
>> > ability to use multiple environments (e.g. docker images) within a single
>> > pipeline, using some mechanism to identify the environment(s) that a
>> > transform should use. It would be quite useful for us, since our transforms
>> > can have conflicting python requirements, or worse, conflicting interpreter
>> > requirements.  Currently to solve this we have to break the pipeline up
>> > into multiple pipelines and use pubsub to communicate between them, which
>> > is not ideal.
>> >
>> > -chad
>> >
>>
>
