Re: DataflowRunner | Cross-language

2020-06-08 Thread Ahmet Altay
On Mon, Jun 8, 2020 at 2:06 PM Chad Dombrova  wrote:

> Even when running portably, Dataflow still has its own implementation of
>> PubSubIO that is switched out for Python's "implementation." (It's actually
>> built into the same layer that provides the shuffle/group-by-key
>> implementation.) However, if you used the external Java PubSubIO, Dataflow may
>> not recognize it and will continue to use that implementation even on Dataflow.
>>
>
> That's great, actually, as we still have some headaches around using the
> Java PubSubIO transform: it requires a custom build of the Java Beam API
> and SDK container to add missing dependencies and properly deal with data
> conversions from python<->java.
>
> Next question: when using Dataflow+Portability can we specify our own
> docker container for the Beam Python SDK when using the Docker executor?
>

Yes, you should be able to do that.


>
> We have two reasons to do this:
> 1) we have some environments that cannot be bootstrapped on top of the
> stock Beam SDK image
> 2) we have a somewhat modified version of the Beam SDK (changes which we
> eventually hope to contribute back, but won't be able to for at least a few
> months).
>
> If yes, what are the restrictions around custom SDK images?  e.g. must be
> the same version of Beam, must be on a registry accessible to Dataflow,
> etc...
>

- It needs to be built as described here:
https://beam.apache.org/documentation/runtime/environments/
- Use the flag: --workerHarnessContainerImage=[location of container image]
(the image needs to be accessible to the Dataflow worker VMs).

There are no other limitations. However, this is not a tested/supported path
yet, so you might run into issues.
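
For illustration, a rough, untested sketch of what specifying a custom image
from a Python pipeline could look like. The project, bucket, region, and image
location below are placeholders; the snake_case worker_harness_container_image
option is the usual Python-side counterpart of the Java-style flag above, but
check the WorkerOptions of your SDK version.

    # Rough sketch only: Dataflow options with a custom SDK harness image.
    # Project, bucket, region, and image location are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner='DataflowRunner',
        project='my-project',                 # placeholder
        region='us-central1',                 # placeholder
        temp_location='gs://my-bucket/tmp',   # placeholder
        experiments=['use_runner_v2'],        # the portable runner path discussed above
        # Typically built on top of the same Beam version used to construct the
        # pipeline (per the environments docs) and pushed to a registry the
        # Dataflow worker VMs can pull from.
        worker_harness_container_image=(
            'gcr.io/my-project/beam-python-custom:2.22.0'),
    )

    with beam.Pipeline(options=options) as p:
        _ = p | 'Smoke' >> beam.Create(['ok']) | 'Print' >> beam.Map(print)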


>
> thanks
> -chad
>
>
>
>


Re: DataflowRunner | Cross-language

2020-06-08 Thread Chad Dombrova
> Even when running portably, Dataflow still has its own implementation of
> PubSubIO that is switched out for Python's "implementation." (It's actually
> built into the same layer that provides the shuffle/group-by-key
> implementation.) However, if you used the external Java PubSubIO, Dataflow may
> not recognize it and will continue to use that implementation even on Dataflow.
>

That's great, actually, as we still have some headaches around using the
Java PubSubIO transform: it requires a custom build of the Java Beam API
and SDK container to add missing dependencies and properly deal with data
conversions from python<->java.

Next question: when using Dataflow+Portability can we specify our own
docker container for the Beam Python SDK when using the Docker executor?

We have two reasons to do this:
1) we have some environments that cannot be bootstrapped on top of the
stock Beam SDK image
2) we have a somewhat modified version of the Beam SDK (changes which we
eventually hope to contribute back, but won't be able to for at least a few
months).

If yes, what are the restrictions around custom SDK images?  e.g. must be
the same version of Beam, must be on a registry accessible to Dataflow,
etc...

thanks
-chad


Re: DataflowRunner | Cross-language

2020-06-08 Thread Robert Bradshaw
On Mon, Jun 8, 2020 at 12:57 PM Chad Dombrova  wrote:

> Hi all,
> quick followup question:
>
>
>> small correction. While the new runner will be available with Beam 2.21,
>>> the Cross-Language support will be available in 2.22.
>>> There will be limitations in the initial set of connectors you can use
>>> with Cross-Lang. But at least you will have something to test with,
>>> starting in 2.22
>>>
>>
>> To clarify, we're not actually prohibiting any other cross-language
>> transforms from being used, but Kafka is the only one that'll be
>> extensively tested and supported at this time.
>>
>
> We're currently using the Flink runner with external Java PubSubIO
> transforms in our python pipelines because there is no pure python option.
>  In its non-portable past, Dataflow has had its own native implementation
> of PubSubIO that got switched out at runtime, so there was no need to use
> external transforms there.  What's the story around PubSubIO when using
> Dataflow + portability?  If we were to switch from Flink to Dataflow, would
> we continue to use external Java PubSubIO transforms, or is there still
> some special treatment of pubsub for Portable Dataflow?
>

Even when running portably, Dataflow still has its own implementation of
PubSubIO that is switched out for Python's "implementation." (It's actually
built into the same layer that provides the shuffle/group-by-key
implementation.) However, if you used the external Java PubSubIO, Dataflow may
not recognize it and will continue to use that implementation even on Dataflow.
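
In other words, the Python-level read/write stays the usual one. A minimal
sketch of that path (the one Dataflow swaps out, not the external Java path);
topic and subscription names are placeholders:

    # Sketch of the Python-side Pub/Sub read/write that Dataflow replaces with
    # its internal implementation when running there. Names are placeholders.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (p
         | 'Read' >> beam.io.ReadFromPubSub(
             subscription='projects/my-project/subscriptions/my-sub')
         # ReadFromPubSub yields bytes; WriteToPubSub also expects bytes.
         | 'Upper' >> beam.Map(lambda msg: msg.upper())
         | 'Write' >> beam.io.WriteToPubSub(
             topic='projects/my-project/topics/my-out'))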


Re: DataflowRunner | Cross-language

2020-06-08 Thread Chad Dombrova
Hi all,
quick followup question:


> small correction. While the new runner will be available with Beam 2.21,
>> the Cross-Language support will be available in 2.22.
>> There will be limitations in the initial set of connectors you can use
>> with Cross-Lang. But at least you will have something to test with,
>> starting in 2.22
>>
>
> To clarify, we're not actually prohibiting any other cross-language
> transforms from being used, but Kafka is the only one that'll be
> extensively tested and supported at this time.
>

We're currently using the Flink runner with external Java PubSubIO
transforms in our python pipelines because there is no pure python option.
In its non-portable past, Dataflow has had its own native implementation
of PubSubIO that got switched out at runtime, so there was no need to use
external transforms there.  What's the story around PubSubIO when using
Dataflow + portability?  If we were to switch from Flink to Dataflow, would
we continue to use external Java PubSubIO transforms, or is there still
some special treatment of pubsub for Portable Dataflow?
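
(For anyone unfamiliar with that setup, the Python side of such an external
transform looks roughly like the hedged sketch below. The URN, config fields,
and expansion service address are hypothetical placeholders rather than a
stock Beam API; they depend on what the custom Java expansion service
registers, which is exactly the part that needs the custom build.)

    # Rough sketch of driving an external Java PubSubIO read from Python via
    # the cross-language ExternalTransform machinery. The URN, config keys, and
    # expansion service address are hypothetical placeholders.
    import apache_beam as beam
    from apache_beam.transforms.external import (
        ExternalTransform, ImplicitSchemaPayloadBuilder)

    with beam.Pipeline() as p:
        messages = (
            p
            | 'ExternalPubSubRead' >> ExternalTransform(
                'beam:external:java:pubsub:read:v1',   # hypothetical URN
                ImplicitSchemaPayloadBuilder({
                    'subscription':
                        'projects/my-project/subscriptions/my-sub',
                }),
                'localhost:8097'))                     # expansion service address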

-chad


Re: DataflowRunner | Cross-language

2020-05-26 Thread Robert Bradshaw
On Tue, May 26, 2020 at 4:12 PM Sergei Sokolenko  wrote:

> small correction. While the new runner will be available with Beam 2.21,
> the Cross-Language support will be available in 2.22.
> There will be limitations in the initial set of connectors you can use
> with Cross-Lang. But at least you will have something to test with,
> starting in 2.22
>

To clarify, we're not actually prohibiting any other cross-language
transforms from being used, but Kafka is the only one that'll be
extensively tested and supported at this time.


> On Tue, May 26, 2020 at 11:23 AM Sergei Sokolenko 
> wrote:
>
>> More info will be forthcoming after Beam 2.21 is out. There will be a
>> docs page describing how it all works.
>>
>> On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
>> pawel.urbanow...@polidea.com> wrote:
>>
>>> Hello, community,
>>>
>>> I found information that Google is working on supporting Dataflow runner
>>> for cross-language
>>> (https://beam.apache.org/roadmap/connectors-multi-sdk/)
>>>
>>> Is there any more information about the expected release of this feature?
>>>
>>> Thanks
>>>
>>>
>>>


Re: DataflowRunner | Cross-language

2020-05-26 Thread Sergei Sokolenko
small correction. While the new runner will be available with Beam 2.21,
the Cross-Language support will be available in 2.22.
There will be limitations in the initial set of connectors you can use with
Cross-Lang. But at least you will have something to test with, starting in
2.22

On Tue, May 26, 2020 at 11:23 AM Sergei Sokolenko  wrote:

> More info will be forthcoming after Beam 2.21 is out. There will be a docs
> page describing how it all works.
>
> On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
> pawel.urbanow...@polidea.com> wrote:
>
>> Hello, community,
>>
>> I found information that Google is working on supporting Dataflow runner
>> for cross-language
>> (https://beam.apache.org/roadmap/connectors-multi-sdk/)
>>
>> Is there any more information about the expected release of this feature?
>>
>> Thanks
>>
>>
>>


Re: DataflowRunner | Cross-language

2020-05-26 Thread Sergei Sokolenko
More info will be forthcoming after Beam 2.21 is out. There will be a docs
page describing how it all works.

On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
pawel.urbanow...@polidea.com> wrote:

> Hello, community,
>
> I found information that Google is working on supporting Dataflow runner
> for cross-language
> (https://beam.apache.org/roadmap/connectors-multi-sdk/)
>
> Is there any more information about the expected release of this feature?
>
> Thanks
>
>
>


Re: DataflowRunner | Cross-language

2020-05-26 Thread Chamikara Jayalath
We are working on making Kafka IO available to Python streaming users on
Dataflow through cross-language transforms. There's no ETA for the
availability of the framework in general for Dataflow yet.
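
For a sense of what this looks like from the Python side once it is available,
here is a minimal sketch; the broker address and topic are placeholders:

    # Minimal sketch of a cross-language Kafka read from a Python pipeline; the
    # Java KafkaIO runs behind an expansion service under the hood. The broker
    # address and topic below are placeholders.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadKafka' >> ReadFromKafka(
             consumer_config={'bootstrap.servers': 'broker-1:9092'},
             topics=['my-topic'])
         | 'Values' >> beam.Map(lambda kv: kv[1]))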

Thanks,
Cham

On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
pawel.urbanow...@polidea.com> wrote:

> Hello, community,
>
> I found information that Google is working on supporting Dataflow runner
> for cross-language
> (https://beam.apache.org/roadmap/connectors-multi-sdk/)
>
> Is there any more information about the expected release of this feature?
>
> Thanks
>
>
>


DataflowRunner | Cross-language

2020-05-21 Thread Paweł Urbanowicz
Hello, community,

I found information that Google is working on supporting Dataflow runner for 
cross-language
(https://beam.apache.org/roadmap/connectors-multi-sdk/)

Is there any more information about the expected release of this feature?

Thanks