Re: DataflowRunner | Cross-language
On Mon, Jun 8, 2020 at 2:06 PM Chad Dombrova wrote:

>> Even when running portably, Dataflow still has its own implementation of
>> PubSubIO that is switched out for Python's "implementation." (It's actually
>> built into the same layer that provides the shuffle/group-by-key
>> implementation.) However, if you used the external Java PubSubIO it may not
>> recognize this and continue to use that implementation even on Dataflow.
>
> That's great, actually, as we still have some headaches around using the
> Java PubSubIO transform: it requires a custom build of the Java Beam API
> and SDK container to add missing dependencies and properly deal with data
> conversions between Python and Java.
>
> Next question: when using Dataflow + portability, can we specify our own
> Docker container for the Beam Python SDK when using the Docker executor?

Yes, you should be able to do that.

> We have two reasons to do this:
> 1) we have some environments that cannot be bootstrapped on top of the
> stock Beam SDK image
> 2) we have a somewhat modified version of the Beam SDK (changes which we
> eventually hope to contribute back, but won't be able to for at least a few
> months).
>
> If yes, what are the restrictions around custom SDK images? e.g. must it
> be the same version of Beam, must it be on a registry accessible to
> Dataflow, etc.?

- It needs to be built as described here:
  https://beam.apache.org/documentation/runtime/environments/
- Use the flag --workerHarnessContainerImage=[location of container image]
  (images need to be accessible to Dataflow VMs).

There are no other limitations, but this is a not-yet-tested/supported path,
so you might run into issues.

> thanks
> -chad
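For reference, a hedged sketch of what the steps above might look like end to end. All names here (project, bucket, region, image tag) are placeholders, and the flag spelling differs by SDK: the email gives the Java-style `--workerHarnessContainerImage`, while the Python SDK around this era spelled it `--worker_harness_container_image` (later superseded by `--sdk_container_image`).

```shell
# Hypothetical example; build the custom SDK container per
# https://beam.apache.org/documentation/runtime/environments/
# and push it somewhere Dataflow VMs can pull from.
docker build -t gcr.io/my-project/beam_python3.7_sdk:2.22.0-custom .
docker push gcr.io/my-project/beam_python3.7_sdk:2.22.0-custom

# Then point the Dataflow job at the custom image. Placeholder values
# throughout; --worker_harness_container_image is the Python-SDK spelling
# of the Java flag mentioned above.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --worker_harness_container_image=gcr.io/my-project/beam_python3.7_sdk:2.22.0-custom
```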
Re: DataflowRunner | Cross-language
> Even when running portably, Dataflow still has its own implementation of
> PubSubIO that is switched out for Python's "implementation." (It's actually
> built into the same layer that provides the shuffle/group-by-key
> implementation.) However, if you used the external Java PubSubIO it may not
> recognize this and continue to use that implementation even on Dataflow.

That's great, actually, as we still have some headaches around using the
Java PubSubIO transform: it requires a custom build of the Java Beam API
and SDK container to add missing dependencies and properly deal with data
conversions between Python and Java.

Next question: when using Dataflow + portability, can we specify our own
Docker container for the Beam Python SDK when using the Docker executor?
We have two reasons to do this:
1) we have some environments that cannot be bootstrapped on top of the
stock Beam SDK image
2) we have a somewhat modified version of the Beam SDK (changes which we
eventually hope to contribute back, but won't be able to for at least a few
months).

If yes, what are the restrictions around custom SDK images? e.g. must it be
the same version of Beam, must it be on a registry accessible to Dataflow,
etc.?

thanks
-chad
Re: DataflowRunner | Cross-language
On Mon, Jun 8, 2020 at 12:57 PM Chad Dombrova wrote:

> Hi all,
> quick followup question:
>
>>> small correction. While the new runner will be available with Beam 2.21,
>>> the Cross-Language support will be available in 2.22.
>>> There will be limitations in the initial set of connectors you can use
>>> with Cross-Lang. But at least you will have something to test with,
>>> starting in 2.22
>>
>> To clarify, we're not actually prohibiting any other cross-language
>> transforms being used, but Kafka is the only one that'll be extensively
>> tested and supported at this time.
>
> We're currently using the Flink runner with external Java PubSubIO
> transforms in our Python pipelines because there is no pure Python option.
> In its non-portable past, Dataflow has had its own native implementation
> of PubSubIO that got switched out at runtime, so there was no need to use
> external transforms there. What's the story around PubSubIO when using
> Dataflow + portability? If we were to switch from Flink to Dataflow, would
> we continue to use external Java PubSubIO transforms, or is there still
> some special treatment of Pub/Sub for portable Dataflow?

Even when running portably, Dataflow still has its own implementation of
PubSubIO that is switched out for Python's "implementation." (It's actually
built into the same layer that provides the shuffle/group-by-key
implementation.) However, if you used the external Java PubSubIO it may not
recognize this and continue to use that implementation even on Dataflow.
Re: DataflowRunner | Cross-language
Hi all,
quick followup question:

>> small correction. While the new runner will be available with Beam 2.21,
>> the Cross-Language support will be available in 2.22.
>> There will be limitations in the initial set of connectors you can use
>> with Cross-Lang. But at least you will have something to test with,
>> starting in 2.22
>
> To clarify, we're not actually prohibiting any other cross-language
> transforms being used, but Kafka is the only one that'll be extensively
> tested and supported at this time.

We're currently using the Flink runner with external Java PubSubIO
transforms in our Python pipelines because there is no pure Python option.
In its non-portable past, Dataflow has had its own native implementation of
PubSubIO that got switched out at runtime, so there was no need to use
external transforms there. What's the story around PubSubIO when using
Dataflow + portability? If we were to switch from Flink to Dataflow, would
we continue to use external Java PubSubIO transforms, or is there still
some special treatment of Pub/Sub for portable Dataflow?

-chad
Re: DataflowRunner | Cross-language
On Tue, May 26, 2020 at 4:12 PM Sergei Sokolenko wrote:

> small correction. While the new runner will be available with Beam 2.21,
> the Cross-Language support will be available in 2.22.
> There will be limitations in the initial set of connectors you can use
> with Cross-Lang. But at least you will have something to test with,
> starting in 2.22

To clarify, we're not actually prohibiting any other cross-language
transforms being used, but Kafka is the only one that'll be extensively
tested and supported at this time.

> On Tue, May 26, 2020 at 11:23 AM Sergei Sokolenko wrote:
>
>> More info will be forthcoming after Beam 2.21 is out. There will be a
>> docs page describing how it all works.
>>
>> On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
>> pawel.urbanow...@polidea.com> wrote:
>>
>>> Hello, community,
>>>
>>> I found information that Google is working on supporting the Dataflow
>>> runner for cross-language
>>> (https://beam.apache.org/roadmap/connectors-multi-sdk/).
>>>
>>> Is there any more information about the expected release of this
>>> feature?
>>>
>>> Thanks
Re: DataflowRunner | Cross-language
small correction. While the new runner will be available with Beam 2.21,
the Cross-Language support will be available in 2.22.
There will be limitations in the initial set of connectors you can use with
Cross-Lang. But at least you will have something to test with, starting in
2.22.

On Tue, May 26, 2020 at 11:23 AM Sergei Sokolenko wrote:

> More info will be forthcoming after Beam 2.21 is out. There will be a docs
> page describing how it all works.
>
> On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
> pawel.urbanow...@polidea.com> wrote:
>
>> Hello, community,
>>
>> I found information that Google is working on supporting the Dataflow
>> runner for cross-language
>> (https://beam.apache.org/roadmap/connectors-multi-sdk/).
>>
>> Is there any more information about the expected release of this feature?
>>
>> Thanks
Re: DataflowRunner | Cross-language
More info will be forthcoming after Beam 2.21 is out. There will be a docs
page describing how it all works.

On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
pawel.urbanow...@polidea.com> wrote:

> Hello, community,
>
> I found information that Google is working on supporting the Dataflow
> runner for cross-language
> (https://beam.apache.org/roadmap/connectors-multi-sdk/).
>
> Is there any more information about the expected release of this feature?
>
> Thanks
Re: DataflowRunner | Cross-language
We are working on making Kafka IO available to Python streaming users on
Dataflow through cross-language transforms. There's no ETA for the
availability of the framework in general for Dataflow yet.

Thanks,
Cham

On Thu, May 21, 2020 at 11:18 PM Paweł Urbanowicz <
pawel.urbanow...@polidea.com> wrote:

> Hello, community,
>
> I found information that Google is working on supporting the Dataflow
> runner for cross-language
> (https://beam.apache.org/roadmap/connectors-multi-sdk/).
>
> Is there any more information about the expected release of this feature?
>
> Thanks
DataflowRunner | Cross-language
Hello, community,

I found information that Google is working on supporting the Dataflow
runner for cross-language
(https://beam.apache.org/roadmap/connectors-multi-sdk/).

Is there any more information about the expected release of this feature?

Thanks