Re: Beam Python streaming pipeline on Flink Runner

2019-02-14 Thread Maximilian Michels
I've revised the document and included your feedback: https://s.apache.org/beam-cross-language-io I think it reads much better now. I moved away from the JSON configuration in favor of an explicit Proto-based configuration approach which leaves it up to the transform what to include in the Pro

Re: Beam Python streaming pipeline on Flink Runner

2019-02-08 Thread Maximilian Michels
Thank you for your comments. They will help to iterate over the ideas. I'd like to point out that though this seems to be specifically targeting IOs, there's nothing here that is specific to IOs. I thought about this from an IO perspective but I agree that it amounts to using any type of cros

Re: Beam Python streaming pipeline on Flink Runner

2019-02-07 Thread Maximilian Michels
I've created an initial design document: https://s.apache.org/beam-cross-language-io It does not contain all the details but perhaps it's a good basis for a discussion on how we proceed. -Max On 06.02.19 19:49, Chamikara Jayalath wrote: On Wed, Feb 6, 2019 at 8:38 AM Maximilian Michels

Re: Beam Python streaming pipeline on Flink Runner

2019-02-06 Thread Maximilian Michels
Thanks for your replies, Robert and Cham. What I had in mind was a generic wrapper that would easily allow users to use IO from Java. Such a wrapper could start as an experimental feature and then, through URN versioning, become stable eventually. UDFs are needed, though they are a special case.

Re: Beam Python streaming pipeline on Flink Runner

2019-02-05 Thread Chamikara Jayalath
On Tue, Feb 5, 2019, 8:11 AM Maximilian Michels Good points Cham. > > JSON seemed like the most intuitive way to specify a configuration map. > We already use JSON in other places, e.g. to specify the environment > configuration. It is not necessarily a contradiction to have JSON inside > Protobuf

Re: Beam Python streaming pipeline on Flink Runner

2019-02-05 Thread Robert Bradshaw
On Tue, Feb 5, 2019 at 5:11 PM Maximilian Michels wrote: > > Good points Cham. > > JSON seemed like the most intuitive way to specify a configuration map. > We already use JSON in other places, e.g. to specify the environment > configuration. It is not necessarily a contradiction to have JSON insi

Re: Beam Python streaming pipeline on Flink Runner

2019-02-05 Thread Maximilian Michels
Good points Cham. JSON seemed like the most intuitive way to specify a configuration map. We already use JSON in other places, e.g. to specify the environment configuration. It is not necessarily a contradiction to have JSON inside Protobuf. From the perspective of IO authors, the user-friendl

Re: Beam Python streaming pipeline on Flink Runner

2019-02-04 Thread Chamikara Jayalath
On Fri, Feb 1, 2019 at 6:12 AM Maximilian Michels wrote: > Yes, I imagine sources to implement a JsonConfigurable interface (e.g. > on their builders): > > JsonConfigurable { >// Either a json string or Map >apply(String jsonConfig); > } > > In Python we would create this transform: > > U

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
On Fri, Feb 1, 2019 at 5:42 PM Thomas Weise wrote: > > On Fri, Feb 1, 2019 at 6:17 AM Maximilian Michels wrote: > >> > Max thanks for your summary. I would like to add that we agree that >> > the runner specific translation via URN is a temporary solution until >> > the wrapper transforms are wr

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Thomas Weise
On Fri, Feb 1, 2019 at 6:12 AM Maximilian Michels wrote: > Yes, I imagine sources to implement a JsonConfigurable interface (e.g. > on their builders): > > JsonConfigurable { >// Either a json string or Map >apply(String jsonConfig); > } > > In Python we would create this transform: > > U

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Thomas Weise
On Fri, Feb 1, 2019 at 6:17 AM Maximilian Michels wrote: > > Max thanks for your summary. I would like to add that we agree that > > the runner specific translation via URN is a temporary solution until > > the wrapper transforms are written, is this correct? In any case this > > alternative stan

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Maximilian Michels
Max, thanks for your summary. I would like to add that we agree that the runner-specific translation via URN is a temporary solution until the wrapper transforms are written; is this correct? In any case, this alternative standard expansion approach deserves a discussion of its own, as you mention.

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Maximilian Michels
Yes, I imagine sources to implement a JsonConfigurable interface (e.g. on their builders):

    JsonConfigurable {
      // Either a json string or Map
      apply(String jsonConfig);
    }

In Python we would create this transform:

    URN: JsonConfiguredSource:v1
    payload: {
      environment: environment_id, // Java
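To make the JsonConfigurable idea concrete, here is a minimal Python sketch of a builder that configures itself from a JSON string. It is not Beam API and only mirrors the shape of the interface in the message: `FakePubSubSourceBuilder` and the `topic` / `subscription` keys are hypothetical names chosen for illustration, and the proposal itself targets Java builders.

```python
import json
from abc import ABC, abstractmethod


class JsonConfigurable(ABC):
    """Hypothetical Python mirror of the JsonConfigurable interface."""

    @abstractmethod
    def apply(self, json_config):
        """Configure this builder from a JSON string and return it."""


class FakePubSubSourceBuilder(JsonConfigurable):
    """Stand-in builder for illustration; this is not the real PubSubIO."""

    ALLOWED_KEYS = {"topic", "subscription"}  # assumed configuration keys

    def __init__(self):
        self.topic = None
        self.subscription = None

    def apply(self, json_config):
        config = json.loads(json_config)
        # Reject keys the builder does not understand instead of ignoring them.
        unknown = set(config) - self.ALLOWED_KEYS
        if unknown:
            raise ValueError("Unknown configuration keys: %s" % sorted(unknown))
        self.topic = config.get("topic")
        self.subscription = config.get("subscription")
        return self


builder = FakePubSubSourceBuilder().apply('{"topic": "projects/p/topics/t"}')
print(builder.topic)
```

Rejecting unknown keys is a design choice of this sketch, not something the thread specifies; it surfaces typos in the configuration early rather than silently dropping them.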

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Ismaël Mejía
Thanks for the explanation, Robert; it makes much more sense now. (Sorry for the confusion in the mapping; I mistyped the direction SDF <-> Source.) Status of SDF: - Support for Dynamic Work Rebalancing is WIP. - Bounded version translation is supported by all non-portable runners in a relatively nai

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
Are you suggesting something akin to a generic

    urn: JsonConfiguredJavaSource
    payload: some json specifying which source and which parameters

which would expand to actually constructing and applying that source? (FWIW, I was imagining PubSubIO already had a translation into BeamFnApi prot
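The expansion step described in this question can be sketched in a few lines of Python: a single generic URN whose JSON payload names the source and its parameters, resolved through a registry. Everything here besides the URN string is an assumption made for illustration, including the payload keys `source` and `parameters`, the registry, and the placeholder `pubsub` factory; none of it is Beam API.

```python
import json

# URN string taken from the message; the rest of this sketch is hypothetical.
GENERIC_URN = "JsonConfiguredJavaSource"

# Maps a source name to a factory that constructs it from keyword parameters.
SOURCE_FACTORIES = {}


def register_source(name):
    """Decorator that registers a factory under the given source name."""
    def decorator(factory):
        SOURCE_FACTORIES[name] = factory
        return factory
    return decorator


@register_source("pubsub")
def make_pubsub_source(topic):
    # Placeholder for constructing and applying a real source.
    return {"kind": "pubsub", "topic": topic}


def expand(urn, payload):
    """Expand the generic URN by constructing the source named in the payload."""
    if urn != GENERIC_URN:
        raise ValueError("Unsupported URN: %s" % urn)
    spec = json.loads(payload)
    factory = SOURCE_FACTORIES[spec["source"]]
    return factory(**spec.get("parameters", {}))


payload = json.dumps({"source": "pubsub", "parameters": {"topic": "t"}})
print(expand(GENERIC_URN, payload))
```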

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Maximilian Michels
Recapping here: We all agree that SDF is the way to go for future implementations of sources. It enables us to get rid of the source interfaces. However, SDF does not solve the lack of streaming sources in Python. The expansion PR (thanks btw!) solves the problem of expanding/translating URNs

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
On Thu, Jan 31, 2019 at 6:25 PM Maximilian Michels wrote: > Ah, I thought you meant native Flink transforms. > > Exactly! The translation code is already there. The main challenge is how > to > programmatically configure the BeamIO from Python. I suppose that is also > an > unsolved problem for c

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Ahmet Altay
+1 to Thomas's idea as a way to enable Python users on Flink. On the other hand, this will be throwaway work once SDF is supported. How far are we from SDF support? On Thu, Jan 31, 2019 at 9:18 AM Maximilian Michels wrote: > Ah, I thought you meant native Flink transforms. > > Exactly! The trans

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels
Ah, I thought you meant native Flink transforms. Exactly! The translation code is already there. The main challenge is how to programmatically configure the BeamIO from Python. I suppose that is also an unsolved problem for cross-language transforms in general. For Matthias' pipeline with Pub

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Thomas Weise
Exactly, that's what I had in mind. A Flink runner native transform would make the existing unbounded sources available, similar to: https://github.com/apache/beam/blob/2e89c1e4d35e7b5f95a622259d23d921c3d6ad1f/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingTransformTransl

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels
Wouldn't it be even more useful for the transition period if we enabled Beam IO to be used via Flink (like in the legacy Flink Runner)? In this particular example, Matthias wants to use PubSubIO, which is not even available as a native Flink transform. On 31.01.19 16:21, Thomas Weise wrote: Un

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Thomas Weise
Until SDF is supported, we could also add Flink runner native transforms for selected unbounded sources [1]. That might be a reasonable option to unblock users that want to try Python streaming on Flink. Thomas [1] https://github.com/lyft/beam/blob/release-2.10.0-lyft/runners/flink/src/main/java

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels
I have a hard time imagining how we can map RestrictionTrackers in a generic way into the existing Bounded/UnboundedSource, so I would love to hear more about the details. Isn't it the other way around? The SDF is a generalization of UnboundedSource. So we would wrap UnboundedSource using SD

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels
> In addition to having support in the runners, this will require a > rewrite of PubsubIO to use the new SDF API. Not necessarily. This would be one way. Another way is to build an SDF wrapper for UnboundedSource. Probably the easier path for migration. On 31.01.19 14:03, Ismaël Mejía wrote: Fortu

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Maximilian Michels
Hi Matthias, This is already reflected in the compatibility matrix, if you look under SDF. There is no UnboundedSource interface for portable pipelines. That's a legacy abstraction that will be replaced with SDF. Fortunately, there is already a pending PR for cross-language pipelines which w

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Matthias Baetens
Hey Ankur, Thanks for the swift reply. Should I change this in the capability matrix then? Many thanks. Best, Matthias On Thu, 31 Jan 2019 at 09:31, Ankur Goenka wrote: > Hi Matthias, > > Unfortunately, unbounded reads including pubs

Re: Beam Python streaming pipeline on Flink Runner

2019-01-31 Thread Ankur Goenka
Hi Matthias, Unfortunately, unbounded reads including pubsub are not yet supported for portable runners. Thanks, Ankur On Thu, Jan 31, 2019 at 2:44 PM Matthias Baetens wrote: > Hi everyone, > > Last few days I have been trying to run a streaming pipeline (code on > Github