Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Chamikara Jayalath
Python SDK doesn't have FileIO yet, so let's keep the ReadAllFromFoo transforms currently available for various file types around till we have that.

Thanks,
Cham

On Fri, Feb 1, 2019 at 7:41 AM Jean-Baptiste Onofré wrote:
> Hi,
>
> readFiles() should be used IMHO. We should remove readAll() to avoid

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
On Fri, Feb 1, 2019 at 5:42 PM Thomas Weise wrote:
>
> On Fri, Feb 1, 2019 at 6:17 AM Maximilian Michels wrote:
>> > Max, thanks for your summary. I would like to add that we agree that
>> > the runner-specific translation via URN is a temporary solution until
>> > the wrapper transforms are

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Thomas Weise
On Fri, Feb 1, 2019 at 6:12 AM Maximilian Michels wrote:
> Yes, I imagine sources to implement a JsonConfigurable interface (e.g.
> on their builders):
>
> JsonConfigurable {
>    // Either a json string or Map
>    apply(String jsonConfig);
> }
>
> In Python we would create this transform:
>

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Thomas Weise
On Fri, Feb 1, 2019 at 6:17 AM Maximilian Michels wrote:
> > Max, thanks for your summary. I would like to add that we agree that
> > the runner-specific translation via URN is a temporary solution until
> > the wrapper transforms are written, is this correct? In any case this
> > alternative

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Jean-Baptiste Onofré
Hi,

readFiles() should be used IMHO. We should remove readAll() to avoid confusion.

Regards
JB

On 30/01/2019 17:25, Ismaël Mejía wrote:
> Hello,
>
> A ‘recent’ pattern of use in Beam is to have in file based IOs a
> `readAll()` implementation that basically matches a `PCollection` of
> file

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Łukasz Gajowy
+1 to deprecating and not implementing readAll() in transforms where it is equivalent to matchAll() + readMatches() + readFiles(). It encourages and advertises the use of a nice, composable API, reducing the amount of code to be maintained.

> can the Python/Go SDK align with this?

+1 to that too.
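To make the composition argument concrete, here is a toy in-memory sketch of the pipeline discussed above. The function names mirror Beam's FileIO stages, but this is NOT the Beam API: it just illustrates why a monolithic readAll() is exactly the composition of the three smaller stages, so only those stages need to be maintained.

```python
# Toy model of matchAll() + readMatches() + readFiles(); the stage names
# mirror Beam's FileIO but this is an illustration, not the real API.
import fnmatch

# Hypothetical in-memory "filesystem" standing in for real files.
FILES = {
    "logs/a.txt": "alpha\nbeta",
    "logs/b.txt": "gamma",
    "data/c.csv": "1,2,3",
}

def match_all(patterns):
    """Expand glob patterns into concrete file names (like matchAll())."""
    return [name for p in patterns
            for name in sorted(FILES) if fnmatch.fnmatch(name, p)]

def read_matches(names):
    """Turn matched names into readable handles (like readMatches())."""
    return [(name, FILES[name]) for name in names]

def read_files(handles):
    """Read each handle and emit its lines (like readFiles())."""
    return [line for _, content in handles for line in content.split("\n")]

def read_all(patterns):
    """A monolithic readAll() is just the composition of the three stages."""
    return read_files(read_matches(match_all(patterns)))

print(read_all(["logs/*.txt"]))  # ['alpha', 'beta', 'gamma']
```

Because readAll() adds nothing beyond the composition, deprecating it leaves the API surface strictly smaller without losing any capability.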

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Maximilian Michels
Max, thanks for your summary. I would like to add that we agree that the runner-specific translation via URN is a temporary solution until the wrapper transforms are written, is this correct? In any case this alternative standard expansion approach deserves a discussion of its own as you

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Maximilian Michels
Yes, I imagine sources to implement a JsonConfigurable interface (e.g. on their builders):

JsonConfigurable {
   // Either a json string or Map
   apply(String jsonConfig);
}

In Python we would create this transform:

URN: JsonConfiguredSource:v1
payload: {
   environment: environment_id, //
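The proposed JsonConfigurable interface is a Java interface on source builders; the following is a hedged Python sketch of the same idea, with a hypothetical KafkaSourceBuilder used purely for illustration (neither name is from the Beam codebase).

```python
import json

class JsonConfigurable:
    """Sketch of the proposed interface: a source builder configured by
    a JSON string (or an already-parsed mapping)."""
    def apply(self, json_config):
        raise NotImplementedError

class KafkaSourceBuilder(JsonConfigurable):
    """Hypothetical source builder implementing the interface."""
    def __init__(self):
        self.config = {}

    def apply(self, json_config):
        # Accept either a JSON string or a Map, per the proposal.
        if isinstance(json_config, str):
            json_config = json.loads(json_config)
        self.config.update(json_config)
        return self

builder = KafkaSourceBuilder().apply(
    '{"topic": "events", "bootstrap_servers": "localhost:9092"}')
print(builder.config["topic"])  # events
```

The appeal of the design is that the cross-language payload stays a plain JSON blob: the Python side never needs to know the Java builder's method signatures, only the configuration keys.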

Re: [DISCUSS] Should File based IOs implement readAll() or just readFiles()

2019-02-01 Thread Ismaël Mejía
I want to chime in that I am also +1 to deprecating readAll. Is there anyone strongly pro readAll instead of the explicit composition? And more importantly, can the Python/Go SDK align with this (deprecating ReadAll and implementing ReadFiles)?

On Thu, Jan 31, 2019 at 12:34 AM Chamikara Jayalath

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Ismaël Mejía
Thanks for the explanation Robert, it makes much more sense now. (Sorry for the confusion in the mapping; I mistyped the direction SDF <-> Source.)

Status of SDF:
- Support for Dynamic Work Rebalancing is WIP.
- Bounded version translation is supported by all non-portable runners in a relatively

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
Are you suggesting something akin to a generic

urn: JsonConfiguredJavaSource
payload: some json specifying which source and which parameters

which would expand to actually constructing and applying that source? (FWIW, I was imagining PubSubIO already had a translation into BeamFnApi
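Robert's question can be sketched mechanically: a single generic URN whose JSON payload names a source plus its parameters, expanded by looking the source up in a registry. This is a toy illustration under assumed names (SOURCE_REGISTRY, TextSource, KafkaSource are all hypothetical); a real expansion service would construct and apply actual Java transforms.

```python
import json

# Hypothetical registry mapping source names to constructors; a real
# expansion service would build concrete Java transforms here.
SOURCE_REGISTRY = {
    "TextSource": lambda params: ("TextSource", params.get("path")),
    "KafkaSource": lambda params: ("KafkaSource", params.get("topic")),
}

def expand(urn, payload):
    """Expand a generic JSON-configured source URN into a concrete source."""
    if urn != "JsonConfiguredJavaSource":
        raise ValueError("unknown urn: %s" % urn)
    config = json.loads(payload)
    constructor = SOURCE_REGISTRY[config["source"]]
    return constructor(config.get("parameters", {}))

source = expand("JsonConfiguredJavaSource",
                '{"source": "KafkaSource", "parameters": {"topic": "events"}}')
print(source)  # ('KafkaSource', 'events')
```

The trade-off being debated in the thread is exactly this: one generic URN with an open-ended JSON payload versus a dedicated, schema'd URN per source.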

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Maximilian Michels
Recapping here: We all agree that SDF is the way to go for future implementations of sources. It enables us to get rid of the source interfaces. However, SDF does not solve the lack of streaming sources in Python. The expansion PR (thanks btw!) solves the problem of expanding/translating

Re: Beam Python streaming pipeline on Flink Runner

2019-02-01 Thread Robert Bradshaw
On Thu, Jan 31, 2019 at 6:25 PM Maximilian Michels wrote:
> Ah, I thought you meant native Flink transforms.
>
> Exactly! The translation code is already there. The main challenge is how to
> programmatically configure the BeamIO from Python. I suppose that is also an
> unsolved problem for