Re: Kafka connector for Beam Python SDK

2018-05-01 Thread Chamikara Jayalath
Thanks all for the comments. Based on the discussion so far, looks like we have to flesh out the cross-language transforms feature quite a bit before we can utilize some of the existing Java IO in other SDKs. This might involve redesigning some of the existing Java IOs to allow expressing second or

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Kenneth Knowles
The numbers on that PR are not really what end-to-end means to me - it normally means you have a fully represented productionized use case and the metric you are looking at is the actual impact on the full system (like latency from a tap on mobile to a dashboard being updated, or monthly compute co

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Reuven Lax
On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote: > I agree with Cham's motivations as far as "we need it now" and getting > Python SDF up and running and exercised on a real connector. > > But I do find the current API of BigQueryIO to be a poor example. That > particular functionality on B

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Eugene Kirpichov
I think we've discussed this before... It is true that all of our second-order APIs can be re-expressed as first-order APIs, but that would come at a very serious performance cost - e.g. a significant increase in the amount of data shuffled / materialized. The second-order APIs (most importantly, Dynamic

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Lukasz Cwik
I believe that most (all?) of these cases of executing a lambda could be avoided if we passed along structured records like: { table_name: row: { ... } } On Mon, Apr 30, 2018 at 10:24 AM Chamikara Jayalath wrote: > > > On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote: > >> I agree wit
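To make the structured-record idea concrete, here is a minimal Python sketch. It assumes each element already carries its destination in a `table_name` field next to a `row` payload (the field names follow the message above). `group_by_destination` is a hypothetical helper for illustration, not a Beam API. Because routing only reads data that is already in the record, a sink written in another SDK could in principle do it without calling back into Python.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List

# Assumed record shape, following the proposal: {"table_name": ..., "row": {...}}
Record = Dict[str, Any]


def group_by_destination(records: Iterable[Record]) -> Dict[str, List[Dict[str, Any]]]:
    """Hypothetical helper: batch rows per table using only data in each record.

    No user-supplied function is evaluated, so nothing has to cross an SDK boundary.
    """
    batches: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        batches[record["table_name"]].append(record["row"])
    return dict(batches)


if __name__ == "__main__":
    records = [
        {"table_name": "clicks", "row": {"user": "a", "n": 1}},
        {"table_name": "views", "row": {"user": "b", "n": 2}},
        {"table_name": "clicks", "row": {"user": "c", "n": 3}},
    ]
    print(group_by_destination(records))
```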

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Chamikara Jayalath
On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote: > I agree with Cham's motivations as far as "we need it now" and getting > Python SDF up and running and exercised on a real connector. > > But I do find the current API of BigQueryIO to be a poor example. That > particular functionality on B

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Henning Rohde
Although I suspect/hope that sharing IO connectors across SDKs will adequately cover the lion's share of implementations (especially the long tail), I also think it's a case-by-case decision to make. Native IO might be preferable for some uses and each SDK will want IO implementations where they sh

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Kenneth Knowles
I agree with Cham's motivations as far as "we need it now" and getting Python SDF up and running and exercised on a real connector. But I do find the current API of BigQueryIO to be a poor example. That particular functionality on BigQueryIO seems extraneous and goes against our own style guide [1

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Raghu Angadi
On Mon, Apr 30, 2018 at 8:05 AM Chamikara Jayalath wrote: > Hi Aljoscha, > > I tried to cover this in the doc. Once we have full support for > cross-language IO, we can decide this on a case-by-case basis. But I don't > think we should cease defining new sources/sinks for Beam Python SDK till > w

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Reuven Lax
Another point: cross-language IOs might add a performance penalty in many cases. For an example of this look at BigQueryIO. The user can register a SerializableFunction that is evaluated on every record, and determines which destination to write the record to. Now a Python user would want to regist
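Here is a rough illustration of the cost described in this message. The Python sketch wraps a user destination function in a hypothetical `CountingBoundary`, which stands in for a cross-language call. No real Fn API machinery is involved. The wrapper counts how often a second-order sink, the equally hypothetical `write_dynamic`, would invoke that function. The count grows one-for-one with the number of records, and that is where a per-element cross-language overhead would come from.

```python
from typing import Any, Callable, Dict, List


class CountingBoundary:
    """Hypothetical stand-in for a cross-language call.

    Wraps a function and counts how many times the other SDK would be invoked.
    """

    def __init__(self, fn: Callable[[Any], str]) -> None:
        self.fn = fn
        self.calls = 0

    def __call__(self, element: Any) -> str:
        self.calls += 1  # one round trip per element
        return self.fn(element)


def write_dynamic(elements: List[Any], destination_fn: Callable[[Any], str]) -> Dict[str, List[Any]]:
    """Hypothetical second-order sink: evaluates destination_fn on every element."""
    tables: Dict[str, List[Any]] = {}
    for element in elements:
        tables.setdefault(destination_fn(element), []).append(element)
    return tables


if __name__ == "__main__":
    events = [{"country": "US"}, {"country": "LK"}, {"country": "US"}]
    boundary = CountingBoundary(lambda e: "events_" + e["country"].lower())
    tables = write_dynamic(events, boundary)
    print(sorted(tables))
    print(boundary.calls)  # one call per record
```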

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Chamikara Jayalath
Hi Aljoscha, I tried to cover this in the doc. Once we have full support for cross-language IO, we can decide this on a case-by-case basis. But I don't think we should cease defining new sources/sinks for Beam Python SDK till we get to that point. I think there are good reasons for adding Kafka su

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Aljoscha Krettek
Is this what we want to do in the long run, i.e. implement copies of connectors for different SDKs? I thought the plan was to enable using connectors written in different languages, i.e. use the Java Kafka I/O from Python. This way we wouldn't duplicate bugs for three different languages (Java, P

Re: Kafka connector for Beam Python SDK

2018-04-29 Thread Eugene Kirpichov
Thanks Cham, this is great! I left just a couple of comments on the doc. On Fri, Apr 27, 2018 at 10:06 PM Chamikara Jayalath wrote: > Hi All, > > I'm looking into adding a Kafka connector to Beam Python SDK. I think this > will benefit many Python SDK users and will serve as a good example for

Kafka connector for Beam Python SDK

2018-04-27 Thread Chamikara Jayalath
Hi All, I'm looking into adding a Kafka connector to Beam Python SDK. I think this will benefit many Python SDK users and will serve as a good example for the recently added Splittable DoFn API (Fn API support which will allow all runners to use Python Splittable DoFn is in active development). I cr
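For readers new to Splittable DoFn, the Python sketch below shows how the restriction idea might apply to a single Kafka partition. The restriction is a range of offsets, `try_claim` gates each read, and `checkpoint` hands the unclaimed remainder back so that reading can resume later. `OffsetRange`, `OffsetRangeTracker`, `process_bundle` and `UNBOUNDED` are hypothetical names written for this illustration. The sketch does not use Beam's actual restriction-tracker classes or a Kafka client, and a plain list stands in for the partition.

```python
import sys
from dataclasses import dataclass
from typing import List, Optional, Tuple

UNBOUNDED = sys.maxsize  # stand-in for "no end offset" on a streaming partition


@dataclass
class OffsetRange:
    start: int  # first offset, inclusive
    stop: int   # end offset, exclusive


class OffsetRangeTracker:
    """Hypothetical tracker: claims offsets in increasing order within [start, stop)."""

    def __init__(self, restriction: OffsetRange) -> None:
        self.restriction = restriction
        self.last_claimed: Optional[int] = None

    def try_claim(self, offset: int) -> bool:
        if self.last_claimed is not None and offset <= self.last_claimed:
            raise ValueError("offsets must be claimed in increasing order")
        if offset >= self.restriction.stop:
            return False  # past the end of this restriction
        self.last_claimed = offset
        return True

    def checkpoint(self) -> Optional[OffsetRange]:
        """Shrink the restriction to the claimed part; return the rest, if any."""
        if self.last_claimed is None:
            resume_from = self.restriction.start
        else:
            resume_from = self.last_claimed + 1
        residual = OffsetRange(resume_from, self.restriction.stop)
        self.restriction = OffsetRange(self.restriction.start, resume_from)
        return residual if residual.start < residual.stop else None


def process_bundle(
    messages: List[str], tracker: OffsetRangeTracker, max_per_bundle: int
) -> Tuple[List[Tuple[int, str]], Optional[OffsetRange]]:
    """Hypothetical process() body: emit (offset, message) pairs, then checkpoint."""
    emitted: List[Tuple[int, str]] = []
    offset = tracker.restriction.start
    while len(emitted) < max_per_bundle and offset < len(messages):
        if not tracker.try_claim(offset):
            return emitted, None  # restriction exhausted, nothing to resume
        emitted.append((offset, messages[offset]))
        offset += 1
    # Budget used up or no more data available yet: return the unclaimed
    # remainder so that reading could be resumed later.
    return emitted, tracker.checkpoint()
```

The fixed `max_per_bundle` budget is a simplification. It stands in for whatever checkpointing policy a runner or the DoFn would actually apply when reading an unbounded partition.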