Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Jean-Baptiste Onofré
That's a good idea! I think using Slack to ping/ask is a good way as it's async. Regards JB On 05/01/2018 06:51 AM, Reuven Lax wrote: > I think it makes sense to have someone who hadn't done the Gradle migration to run the release. However, would it make sense for someone who did work on the

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Reuven Lax
I think it makes sense to have someone who hadn't done the Gradle migration to run the release. However, would it make sense for someone who did work on the migration to partner with you, JB? There may be issues that are simply due to things that were not documented well. In that case the partner

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Thomas Weise
> I volunteer to cut the release: I think I know Gradle decently, and even if I didn't work on the Gradle "migration" during the last two weeks, I think it's actually better: I have an "external" view on the latest changes. +1 Since we expect the community overall to be able to

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Jean-Baptiste Onofré
Hi Scott, Thanks for the update. The Gradle build crashed on my machine (not related to Gradle). I launched a new one. I volunteer to cut the release: I think I know Gradle decently, and even if I didn't work on the Gradle "migration" during the last two weeks, I think it's actually better: I

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Lukasz Cwik
I also agree that the release should be done using Gradle; otherwise we won't have the opportunity to migrate off of Maven for several more weeks, until 2.6.0 is released. Several people have pointed out that supporting multiple build systems is a hassle, which is what inspired putting more effort

Gradle Status: Migrated!

2018-04-30 Thread Scott Wegner
Many of you have been hacking diligently on the Gradle build, and I'm happy to announce that we now have a fully-functioning Gradle build! There's been a ton of progress since our last update [1]: * Improved nightly snapshot release [2] * Improve runner quickstarts [5] [11] * Python

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Kenneth Knowles
The numbers on that PR are not really what end-to-end means to me - it normally means you have a fully represented productionized use case and the metric you are looking at is the actual impact on the full system (like latency from a tap on mobile to a dashboard being updated, or monthly compute

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Reuven Lax
On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote: > I agree with Cham's motivations as far as "we need it now" and getting Python SDF up and running and exercised on a real connector. > But I do find the current API of BigQueryIO to be a poor example. That particular

Re: Splittable DoFN in Spark discussion

2018-04-30 Thread Eugene Kirpichov
I think this stuff is happening in SparkGroupAlsoByWindowViaWindowSet: https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java#L610 As far as I can tell, there is no infinite stream of pings involved.

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Romain Manni-Bucau
On Apr 30, 2018 at 19:39, "Jean-Baptiste Onofré" wrote: Hi guys, now that I'm back from vacation, I'm bringing the 2.5.0 release back to the table ;) This is also related to the current status of the build (Maven/Gradle). FYI, I'm going to start the Jira triage tomorrow and I launched

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Eugene Kirpichov
I think we've discussed this before... It is true that all of our second-order APIs can be re-expressed as first-order APIs, but that would come at a very serious performance cost, e.g. a significant increase in the amount of data shuffled / materialized. The second-order APIs (most importantly,

Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-04-30 Thread Jean-Baptiste Onofré
Hi guys, now that I'm back from vacation, I'm bringing the 2.5.0 release back to the table ;) This is also related to the current status of the build (Maven/Gradle). FYI, I'm going to start the Jira triage tomorrow, and I launched a couple of builds on my machine (both Maven and Gradle) to get an update on the

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Lukasz Cwik
I believe that most (all?) of these cases of executing a lambda could be avoided if we passed along structured records like: { table_name: row: { ... } } On Mon, Apr 30, 2018 at 10:24 AM Chamikara Jayalath wrote: > > > On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles
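A minimal sketch of the structured-record idea in the message above, in plain Python rather than any Beam API. The field names `table_name` and `row` come from the message; the helper `route_by_table` and the sample table names are hypothetical and only for illustration. Because each record names its own destination, the writer needs no user-supplied function to route it.

```python
from collections import defaultdict


def route_by_table(records):
    """Group rows by the destination named inside each record.

    Hypothetical helper: each record is assumed to be a dict with a
    "table_name" key and a "row" key, as in the message above.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record["table_name"]].append(record["row"])
    return dict(grouped)


# Sample data with made-up table names.
structured_records = [
    {"table_name": "events_us", "row": {"id": 1}},
    {"table_name": "events_eu", "row": {"id": 2}},
    {"table_name": "events_us", "row": {"id": 3}},
]
routed = route_by_table(structured_records)
```

The routing step here is a dictionary lookup on data, so the same records could in principle be routed by a writer implemented in another language.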

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Chamikara Jayalath
On Mon, Apr 30, 2018 at 9:54 AM Kenneth Knowles wrote: > I agree with Cham's motivations as far as "we need it now" and getting Python SDF up and running and exercised on a real connector. > But I do find the current API of BigQueryIO to be a poor example. That particular

Re: JIRA Permissions

2018-04-30 Thread Jean-Baptiste Onofré
Done. You might need to relog. Regards JB On 04/30/2018 07:05 PM, Cody Schroeder wrote: > I'm not sure if this is the official channel (there's one precedent), but I

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Henning Rohde
Although I suspect/hope that sharing IO connectors across SDKs will adequately cover the lion's share of implementations (especially the long tail), I also think it's a case-by-case decision to make. Native IO might be preferable for some uses and each SDK will want IO implementations where they

JIRA Permissions

2018-04-30 Thread Cody Schroeder
I'm not sure if this is the official channel (there's one precedent), but I would like to be added to the "Contributors" role in JIRA so I can assign BEAM-4172

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Kenneth Knowles
I agree with Cham's motivations as far as "we need it now" and getting Python SDF up and running and exercised on a real connector. But I do find the current API of BigQueryIO to be a poor example. That particular functionality on BigQueryIO seems extraneous and goes against our own style guide

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Raghu Angadi
On Mon, Apr 30, 2018 at 8:05 AM Chamikara Jayalath wrote: > Hi Aljoscha, > I tried to cover this in the doc. Once we have full support for cross-language IO, we can decide this on a case-by-case basis. But I don't think we should cease defining new sources/sinks for

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Reuven Lax
Another point: cross-language IOs might add a performance penalty in many cases. For an example of this look at BigQueryIO. The user can register a SerializableFunction that is evaluated on every record, and determines which destination to write the record to. Now a Python user would want to
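A rough sketch of the pattern described above, assuming nothing about the real BigQueryIO API: a user-supplied function is called once per record to choose the destination. The names `group_by_destination`, `pick_table`, and the `region` field are hypothetical. The point is only that the callback runs for every record, which is where a cross-language boundary could add per-record cost.

```python
def group_by_destination(records, destination_fn):
    """Call destination_fn once per record and bucket records by its result.

    Hypothetical stand-in for a sink with dynamic destinations; this is
    not the BigQueryIO implementation.
    """
    tables = {}
    for record in records:
        tables.setdefault(destination_fn(record), []).append(record)
    return tables


def pick_table(record):
    # User-defined routing logic, evaluated for every single record.
    return "events_" + record["region"]


sample_records = [
    {"id": 1, "region": "us"},
    {"id": 2, "region": "eu"},
    {"id": 3, "region": "us"},
]
grouped = group_by_destination(sample_records, pick_table)
```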

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Chamikara Jayalath
Hi Aljoscha, I tried to cover this in the doc. Once we have full support for cross-language IO, we can decide this on a case-by-case basis. But I don't think we should cease defining new sources/sinks for Beam Python SDK till we get to that point. I think there are good reasons for adding Kafka

Re: Kafka connector for Beam Python SDK

2018-04-30 Thread Aljoscha Krettek
Is this what we want to do in the long run, i.e. implement copies of connectors for different SDKs? I thought the plan was to enable using connectors written in different languages, i.e. use the Java Kafka I/O from Python. This way we wouldn't duplicate bugs for three different languages (Java,

Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #24

2018-04-30 Thread Apache Jenkins Server
See Changes: [chamikara] [BEAM-3973] Adds a parameter to the Cloud Spanner read connector that -- [...truncated 1.67 MB...] Build cache key for task