Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
Sure - we can definitely add explicit conversion transforms. The automatic transform is useful for generic transforms and frameworks (such as SQL) that want to be able to take in a PCollection and operate on it. However if users using Schema directly find it easier to have explicit transforms to

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
On Tue, May 22, 2018 at 10:51 PM Romain Manni-Bucau wrote: > How does it work on the pipeline side? > Do you generate these "virtual" IO at build time to enable the fluent API > to work without erasing generics? > Yeah - so I've already added support for injected element

Re: Beam SQL Improvements

2018-05-22 Thread Jean-Baptiste Onofré
Hi, IMHO, it would be better to have an explicit transform/IO as a converter. It would be easier for users. Another option would be to use a "TypeConverter/SchemaConverter" map as we do in Camel: Beam could check the source/destination "type" and check in the map if there's a converter available.
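
The converter-map idea above can be sketched roughly as follows. This is only an illustration in Python of a lookup table keyed by (source type, target type); the names ConverterRegistry, register, has_converter and convert are hypothetical and are not the API of Beam or Camel.

```python
from typing import Any, Callable, Dict, Tuple


class ConverterRegistry:
    """Hypothetical map from (source type, target type) to a converter function."""

    def __init__(self) -> None:
        self._converters: Dict[Tuple[type, type], Callable[[Any], Any]] = {}

    def register(self, source: type, target: type, fn: Callable[[Any], Any]) -> None:
        # Store one converter per (source, target) pair; a later call overwrites it.
        self._converters[(source, target)] = fn

    def has_converter(self, source: type, target: type) -> bool:
        return (source, target) in self._converters

    def convert(self, value: Any, target: type) -> Any:
        source = type(value)
        if source is target:
            # Nothing to do when the value already has the requested type.
            return value
        try:
            fn = self._converters[(source, target)]
        except KeyError:
            raise TypeError(
                f"no converter registered from {source.__name__} to {target.__name__}"
            ) from None
        return fn(value)


# Illustrative registration: turn a dict into a tuple of sorted (key, value) pairs.
registry = ConverterRegistry()
registry.register(dict, tuple, lambda d: tuple(sorted(d.items())))
```

A framework using such a map would check has_converter for the source/destination pair before wiring two steps together, and fall back to an explicit transform when no converter is registered.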

Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
How does it work on the pipeline side? Do you generate these "virtual" IO at build time to enable the fluent API to work without erasing generics? E.g., SQL(row)->BigQuery(native) will not compile, so we need SQL(row)->BigQuery(row). Side note unrelated to Row: if you add another registry maybe a

Re: [VOTE] Go SDK

2018-05-22 Thread Andrew Psaltis
+1 (non-binding) Fantastic to see another language being used in this space and the learnings that will come from bringing another language to the SDK. On Wed, May 23, 2018 at 12:25 PM, Willy Lulciuc wrote: > +1 (non-binding) > > Great work! > > On Tue, May 22, 2018 at

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
No - the only modules we need to add to core are the ones we choose to add. For example, I will probably add a registration for TableRow/TableSchema (GCP BigQuery) so these can work seamlessly with schemas. However I will add that to the GCP module, so only someone depending on that module needs to

Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Hmm, the pluggability part is close to what I wanted to do with JsonObject as a main API (to avoid redoing a "row" API and schema API). Row.as(Class) sounds good but then, does it mean we'll get beam-sdk-java-row-jsonobject like modules (I'm not against, just trying to understand here)? If so, how

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
By the way Romain, if you have specific scenarios in mind I would love to hear them. I can try and guess what exactly you would like to get out of schemas, but it would work better if you gave me concrete scenarios that you would like to work. Reuven On Tue, May 22, 2018 at 7:45 PM Reuven Lax

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
Yeah, what I'm working on will help with IO. Basically if you register a function with SchemaRegistry that converts back and forth between a type (say JsonObject) and a Beam Row, then it is applied by the framework behind the scenes as part of DoFn invocation. Concrete example: let's say I have an

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Kenneth Knowles
Did you look through all our jars or is that just a sample? Kenn On Tue, May 22, 2018 at 7:22 PM Davor Bonaci wrote: > This analysis looks correct. Great find! > > The recommended fix would be different. I'd suggest appending this > sentence to the end of the LICENSE file: "A

Re: [VOTE] Go SDK

2018-05-22 Thread Willy Lulciuc
+1 (non-binding) Great work! On Tue, May 22, 2018 at 3:17 PM, Kenneth Knowles wrote: > The process has to be done by an officer or member. Can you help us with > this, Davor? > > On Tue, May 22, 2018 at 3:14 PM Robert Bradshaw > wrote: > >> On Tue, May

Re: [VOTE] Go SDK

2018-05-22 Thread Davor Bonaci
Always happy to help. I'm sure JB is as well, others too! Please draft/collect any relevant data -- thanks! On Tue, May 22, 2018 at 3:17 PM, Kenneth Knowles wrote: > The process has to be done by an officer or member. Can you help us with > this, Davor? > > On Tue, May 22,

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Davor Bonaci
This analysis looks correct. Great find! The recommended fix would be different. I'd suggest appending this sentence to the end of the LICENSE file: "A part of several convenience binary distributions of this software is licensed as follows", followed by the full license text (including its

Jenkins build is back to normal : beam_SeedJob #1765

2018-05-22 Thread Apache Jenkins Server
See

Build failed in Jenkins: beam_SeedJob #1764

2018-05-22 Thread Apache Jenkins Server
See -- GitHub pull request #5406 of commit b92995635829066196f4ee34783a860d6d20bda7, no merge conflicts. Setting status of b92995635829066196f4ee34783a860d6d20bda7 to PENDING with url

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Andrew Pilloud
Here is what I think might be missing: (1) what artifacts are impacted and where are they distributed http://central.maven.org/maven2/org/apache/beam/beam-sdks-java-core/2.4.0/beam-sdks-java-core-2.4.0.jar

Build failed in Jenkins: beam_SeedJob #1763

2018-05-22 Thread Apache Jenkins Server
See -- GitHub pull request #5406 of commit dfa421d2d52bbaac720d5d16f69954d542e58f67, no merge conflicts. Setting status of dfa421d2d52bbaac720d5d16f69954d542e58f67 to PENDING with url

Build failed in Jenkins: beam_SeedJob #1762

2018-05-22 Thread Apache Jenkins Server
See -- GitHub pull request #5406 of commit 902d5946d1445a6f9a84248bacd19bec04ba3a56, no merge conflicts. Setting status of 902d5946d1445a6f9a84248bacd19bec04ba3a56 to PENDING with url

Re: [VOTE] Go SDK

2018-05-22 Thread Kenneth Knowles
The process has to be done by an officer or member. Can you help us with this, Davor? On Tue, May 22, 2018 at 3:14 PM Robert Bradshaw wrote: > On Tue, May 22, 2018 at 2:42 PM Davor Bonaci wrote: > > >>* Robert mentioned that "SGA should have probably

Re: [VOTE] Go SDK

2018-05-22 Thread Robert Bradshaw
On Tue, May 22, 2018 at 2:42 PM Davor Bonaci wrote: >>* Robert mentioned that "SGA should have probably already been filed" in the previous thread. I got the impression that nothing further was needed. I'll follow up. > Please just follow:

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Davor Bonaci
Thanks for the report! Could you please comment more as to: (1) what artifacts are impacted and where are they distributed, (2) the external dependency being distributed, (3) license and/or term not adhered to, and (4) any proposed fix? Any such information would be helpful in triaging the

Re: [VOTE] Go SDK

2018-05-22 Thread Davor Bonaci
> > * Robert mentioned that "SGA should have probably already been filed" > in the previous thread. I got the impression that nothing further was > needed. I'll follow up. > Please just follow: http://incubator.apache.org/ip-clearance/. Simple. Quick. Perhaps relevant: I saw some golang

Re: Missing copyright notices for shaded packages

2018-05-22 Thread Lukasz Cwik
Does it have to be part of the jar or is it good enough to be part of the sources jar (as 2.4.0 had it part of the beam-parent-2.4.0-source.zip )? On Tue, May 22, 2018 at 11:16 AM Andrew Pilloud

Re: [VOTE] Go SDK

2018-05-22 Thread Robert Bradshaw
+1 (enthusiastic and binding) Really excited to see another data point in the model with a third language, and thank you for fleshing this out to a full SDK. Good to go from my perspective. On Tue, May 22, 2018 at 10:19 AM Ahmet Altay wrote: > +1 (binding) > Congratulations

Re: Launching a Portable Pipeline

2018-05-22 Thread Ankur Goenka
Thank you guys for the input. Here is the summary. *Responsibility of Beam on Job Management* Beam provides a common interface for basic job management operations called JobService.

Re: Launching a Portable Pipeline

2018-05-22 Thread Eugene Kirpichov
Thanks Ankur, I think there's consensus, so it's probably ready to share :) On Fri, May 18, 2018 at 3:00 PM Ankur Goenka wrote: > Thanks for all the input. > I have summarized the discussions at the bottom of the document ( here >

Re: The full list of proposals / prototype documents

2018-05-22 Thread Eugene Kirpichov
Making it easier to manage indeed would be good. Could someone from PMC please add the following documents of mine to it? SDF related documents: http://s.apache.org/splittable-do-fn http://s.apache.org/sdf-via-source http://s.apache.org/textio-sdf

Re: Current progress on Portable runners

2018-05-22 Thread Eugene Kirpichov
Thanks all! Yeah, I'll update the Portability page with the status of this project and other pointers this week or next (mostly out of office this week). On Fri, May 18, 2018 at 5:01 PM Thomas Weise wrote: > - Flink JobService: in review

Re: [VOTE] Go SDK

2018-05-22 Thread Huygaa Batsaikhan
+1 (non-binding). Great news! On Tue, May 22, 2018 at 11:49 AM Chamikara Jayalath wrote: > +1 (non-binding). Great to know that our third SDK will be > released/supported officially. > > On Tue, May 22, 2018 at 11:38 AM Eugene Kirpichov > wrote: > >>

Re: [VOTE] Go SDK

2018-05-22 Thread Chamikara Jayalath
+1 (non-binding). Great to know that our third SDK will be released/supported officially. On Tue, May 22, 2018 at 11:38 AM Eugene Kirpichov wrote: > +1! > > It is particularly exciting to me that the Go support is > "portability-first" and does everything in the proper

Re: [VOTE] Go SDK

2018-05-22 Thread Eugene Kirpichov
+1! It is particularly exciting to me that the Go support is "portability-first" and does everything in the proper "portability way" from the start, free of legacy non-portable runner support code. On Tue, May 22, 2018 at 11:32 AM Scott Wegner wrote: > +1 (non-binding) > >

Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Well, Beam can implement a new mapper but it doesn't help for IO. Most modern backends will take JSON directly, even the javax one, and it must stay generic. Then since JSON to POJO mapping has already been done a dozen times, not sure it is worth it for now. On Tue, May 22, 2018, 20:27, Reuven Lax

Re: [VOTE] Go SDK

2018-05-22 Thread Scott Wegner
+1 (non-binding) Having a third language will really force us to design Beam constructs in a language-agnostic way, and achieve the goals of portability. Thanks to all that have helped reach this milestone. On Tue, May 22, 2018 at 10:19 AM Ahmet Altay wrote: > +1 (binding) >

Re: Beam SQL Improvements

2018-05-22 Thread Reuven Lax
We can do even better btw. We can build a SchemaRegistry where automatic conversions can be registered between schemas and Java data types. With this the user won't even need a DoFn to do the conversion. On Tue, May 22, 2018, 10:13 AM Romain Manni-Bucau wrote: > Hi guys, > >

Missing copyright notices for shaded packages

2018-05-22 Thread Andrew Pilloud
I was digging around in the SQL jar trying to debug some packaging issues and noticed that we aren't including the copyright notices from the packages we are shading. I also looked at our previously released jars and they are the same (so this isn't a regression). Should we be including the

Re: Java PreCommit seems broken

2018-05-22 Thread Scott Wegner
I've logged BEAM-4382 [1] to decouple maven archetype generation from the rest of the Maven build. Luke, would you mind adding any context you have about generating archetypes from Gradle? From a quick search I couldn't find a native Gradle plugin, but perhaps the logic is simple enough to roll

Re: Beam SQL Improvements

2018-05-22 Thread Kenneth Knowles
Yea, I'm sure if you took on BEAM-4381 some folks would find it useful. Kenn On Tue, May 22, 2018 at 10:13 AM Romain Manni-Bucau wrote: > Hi guys, > > Checked out what has been done on schema model and think it is acceptable > - regarding the json debate - if >

Re: The full list of proposals / prototype documents

2018-05-22 Thread Kenneth Knowles
It is owned by the Beam PMC collectively. Any PMC member can add things to it. Ideas for making it easy to manage are welcome. Probably easier to have a markdown file somewhere with a list of docs so we can issue and review PRs. Not sure the web site is the right place for it - we have a history

Re: The full list of proposals / prototype documents

2018-05-22 Thread Scott Wegner
Thanks for the links. Any details on that Google drive folder? Who maintains it? Is it possible for any contributor to add their design doc? On Mon, May 21, 2018 at 8:15 AM Joseph PENG wrote: > Alexey, > > I do not know where you can find all design docs, but I know a

Re: [VOTE] Go SDK

2018-05-22 Thread Ahmet Altay
+1 (binding) Congratulations to the team! On Tue, May 22, 2018 at 10:13 AM, Alan Myrvold wrote: > +1 (non-binding) > Nice work! > > On Tue, May 22, 2018 at 9:18 AM Pablo Estrada wrote: > >> +1 (binding) >> Very excited to see this! >> >> On Tue, May

Re: Beam SQL Improvements

2018-05-22 Thread Romain Manni-Bucau
Hi guys, Checked out what has been done on schema model and think it is acceptable - regarding the json debate - if https://issues.apache.org/jira/browse/BEAM-4381 can be fixed. High level, it is about providing a mainstream and not too impacting model OOTB and JSON seems the most valid option

Re: [VOTE] Go SDK

2018-05-22 Thread Alan Myrvold
+1 (non-binding) Nice work! On Tue, May 22, 2018 at 9:18 AM Pablo Estrada wrote: > +1 (binding) > Very excited to see this! > > On Tue, May 22, 2018 at 9:09 AM Thomas Weise wrote: > >> +1 and congrats! >> >> >> On Tue, May 22, 2018 at 8:48 AM, Rafael

Re: Proposal: keeping post-commit tests green

2018-05-22 Thread Scott Wegner
Thanks for the thoughtful proposal Mikhail. I've left some comments in the doc. I encourage others to take a look: the proposal adds some strong policies about dealing with post-commit failures (rollback policy, locking master). Currently our post-commits are frequently red, and we're missing out

Re: Proposal for Beam Python User State and Timer APIs

2018-05-22 Thread Kenneth Knowles
Nice. I know that Java users have found it helpful to have this lower-level way of writing pipelines when the high-level primitives don't quite have the tight control they are looking for. I hope it will be a big draw for Python, too. (commenting on the doc) Kenn On Mon, May 21, 2018 at 5:15 PM

Re: [VOTE] Go SDK

2018-05-22 Thread Pablo Estrada
+1 (binding) Very excited to see this! On Tue, May 22, 2018 at 9:09 AM Thomas Weise wrote: > +1 and congrats! > > > On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez > wrote: > >> +1 ! >> >> On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik

Re: [VOTE] Go SDK

2018-05-22 Thread Thomas Weise
+1 and congrats! On Tue, May 22, 2018 at 8:48 AM, Rafael Fernandez wrote: > +1 ! > > On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik wrote: > >> +1 (binding) >> >> On Tue, May 22, 2018 at 6:16 AM Robert Burke wrote: >> >>> +1

Re: [VOTE] Go SDK

2018-05-22 Thread Rafael Fernandez
+1 ! On Tue, May 22, 2018 at 7:54 AM Lukasz Cwik wrote: > +1 (binding) > > On Tue, May 22, 2018 at 6:16 AM Robert Burke wrote: > >> +1 (non-binding) >> >> I'm looking forward to helping gophers solve their big data problems in >> their language of choice,

Re: [VOTE] Go SDK

2018-05-22 Thread Lukasz Cwik
+1 (binding) On Tue, May 22, 2018 at 6:16 AM Robert Burke wrote: > +1 (non-binding) > > I'm looking forward to helping gophers solve their big data problems in > their language of choice, and runner of choice! > > Next stop, a non-java portability runner? > > On Tue, May 22,

Re: [VOTE] Go SDK

2018-05-22 Thread Robert Burke
+1 (non-binding) I'm looking forward to helping gophers solve their big data problems in their language of choice, and runner of choice! Next stop, a non-java portability runner? On Tue, May 22, 2018, 6:08 AM Kenneth Knowles wrote: > +1 (binding) > > This is great. Feels like

Re: [VOTE] Go SDK

2018-05-22 Thread Kenneth Knowles
+1 (binding) This is great. Feels like a phase change in the life of Apache Beam, having three languages, with multiple portable runners on the horizon. Kenn On Tue, May 22, 2018 at 2:50 AM Ismaël Mejía wrote: > +1 (binding) > > Go SDK brings new language support for a

Re: I'm back and ready to help grow our community!

2018-05-22 Thread Matthias Baetens
Same here - shame on me. Congratulations on the graduation Gris, very happy to have you back! On Tue, 22 May 2018 at 09:19 Ismaël Mejía wrote: > I missed somehow this email thread. > Congratulations Gris and welcome back! > > On Fri, May 18, 2018 at 5:34 AM Jesse Anderson

Re: [VOTE] Go SDK

2018-05-22 Thread Ismaël Mejía
+1 (binding) The Go SDK brings new language support for a community not well supported in the Big Data world, the Go developers, so this is great. Also the fact that this is the first SDK integrated with the portability work makes it an interesting project to learn lessons from for future languages.

Re: [VOTE] Go SDK

2018-05-22 Thread Holden Karau
+1 (non-binding), I've had a chance to work with the SDK and it's pretty neat to see Beam add support for a language before most of the big data ecosystem. On Mon, May 21, 2018 at 10:29 PM, Jean-Baptiste Onofré wrote: > Hi Henning, > > SGA has been filed for the entire

Re: I'm back and ready to help grow our community!

2018-05-22 Thread Ismaël Mejía
I missed somehow this email thread. Congratulations Gris and welcome back! On Fri, May 18, 2018 at 5:34 AM Jesse Anderson wrote: > Congrats! > On Thu, May 17, 2018, 6:44 PM Robert Burke wrote: >> Congrats & welcome back! >> On Thu, May 17,