Re: Proposal for Beam Python User State and Timer APIs

2018-05-23 Thread Thomas Weise
Nice proposal; it's exciting to see this about to be added to the SDK as it enables a set of more complex use cases. I also think that some of the content can later be repurposed as user documentation. Thanks, Thomas On Wed, May 23, 2018 at 11:49 AM, Charles Chen wrote: >

Re: Existing transactionality inconsistency in the Beam Java State API

2018-05-23 Thread Charles Chen
Thanks Kenn. I think there are two issues to highlight: (1) the API should allow for some sort of prefetching / batching / background I/O for state; and (2) it should be clear what the semantics are for reading (e.g. so we don't have confusing read after write behavior). The approach I'm leaning

Portable Artifact Staging

2018-05-23 Thread Ankur Goenka
Hi, Artifact Staging is still an evolving topic in Beam Portability. I have started a document to go over different approaches for artifact staging. Please review the document and provide your

Re: Existing transactionality inconsistency in the Beam Java State API

2018-05-23 Thread Robert Bradshaw
Thanks for laying this out so well, Kenn. I'm also leaning towards the second option, despite its drawbacks. (In particular, readLater should not influence what's returned at read(), it's just a hint.) On Wed, May 23, 2018 at 4:43 PM Kenneth Knowles wrote: > Great idea to bring
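The "just a hint" semantics mentioned above can be illustrated with a minimal sketch. It is not the Beam State API: `ValueStateSketch`, its `backend` dict, and the method names `read_later`, `write`, and `read` are all hypothetical, and the prefetch is done synchronously for simplicity. The point is only that `read()` returns the same value whether or not `read_later()` was called first.

```python
class ValueStateSketch:
    """Hypothetical state cell used only for illustration (not the Beam API)."""

    def __init__(self, backend):
        self._backend = backend        # plain dict standing in for remote state storage
        self._prefetched = None
        self._has_prefetch = False
        self._local_write = None
        self._has_write = False

    def read_later(self):
        # Hint only: fetch the stored value ahead of time (synchronously here).
        self._prefetched = self._backend.get("value")
        self._has_prefetch = True
        return self

    def write(self, value):
        # Buffer the write locally; a later read() must observe it.
        self._local_write = value
        self._has_write = True

    def read(self):
        # A local write always wins, so an earlier read_later() cannot change the result.
        if self._has_write:
            return self._local_write
        if self._has_prefetch:
            return self._prefetched
        return self._backend.get("value")


# Same sequence of writes, with and without the hint.
hinted = ValueStateSketch({"value": 1})
hinted.read_later()
hinted.write(2)

plain = ValueStateSketch({"value": 1})
plain.write(2)

print(hinted.read() == plain.read())
```

In this sketch the prefetched value is consulted only when no local write exists, which is one way to keep read-after-write behavior independent of the hint.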

Re: SQL shaded jars don't work. How to test?

2018-05-23 Thread Kenneth Knowles
What's the status of moving it forward? Is it a ton of work / too much to do quickly? On Wed, May 23, 2018 at 9:11 AM Andrew Pilloud wrote: > To loop the list in on discussions going on in > https://github.com/apache/beam/pull/5443: our normal tests don't run > against the

Re: Existing transactionality inconsistency in the Beam Java State API

2018-05-23 Thread Kenneth Knowles
Great idea to bring it to dev@. I think it is better to focus here than long doc comment threads. I had strong opinions that I think were a bit confused and wrong. Sorry for that. I stated this position: - XYZState class is a handle to a mutable location - its methods like isEmpty() or

Gradle closure hack

2018-05-23 Thread Kenneth Knowles
Getting into the spirit of Groovy/Gradle style, can we make "applyJavaNature" look like these other config blocks? applyJavaNature { // maybe this should be called beamJavaModule errorprone { ... errorprone config } } If I understand correctly, this should also

Existing transactionality inconsistency in the Beam Java State API

2018-05-23 Thread Charles Chen
During the design of the Beam Python State API, we noticed some transactionality inconsistencies in the existing Beam Java State API (these are the unresolved bugs BEAM-2980 and BEAM-2975). We are

Re: Launching a Portable Pipeline

2018-05-23 Thread Ankur Goenka
Yes, JobService can be implemented by a runner and can be made available using an endpoint. The component reuse is more of a code reuse. On Wed, May 23, 2018 at 3:14 PM Reuven Lax wrote: > > > On Wed, May 23, 2018 at 3:09 PM Ankur Goenka wrote: > >> 1. Why

Re: Launching a Portable Pipeline

2018-05-23 Thread Thomas Weise
+1 IMO that should be the approach in general. As much code as possible reusable across runners and default job service implementation that can be customized per runner if necessary. It will be necessary to build at least per runner artifacts due to their dependencies (like the profiles we have

Re: Launching a Portable Pipeline

2018-05-23 Thread Reuven Lax
On Wed, May 23, 2018 at 3:09 PM Ankur Goenka wrote: > 1. Why is JobService runner specific? Couldn't at least a good part of it > be reused given that the runner specific parts are mostly in the > translation? Or am I missing other reasons? > > Yes, absolutely. A good chunk of

Re: Launching a Portable Pipeline

2018-05-23 Thread Ankur Goenka
1. Why is JobService runner specific? Couldn't at least a good part of it be reused given that the runner specific parts are mostly in the translation? Or am I missing other reasons? Yes, absolutely. A good chunk of it can be reused. We are reusing a few components from ULR in Flink runner.

Re: The full list of proposals / prototype documents

2018-05-23 Thread Griselda Cuevas
Hi Everyone, @Alexey, I think this is a great idea, I'd like to understand more of the motivation behind having all the design docs under a single page. In my opinion it could become a challenge to maintain a page, so knowing what you want to accomplish could help us think of alternative

Re: Java Direct Runner technical documentation is coming soon!

2018-05-23 Thread Huygaa Batsaikhan
On Wed, May 23, 2018 at 11:20 AM Huygaa Batsaikhan wrote: > Hi devs, > > Robin Qu and I, both new Beam contributors, have been working on adding > new features in Java Direct Runner. However, our experience was not that > smooth because there were no technical documents

Re: Launching a Portable Pipeline

2018-05-23 Thread Ismaël Mejía
Interesting document, two questions: 1. Why is JobService runner specific? Couldn't at least a good part of it be reused given that the runner specific parts are mostly in the translation? Or am I missing other reasons? 2. What about authentication and authorisation for production runners? Once

Re: [VOTE] Go SDK

2018-05-23 Thread Henning Rohde
Thanks Davor! I filled out the form to the best of my ability and placed it here (avoiding attachments on the list): https://web.tresorit.com/l#nUkKlgi3cBYxYAOyhCMXIw

Re: Proposed change to Portable Combine Spec - Adding a new URN

2018-05-23 Thread Lukasz Cwik
I like the new URN as it also provides a way for us to re-use the combine payload as part of combining state specs. This would allow runners to execute a gRPC Read + Combine Grouped Values + gRPC Write on the contents of a StateSpec if it grows too large. On Wed, May 23, 2018 at 1:57 PM Daniel

Re: Documentation for Beam on Windows

2018-05-23 Thread Lukasz Cwik
There is none to my knowledge. On Wed, May 23, 2018 at 1:49 PM Udi Meiri wrote: > Hi all, > > I was looking yesterday for a quickstart guide on how to use Beam on > Windows but saw that those guides are exclusively for Linux users. > > What documentation is available for

Re: Beam SQL Improvements

2018-05-23 Thread Reuven Lax
Romain, maybe it would be useful for us to find some time on slack. I'd like to understand your concerns. Also keep in mind that I'm tagging all these classes as Experimental for now, so we can definitely change these interfaces around if we decide they are not the best ones. Reuven On Tue, May

Re: The full list of proposals / prototype documents

2018-05-23 Thread Daniel Oliveira
+1 to web site page (not Google Doc). Definitely agree that a common entry point would be excellent. I don't like the idea of the Google Doc so much because it's not very good for having changes reviewed and keeping track of who added what, unlike Github. Adding an entry to the list in the

Proposed change to Portable Combine Spec - Adding a new URN

2018-05-23 Thread Daniel Oliveira
Hi everyone, This email should be relevant to anyone interested in the portable pipeline model. A few months ago I sent out an email with this doc describing my ideas for modelling portable combines that support lifting: https://s.apache.org/beam-runner-api-combine-model Recently, after some

Re: Missing copyright notices for shaded packages

2018-05-23 Thread Scott Wegner
FYI, I've opened https://issues.apache.org/jira/browse/BEAM-4393 to track this work and marked it as a 2.5.0 release blocker. On Wed, May 23, 2018 at 9:15 AM Andrew Pilloud wrote: > I generated the list of jars to check using the following search: > > grep

Documentation for Beam on Windows

2018-05-23 Thread Udi Meiri
Hi all, I was looking yesterday for a quickstart guide on how to use Beam on Windows but saw that those guides are exclusively for Linux users. What documentation is available for people wanting to use Beam on Windows machines? Thanks!

Re: The full list of proposals / prototype documents

2018-05-23 Thread Lukasz Cwik
+1, Thanks for picking this up Alexey On Wed, May 23, 2018 at 10:41 AM Huygaa Batsaikhan wrote: > +1. That is great, Alexey. Robin and I are working on documenting some > missing pieces of Java SDK. We will let you know when we create polished > documents. > > On Wed, May 23,

Re: I'm back and ready to help grow our community!

2018-05-23 Thread Alan Myrvold
Congratulations on graduating!!! Glad you're back. On Tue, May 22, 2018 at 3:01 AM Matthias Baetens wrote: > Same here - shame on me. Congratulations on the graduation Gris, very > happy to have you back! > > On Tue, 22 May 2018 at 09:19 Ismaël Mejía

Re: Proposal for Beam Python User State and Timer APIs

2018-05-23 Thread Charles Chen
Thanks everyone for the detailed comments and discussions. It looks like by now, we mostly agree with the requirements and overall direction needed for the API, though there is continuing discussion on specific details. I want to highlight two new sections of the doc, which address some

Re: Proposal: keeping precommit times fast

2018-05-23 Thread Kenneth Knowles
With regard to the Job Cacher Plugin: I think it is an infra ticket to install? And I guess we need it longer term when we move to containerized builds anyhow? One thing I've experienced with the Travis-CI cache is that the time spent uploading & downloading the remote cache - in that case of all

Re: Closing (automatically?) inactive pull requests

2018-05-23 Thread Kenneth Knowles
That makes sense, to just focus on Beam's decision. It seems the tool is already built. I thought we just had to deploy it, but maybe not even that, if we can just activate it: https://github.com/apps/stale Kenn On Wed, May 23, 2018 at 9:31 AM Ismaël Mejía wrote: > Given

Java Direct Runner technical documentation is coming soon!

2018-05-23 Thread Huygaa Batsaikhan
Hi devs, Robin Qu and I, both new Beam contributors, have been working on adding new features in Java Direct Runner. However, our experience was not that smooth because there were no technical documents describing the overall design of the direct runner. As the Direct Runner is supposed to be

Re: The full list of proposals / prototype documents

2018-05-23 Thread Huygaa Batsaikhan
+1. That is great, Alexey. Robin and I are working on documenting some missing pieces of Java SDK. We will let you know when we create polished documents. On Wed, May 23, 2018 at 9:28 AM Ismaël Mejía wrote: > +1 and thanks for volunteering for this Alexey. > We really need to

Re: Closing (automatically?) inactive pull requests

2018-05-23 Thread Ismaël Mejía
Given that reaching consensus in both communities seems like a harder task than just deciding our policy on the Beam side, why don't we just go ahead and vote on this and build the tool? If the Flink folks are interested they can take it, and in the future we can share that code. On Wed, May

Re: Proposal: keeping precommit times fast

2018-05-23 Thread Ismaël Mejía
I second Robert's idea of 'intelligently' running only the affected tests; there is probably no need to run Java for a Go fix (and if there is any issue it can be caught in postcommit). The same goes for a dev who just fixed something in KafkaIO and has to wait for other IO tests to pass. I suppose that

Re: The full list of proposals / prototype documents

2018-05-23 Thread Ismaël Mejía
+1 and thanks for volunteering for this Alexey. We really need to make this more accessible. On Wed, May 23, 2018 at 6:00 PM Alexey Romanenko wrote: > Joseph, Eugene - thank you very much for the links! > All, regarding one common entry point for all design documents.

Re: Missing copyright notices for shaded packages

2018-05-23 Thread Andrew Pilloud
I generated the list of jars to check using the following search: grep 'include(dependency(' $(find . -name 'build.gradle') Andrew On Tue, May 22, 2018 at 7:33 PM Kenneth Knowles wrote: > Did you look through all our jars or is that just a sample? > > Kenn > > On Tue, May 22,

Re: SQL shaded jars don't work. How to test?

2018-05-23 Thread Andrew Pilloud
To loop the list in on discussions going on in https://github.com/apache/beam/pull/5443: our normal tests don't run against the shaded jars. Gradle can run the tests against the shaded jars, but a bunch fail due to dependency issues. It's not just SQL. Andrew On Mon, May 21, 2018 at 11:35 AM

Re: The full list of proposals / prototype documents

2018-05-23 Thread Alexey Romanenko
Joseph, Eugene - thank you very much for the links! All, regarding one common entry point for all design documents. Could we just have a dedicated page on Beam web site with a list of links to every proposed document? Every entry (optionally) might contain, in addition, short abstract and list

Re: [VOTE] Go SDK

2018-05-23 Thread Thomas Groh
+1! I, for one, could not be more excited about our glorious portable future. On Mon, May 21, 2018 at 6:03 PM Henning Rohde wrote: > Hi everyone, > > Now that the remaining issues have been resolved as discussed, I'd like to > propose a formal vote on accepting the Go SDK

Jenkins build is back to normal : beam_SeedJob #1776

2018-05-23 Thread Apache Jenkins Server
See

Build failed in Jenkins: beam_SeedJob #1775

2018-05-23 Thread Apache Jenkins Server
See -- GitHub pull request #5452 of commit 0bdbd88022198ba7a0534f0931936ed686075885, no merge conflicts. Setting status of 0bdbd88022198ba7a0534f0931936ed686075885 to PENDING with url

Kubernetes cluster of apache-beam-testing project

2018-05-23 Thread Kamil Szewczyk
Dear Beam Devs, we are using Kubernetes to create and tear down resources on demand for performance testing. It is done automatically by Jenkins jobs using PerfKit. On the 20th of May, there was a blog post about security issues in Kubernetes

Re: Beam SQL Improvements

2018-05-23 Thread Romain Manni-Bucau
Why not extend ProcessContext to add the new remapped output? But it looks good (the part I don't like is that creating a new context each time a new feature is added hurts users. What happens when Beam adds some reactive support? ReactiveOutputReceiver?) Pipeline sounds the wrong storage since

Re: Beam SQL Improvements

2018-05-23 Thread Reuven Lax
Yeah, all schemas are verified when the pipeline is constructed (before anything starts running). BTW - under the covers schemas are implemented as a special type of coder, and coders are always set on a PCollection. I'm happy to add explicit conversion transforms as well for Beam users, though as

Re: Beam SQL Improvements

2018-05-23 Thread Romain Manni-Bucau
On Wed, May 23, 2018, 07:55, Jean-Baptiste Onofré wrote: > Hi, > > IMHO, it would be better to have an explicit transform/IO as converter. > > It would be easier for users. > > Another option would be to use a "TypeConverter/SchemaConverter" map as > we do in Camel: Beam could