Mentoring for Google Summer of Code

2018-03-07 Thread Yanael Barbier
Hello Beamers, I’m Yanael, a computer science student; for the last year of my master's degree I’m focusing on big data and distributed systems. I have started to work with Beam and I’m excited to learn more about the project. To dive deeper into it, I would like to open the hood and contribute on

Re: Portable Flink Runner plan

2018-03-07 Thread Thomas Weise
Ben, Looks like we hit the send button at the same time. Is the plan then to derive the Flink implementation of the various execution services from those under org.apache.beam.runners.fnexecution? Thanks On Wed, Mar 7, 2018 at 11:02 AM, Thomas Weise wrote: > What's the plan

Re: Gradle status

2018-03-07 Thread Lukasz Cwik
Largest outstanding areas are: * Documentation relevant to the contributors guide/release process/testing * Performance tests There has been good progress towards: * Release artifact validations and generation * ValidatesRunner post commits * Pre commits * Container builds On Wed, Mar 7, 2018

Re: Portable Flink Runner plan

2018-03-07 Thread Ben Sidhom
With respect to sharing code for rewriting pipelines: we've already written a few utilities for pipeline fusion and rewriting transforms to work with portable runners. Fusion functions the same way as in the ULR and is as simple as a single method call. However, two things prevent us from

Re: Portable Flink Runner plan

2018-03-07 Thread Aljoscha Krettek
Cool, so we had the same ideas. I think this indicates that we're not completely on the wrong track with this! ;-) Aljoscha > On 7. Mar 2018, at 21:14, Thomas Weise wrote: > > Ben, > > Looks like we hit the send button at the same time. Is the plan then to derive > the Flink

Re: Gradle status

2018-03-07 Thread Kenneth Knowles
I also cannot drop everything to work on Gradle build, but maybe it isn't that drastic anyhow. Now that we have ValidatesRunner and NeedsRunner tests and some progress on the release, is there any other known missing functionality in the Gradle builds? Archetypes? Docker container images? On

Re: Portable Flink Runner plan

2018-03-07 Thread Romain Manni-Bucau
Open question: did you think of a way to run the portable API on top of any runner, so that it is implemented only once? Since runners have primitives it should be doable and would avoid a per-runner codebase, no? Other benefit: no direct portable API code in runners, yeah :). (I'm thinking of a runner decorator or

Re: Portable Flink Runner plan

2018-03-07 Thread Thomas Weise
What's the plan for the endpoints that the Flink operator needs to provide (control/data plane, state, logging)? Is the intention to provide base implementations that can be shared across runners and then implement the Flink specific parts on top of it? Has work started on those? If there are

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Jean-Baptiste Onofré
I don't see the staging repo on repository.apache.org anymore. Maybe Robert already dropped it due to -1. If it's the case, he should have sent the [CANCEL] e-mail first. Regards JB On 03/07/2018 06:51 PM, Alan Myrvold wrote: > I don't see anything published > to 

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
I'm canceling this RC. Hope to get another one out shortly (but if you notice items in the meantime, let me know). This is my first (full) release, so I'm still learning the ropes. On Wed, Mar 7, 2018 at 9:55 AM Jean-Baptiste Onofré wrote: > I don't see the staging repo on

Re: [CANCEL][VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
Done. I'm also looking into the direct runner nexmark regression Ismael reported. Does anyone know how this code has changed since the last release? On Wed, Mar 7, 2018 at 10:09 AM Jean-Baptiste Onofré wrote: > Thanks for the update Robert. > > Can you please set the thread

Re: Gradle status

2018-03-07 Thread Romain Manni-Bucau
On 7 Mar 2018 at 17:34, "Lukasz Cwik" wrote: Thanks for bringing this up Romain but I believe your data points on pass rates are only partially correct. Sure, sure, it is mainly about my own PRs, which are a very small % of the whole project ;). In the past week the Java Gradle

Re: Gradle status

2018-03-07 Thread Lukasz Cwik
Note that Alan Myrvold has been making steady progress making the release process via Gradle a reality: 1) Creating a jenkins job which can run the quickstart validation against nightly snapshots and also can be used for release candidates ( https://github.com/apache/beam/pull/4252) 2) Building a

Re: Mentoring for Google Summer of Code

2018-03-07 Thread Kenneth Knowles
Hi Yanael, Glad to hear from you! Here is a saved filter for Jira tickets describing GSoC project ideas in Beam: https://issues.apache.org/jira/issues/?filter=12343345 If you have any other ideas, feel free to file them and mention them on this mailing list. Kenn On Wed, Mar 7, 2018 at

Re: Gradle status

2018-03-07 Thread Romain Manni-Bucau
On 7 Mar 2018 at 20:21, "Lukasz Cwik" wrote: Note that Alan Myrvold has been making steady progress making the release process via Gradle a reality: 1) Creating a jenkins job which can run the quickstart validation against nightly snapshots and also can be used for release

Re: Gradle status

2018-03-07 Thread Reuven Lax
I think Alan was making progress on the Gradle build. What do people think of a "fixit" day for Gradle work? (or given that people are distributed, maybe a fixit week, where everyone takes one day from the week). On Wed, Mar 7, 2018 at 1:17 PM Kenneth Knowles wrote: > I also

Re: Portable Flink Runner plan

2018-03-07 Thread Axel Magnuson
My current solution is sort of a middle ground between the two. I have made a lot of the portable API service logic generalizable, and it relies on the runner implementing a few interfaces to use it. It doesn't use decorators, but my hope is that it will prevent the need for each runner to

Re: [CANCEL][VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
In combination with https://github.com/apache/beam/pull/4249 . These both make sense in isolation, but I think the correct fix is to fall back to bytes comparison for the context of mutation detection rather than throw an error. On Wed, Mar 7, 2018 at 10:54 AM Robert Bradshaw
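The fallback suggested in the message above can be illustrated with a minimal sketch. It is not Beam's actual mutation-detection code: the class name `MutationDetector`, the `structural_value` callback, and the use of `pickle.dumps` as a stand-in for a coder's encoder are all assumptions made for illustration. The idea shown is only that when a structural comparison is unavailable (here, the callback raises), the detector compares encoded bytes instead of propagating the error.

```python
import pickle

# Sentinel meaning "no structural value could be computed".
_UNAVAILABLE = object()


class MutationDetector:
    """Hypothetical detector: remembers a value and later checks whether it changed."""

    def __init__(self, value, structural_value=None, encode=pickle.dumps):
        # structural_value and encode are illustrative stand-ins for coder methods.
        self._value = value
        self._structural_value = structural_value
        self._encode = encode
        self._encoded_snapshot = encode(value)
        self._structural_snapshot = self._try_structural()

    def _try_structural(self):
        # Return the structural value, or the sentinel if none can be computed.
        if self._structural_value is None:
            return _UNAVAILABLE
        try:
            return self._structural_value(self._value)
        except Exception:
            return _UNAVAILABLE

    def is_mutated(self):
        # Prefer structural comparison when both snapshots are available.
        if self._structural_snapshot is not _UNAVAILABLE:
            current = self._try_structural()
            if current is not _UNAVAILABLE:
                return current != self._structural_snapshot
        # Otherwise fall back to comparing encoded bytes rather than raising.
        return self._encode(self._value) != self._encoded_snapshot
```

A caller would construct the detector when an element is produced and call `is_mutated()` after user code has run. Byte comparison assumes the encoder is deterministic for the values involved, which is a further assumption of this sketch.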

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Jean-Baptiste Onofré
Thanks for the update Robert. Can you please set the thread subject to "[CANCEL][VOTE] Release 2.4.0, release candidate #1" for tracking? Regards JB On 03/07/2018 07:04 PM, Robert Bradshaw wrote: > I'm canceling this RC. Hope to get another one out shortly (but if you notice > items in the

Re: [CANCEL][VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Jean-Baptiste Onofré
Let me run a git bisect to identify the change responsible. Regards JB On 03/07/2018 07:18 PM, Robert Bradshaw wrote: > Done. > > I'm also looking into the direct runner nexmark regression Ismael reported. > Does > anyone know how this code has changed since the last release? > > On Wed, Mar

Re: [CANCEL][VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
Pretty sure it's https://github.com/apache/beam/commit/e3f6d6f1f0c1f9c9ca00ade17c4afedb7d3fef6b#diff-ce2373d2c2282f252c94ed360e729994 . On Wed, Mar 7, 2018 at 10:41 AM Jean-Baptiste Onofré wrote: > Let me run a git bisect to identify the change in cause. > > Regards > JB > >

Re: Gradle status

2018-03-07 Thread Kenneth Knowles
Now that the NeedsRunner tests are running via Gradle I support removing the Java maven precommit. Lower latency, simpler config, and more available Jenkins workers FTW. Now let's fix the --rerun-tasks bugs so we can do even better. On Wed, Mar 7, 2018 at 11:21 AM Lukasz Cwik

Re: Gradle status

2018-03-07 Thread Lukasz Cwik
I am working on various projects and may not be able to pause my work for a couple of weeks while the build/test process is migrated. What does everyone think about Romain's suggestion? If I'm the only person in such a situation, I would be willing to go along with the plan. On Wed, Mar

Re: Gradle status

2018-03-07 Thread Robert Bradshaw
+1 to a fixit day. I'd be happy to help out myself. On Wed, Mar 7, 2018 at 1:49 PM Henning Rohde wrote: > +1 to a Gradle fixit day/week. I agree with Romain that we should make an > effort to quit this dual state. The Go, Go PreCommit and various container > image Gradle

Re: Gradle status

2018-03-07 Thread Henning Rohde
+1 to a Gradle fixit day/week. I agree with Romain that we should make an effort to quit this dual state. The Go, Go PreCommit and various container image Gradle builds, I think, are in a reasonable state (modulo some documentation updates). On Wed, Mar 7, 2018 at 1:29 PM, Kenneth Knowles

Re: Proposal: build Python wheel distributions for Apache Beam releases

2018-03-07 Thread Robert Bradshaw
Yes, we should. There's a bit of an open question of where these release artifacts should be staged. (Eventually, of course, they'll be published to PyPI). Should they be placed alongside the source artifacts in the svn repository? On Wed, Mar 7, 2018 at 3:00 PM Ahmet Altay

Re: Gradle status

2018-03-07 Thread Kenneth Knowles
SGTM. I should also say that every day is Gradle fixit day for me, as I have been using only Gradle (with IntelliJ) for a while :-). If anyone is hesitant, it is definitely ready to be used for normal dev. Seems like changing the messaging in onboarding docs is the main thing to fixit. Based on

Re: Portable Flink Runner plan

2018-03-07 Thread Aljoscha Krettek
@Axel I assigned https://issues.apache.org/jira/browse/BEAM-2588 to you. It might make sense to also grab other issues that you're already working on. > On 7. Mar 2018, at 21:18, Aljoscha Krettek wrote: > > Cool, so we had

Re: Gradle status

2018-03-07 Thread Jason Kuster
+1 On Wed, Mar 7, 2018 at 2:21 PM Robert Bradshaw wrote: > +1 to a fixit day. I'd be happy to help out myself. > > > On Wed, Mar 7, 2018 at 1:49 PM Henning Rohde wrote: > >> +1 to a Gradle fixit day/week. I agree with Romain that we should make an >>

Re: Proposal: build Python wheel distributions for Apache Beam releases

2018-03-07 Thread Ahmet Altay
Are we planning to do this for the 2.4.0 release? I am asking, because they were not part of RC1 artifacts. On Tue, Feb 13, 2018 at 9:18 AM, Robert Bradshaw wrote: > On Tue, Feb 13, 2018 at 8:31 AM, Nima Mousavi > wrote: > > Related question: > > >

Re: [YouTube channel] Add video: Apache Beam meetup London 2: use case in finance + IO in Beam and Splittable DoFns

2018-03-07 Thread Matthias Baetens
Are we good to go? Thanks, Matthias On Feb 23, 2018 19:12, "Matthias Baetens" wrote: > I have written a first proposal here > , > combining different sources of

Re: Gradle status

2018-03-07 Thread Kenneth Knowles
I'll write up the steps as part of the fixit :-) I put IntelliJ hints into the build.gradle enough that you don't need to do any specific tweaking. Here's the short version: 1. Create new empty project and set up JDK or create new Java project. Put it outside the source tree so you can `git clean -d -f -x`

Re: The Go SDK got accidentally merged - options to deal with the pain

2018-03-07 Thread Robert Bradshaw
I was actually thinking along the same lines: what was yet lacking to "officially" merge the Go branch in? The thread we started on this seems to have fizzled out over the holidays, but windowing support is the only must-have missing technical feature in my book (assuming documentation and testing

Re: Gradle status

2018-03-07 Thread Reuven Lax
Are there instructions for how to do this? I would like to switch my IntelliJ over to Gradle (it's still set up using Maven) On Wed, Mar 7, 2018 at 1:29 PM Kenneth Knowles wrote: > SGTM. I should also say that every day is Gradle fixit day for me, as I > have been using only

Re: Proposal: build Python wheel distributions for Apache Beam releases

2018-03-07 Thread Ahmet Altay
I do not know what the best practice is. For practical purposes it makes sense to stage to the same svn repo, so that we can test it as part of the release process. On Wed, Mar 7, 2018 at 4:22 PM, Robert Bradshaw wrote: > Yes, we should. There's a bit of an open question of

Re: The Go SDK got accidentally merged - options to deal with the pain

2018-03-07 Thread Henning Rohde
One thought: the Go SDK is actually not that far away from satisfying the guidelines for merging to master anyway (as discussed here [1]). If we decide to simply leave the code in master -- which seems to be what this thread is leaning towards -- I'll gladly sign up to do the remaining aspects (I

Re: The Go SDK got accidentally merged - options to deal with the pain

2018-03-07 Thread Kenneth Knowles
Re-reading the old thread, I see these desiderata: - "enough IO to write end-to-end examples such as WordCount and demonstrate what IOs would look like" - "accounting and tracking the fact that each element has an associated window and timestamp" - "test suites and test utilities" Browsing the

[VOTE] Release 2.4.0, release candidate #2

2018-03-07 Thread Robert Bradshaw
Hi everyone, Please review and vote on the release candidate #2 for the version 2.4.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: Gradle status

2018-03-07 Thread Romain Manni-Bucau
@Kenneth: this matches my experience as well, but is IDEA (not to mention Eclipse) that bad with Gradle? I thought it was not, since Android picked it up, but I never managed to make it scale on bigger projects due to the prebuild phases, which are super slow, require a full passing build

Re: [VOTE] Release 2.4.0, release candidate #2

2018-03-07 Thread Romain Manni-Bucau
-1: a) I still consider waitUntilFinish broken and a big blocker b) the RestrictionTracker API changed and is not backward compatible ( https://github.com/apache/beam/commit/e0034314ad196d2274cef9831ed63e090bf4d4c1#diff-098d7247eb1e9d9423bfa2ae2da38a9d ) with workarounds and fixes for these two issues

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Ahmet Altay
-1 for the same reason as Ismaël. Python version is not updated in the release branch [1]. [1] https://github.com/apache/beam/blob/release-2.4.0/sdks/python/apache_beam/version.py#L21 On Wed, Mar 7, 2018 at 8:39 AM, Jean-Baptiste Onofré wrote: > No it's not (I'm testing the

Re: Portable Flink Runner plan

2018-03-07 Thread Ben Sidhom
Yes, Axel has started work on such a shim. Our plan in the short term is to keep the old FlinkRunner around and to call into it to process jobs from the job service itself. That way we can keep the non-portable runner fully-functional while working on portability. Eventually, I think it makes

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Alan Myrvold
I don't see anything published to https://repository.apache.org/content/repositories/orgapachebeam-1028/ ? On Wed, Mar 7, 2018 at 9:26 AM Ahmet Altay wrote: > -1 for the same reason as Ismaël. Python version is not updated in the > release branch [1]. > > [1] >

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Jean-Baptiste Onofré
For the record, the vote e-mail doesn't contain the actual MAVEN_VERSION and JDK_VERSION used to build. Regards JB On 03/07/2018 09:44 AM, Robert Bradshaw wrote: > Hi everyone, > > Please review and vote on the release candidate #1 for the version 2.4.0, > as follows: > [ ] +1, Approve the release

Re: Gradle status

2018-03-07 Thread Romain Manni-Bucau
Bump. We discussed making a firm switch to Gradle or rolling back to Maven around April so as not to be blocked by the build tool. I noticed the Gradle build rarely passes on PRs, which blurs our vision; I'm not sure why exactly. Also, PRs don't always contain the Gradle updates - generally

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Romain Manni-Bucau
-1 (non-binding). As mentioned in the thread about this release, until BEAM-3409 is fixed the pipeline API is not really reliable and requires workarounds to be used by any user. This is really a blocker that has been overdue for 1 or 2 releases IMHO and should have been fixed at the end of last year already.

[VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
Hi everyone, Please review and vote on the release candidate #1 for the version 2.4.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * JIRA release notes [1], *

Re: Should tests fail due to transient errors on Dataflow Runner?

2018-03-07 Thread Łukasz Gajowy
Thank you. I did a quick check based on what you are saying and it confirmed that the streaming scenario is more tricky. Nevertheless this seems to be the problem that makes JDBC IOIT flaky, so I created a Jira for that: https://issues.apache.org/jira/browse/BEAM-3798 2018-03-06 1:52 GMT+01:00

Re: Portable Flink Runner plan

2018-03-07 Thread Aljoscha Krettek
Hi, Has anyone started on https://issues.apache.org/jira/browse/BEAM-2588 (FlinkRunner shim for serving Job API)? If not, I would start on that. My plan is to implement a FlinkJobService that implements JobServiceImplBase, similar to

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Ismaël Mejía
Hello, So far I found two issues: 1. The Python zip file includes the name apache-beam-2.4.0.dev0 instead of apache-beam-2.4.0. I'm not sure if this is a big issue, but when I installed and tested it via pip I still saw the .dev0 suffix. 2. The direct runner has a regression on Nexmark’s query 10. I

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
On Wed, Mar 7, 2018 at 12:50 AM Jean-Baptiste Onofré wrote: > For the record, the vote e-mail doesn't contain actual MAVEN_VERSION and > JDK_VERSION used to build. > Sorry, it's Apache Maven 3.2.5 with Java version: 1.8.0_112. Hopefully this isn't a deciding factor :). >

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Robert Bradshaw
On Wed, Mar 7, 2018 at 1:01 AM Romain Manni-Bucau wrote: > -1 (non-binding). As mentioned in the thread about this release, until > BEAM-3409 is fixed the pipeline API is not really reliable and requires > workarounds to be used by any user. This is really a blocker

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Reuven Lax
Thanks for attempting to include BEAM-3409 Robert. I agree that if the fix was proving difficult to merge, it's better to not block the release on it. This is not a regression - merely annoying behavior in the direct runner. As far as I can tell, the bug should only affect the case where two tests

Re: Gradle status

2018-03-07 Thread Lukasz Cwik
Thanks for bringing this up Romain but I believe your data points on pass rates are only partially correct. In the past week the Java Gradle precommit passed 46.34% of the time compared to the Java Maven precommit which passed 46.15% of the time. When I looked at these numbers in mid January they

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Romain Manni-Bucau
On 7 Mar 2018 at 17:25, "Robert Bradshaw" wrote: On Wed, Mar 7, 2018 at 1:01 AM Romain Manni-Bucau wrote: > -1 (non-binding). As mentioned in the thread about this release, until > BEAM-3409 is fixed the pipeline API is not really reliable and

Re: [VOTE] Release 2.4.0, release candidate #1

2018-03-07 Thread Jean-Baptiste Onofré
No it's not (I'm testing the release right now), I was just curious and noticed the missing details ;) Thanks! Regards JB On 03/07/2018 05:17 PM, Robert Bradshaw wrote: > On Wed, Mar 7, 2018 at 12:50 AM Jean-Baptiste Onofré > wrote: > > For