Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Jean-Baptiste Onofré

Hi all,

thanks a lot for all your feedback.

The trend is about to upgrade to Spark 2.x and drop Spark 1.x support.

However, some of you (especially Reuven and Robert) commented that users have to 
be pinged as well. It makes perfect sense, and it was my intention.


I propose the following action plan:
- from the technical front, currently, I have two private branches ready: one 
with Spark 1.x & Spark 2.x support (with a common module and three artifacts), 
another one with an upgrade to Spark 2.x (dropping 1.x). I will merge the later 
on the PR.
- I will forward the vote e-mail to the user mailing list, hopefully we will 
have user feedback.


Thanks again,
Regards
JB

On 11/08/2017 08:27 AM, Jean-Baptiste Onofré wrote:

Hi all,

as you might know, we are working on Spark 2.x support in the Spark runner.

I'm working on a PR about that:

https://github.com/apache/beam/pull/3808

Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I have to deal with dependencies. It's the first step of the 
update as I'm still using RDD, the second step would be to support dataframe 
(but for that, I would need PCollection elements with schemas, that's another 
topic on which Eugene, Reuven and I are discussing).


However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.


If we agree, I will update and cleanup the PR to only support and focus on Spark 
2.x.


So, that's why I'm calling for a vote:

   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so having support of 
both Spark 1.x and 2.x (please provide specific comment)


This vote is open for 48 hours (I have the commits ready, just waiting the end 
of the vote to push on the PR).


Thanks !
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
What about pushing it on a "upstream" branch and testing it for 1 week in
parallel of the maven reference build? If gradle is always 50% faster on
jenkins then it could become master setup without much discussion I guess.
We can even have 2 jenkins jobs: one with the daemon etc and one without.

Also noticed yesterday that gradle build is killing my machine (all 8 cores
are 100%) during the first minutes vs maven build which let me do something
else. Then all the consumed time which makes gradle not that fast is about
python. Will try to send figures later today.

Le 10 nov. 2017 00:10, "Lukasz Cwik"  a écrit :

> I wouldn't mind merging this change in so I could setup those Gradle
> Jenkins precommits.
>
> As per our contribution guidelines, any committer willing to sign off on
> the PR?
>
> On Thu, Nov 9, 2017 at 2:12 PM, Romain Manni-Bucau 
> wrote:
>
> > Le 9 nov. 2017 21:31, "Kenneth Knowles"  a
> écrit :
> >
> > Keep in mind that a clean build is unusual during development (it is
> common
> > for mvn use and that is a bug) and also not necessary for precommits if
> the
> > build tool is correct enough that caching is safe. So while this number
> > matters, it is not the most important.
> >
> >
> > Not sure, in dev you bypass the build tool most of the time anyway -
> thanks
> > to IDE or other shortcuts - but not on PR and CI. Keep in mind that not
> > doing a clean and killing gradle daemon makes the build not reproducible
> > and therefore useful :(. Starting to build from a subpart of the reactor
> -
> > with the mentionned mvn plugin for instance - can be nice on some CI like
> > travis if the caching is well configured but still not a guarantee the
> > build is "green".
> >
> > My trade off is to ensure an easy build and relevant result over the time
> > criteria. Do you share it as well or prefer time over other criteria -
> > which leads to other conclusions and options indeed and can make us not
> > understanding each other?
> >
> >
> > On Thu, Nov 9, 2017 at 11:30 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com
> > >
> > wrote:
> >
> > > I will try next week yes but the 2 runs i did were 28mn vs 32mn from
> > memory
> > > - after having downloaded all deps once.
> > >
> > > Le 9 nov. 2017 19:45, "Lukasz Cwik"  a
> écrit :
> > >
> > > > If Gradle was slow, do you mind running the build with --profile and
> > > > sharing that and also sharing the Maven build log?
> > > >
> > > > On Thu, Nov 9, 2017 at 10:43 AM, Lukasz Cwik 
> wrote:
> > > >
> > > > > Romain, I don't understand your last comment, were you trying to
> say
> > > that
> > > > > you had the same Gradle build times like I did and it was an
> > > improvement
> > > > > over Maven or that you did not and you experienced build times that
> > > were
> > > > > equivalent to Maven?
> > > > >
> > > > > On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau <
> > > > rmannibu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> 2017-11-09 18:38 GMT+01:00 Kenneth Knowles  > >:
> > > > >> > On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <
> > > > >> rmannibu...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> (this is another topic so we can maybe open another thread)
> issue
> > > is
> > > > >> >> not much about python but more about the fact the build is not
> > self
> > > > >> >> contained. it is a maven build and maven should be sufficient
> > > without
> > > > >> >> having to install python + dependencies.
> > > > >> >
> > > > >> >
> > > > >> > Let's leave out the topic of whether our build should install
> > things
> > > > >> like
> > > > >> > JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That
> > issue
> > > > is
> > > > >> > somewhat independent of build tool, and the new build isn't
> worse
> > > than
> > > > >> the
> > > > >> > old one as far as it goes.
> > > > >>
> > > > >>
> > > > >> Yep, globally the same time with clean and killing the daemon.
> > > > >>
> > > > >> >
> > > > >> > Kenn
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >> I don't see any technical
> > > > >> >> blockers to do it (except time ;)) but it is always a bit
> > annoying
> > > to
> > > > >> >> git clone then not be able to build.
> > > > >> >>
> > > > >> >> Romain Manni-Bucau
> > > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > > >> >>
> > > > >> >>
> > > > >> >> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik
>  > >:
> > > > >> >> > Hmm, I have had good luck when following the Python quick
> start
> > > > setup
> > > > >> >> >  on
> > multiple
> > > > >> >> machines
> > > > >> >> > by ensuring the installed version of setuptools, virtualenv
> and
> > > pip
> > > > >> are
> > > > >> >> new
> > > > >> >> > enough versions.
> > > > >> >> >
> > > > >> >> > You can always skip the Python portion of the build 

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Robert Bradshaw
On Thu, Nov 9, 2017 at 11:05 AM, Kenneth Knowles  
wrote:
> I think it makes sense to communicate with email to users@ and in the
> release notes of 2.2.0.

Totally agree.

> That communication should be specific and indicate
> whether we are planning to merely not work on it anymore or actually remove
> it in 2.3.0.

There seems to be some ambiguity in this vote which of these two
options we're actually considering. I'm certainly +1 on relegating it
to maintenance mode at least. I don't have a good sense on the burden
of keeping it around, nor the number of potential (current?) users
we'd be alienating, which seem to be the driving factors. The fact
that all major distributions ship 2.x is very different than the
question of whether most users have migrated to 2.x.

> On Thu, Nov 9, 2017 at 6:35 AM, Amit Sela  wrote:
>
>> +1 for dropping Spark 1 support.
>> I don't think we have enough users to justify supporting both, and its been
>> a long time since this idea originally came-up (when Spark2 wasn't stable)
>> and now Spark 2 is standard in all Hadoop distros.
>> As for switching to the Dataframe API, as long as Spark 2 doesn't support
>> scanning through the state periodically (even if no data for a key),
>> watermarks won't fire keys that didn't see updates.
>>
>> On Thu, Nov 9, 2017 at 9:12 AM Thomas Weise  wrote:
>>
>> > +1 (non-binding) for dropping 1.x support
>> >
>> > I don't have the impression that there is significant adoption for Beam
>> on
>> > Spark 1.x ? A stronger Spark runner that works well on 2.x will be better
>> > for Beam adoption than a runner that has to compromise due to 1.x
>> baggage.
>> > Development efforts can go into improving the runner.
>> >
>> > Thanks,
>> > Thomas
>> >
>> >
>> > On Thu, Nov 9, 2017 at 4:08 AM, Srinivas Reddy <
>> srinivas96all...@gmail.com
>> > >
>> > wrote:
>> >
>> > > +1
>> > >
>> > >
>> > >
>> > > --
>> > > Srinivas Reddy
>> > >
>> > > http://mrsrinivas.com/
>> > >
>> > >
>> > > (Sent via gmail web)
>> > >
>> > > On 8 November 2017 at 14:27, Jean-Baptiste Onofré 
>> > wrote:
>> > >
>> > > > Hi all,
>> > > >
>> > > > as you might know, we are working on Spark 2.x support in the Spark
>> > > runner.
>> > > >
>> > > > I'm working on a PR about that:
>> > > >
>> > > > https://github.com/apache/beam/pull/3808
>> > > >
>> > > > Today, we have something working with both Spark 1.x and 2.x from a
>> > code
>> > > > standpoint, but I have to deal with dependencies. It's the first step
>> > of
>> > > > the update as I'm still using RDD, the second step would be to
>> support
>> > > > dataframe (but for that, I would need PCollection elements with
>> > schemas,
>> > > > that's another topic on which Eugene, Reuven and I are discussing).
>> > > >
>> > > > However, as all major distributions now ship Spark 2.x, I don't think
>> > > it's
>> > > > required anymore to support Spark 1.x.
>> > > >
>> > > > If we agree, I will update and cleanup the PR to only support and
>> focus
>> > > on
>> > > > Spark 2.x.
>> > > >
>> > > > So, that's why I'm calling for a vote:
>> > > >
>> > > >   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
>> > > >   [ ] 0 (I don't care ;))
>> > > >   [ ] -1, I would like to still support Spark 1.x, and so having
>> > support
>> > > > of both Spark 1.x and 2.x (please provide specific comment)
>> > > >
>> > > > This vote is open for 48 hours (I have the commits ready, just
>> waiting
>> > > the
>> > > > end of the vote to push on the PR).
>> > > >
>> > > > Thanks !
>> > > > Regards
>> > > > JB
>> > > > --
>> > > > Jean-Baptiste Onofré
>> > > > jbono...@apache.org
>> > > > http://blog.nanthrax.net
>> > > > Talend - http://www.talend.com
>> > > >
>> > >
>> >
>>


Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Reuven Lax
+1 from me. However let's notify users@ first. If we do get a lot of
pushback from users (which I doubt we will), we might reconsider dropping
Spark 1 support.

On Thu, Nov 9, 2017 at 11:05 AM, Kenneth Knowles 
wrote:

> +1 from me, with a friendly deprecation process
>
> I am convinced by the following:
>
>  - We don't have the resources to make both great, and anyhow it isn't
> worth it
>  - People keeping up with Beam releases are likely to be keeping up with
> Spark as well
>  - Spark 1 users already have a Spark 1 runner for Beam and can keep using
> it (and we don't actually lose the ability to update it in a pinch)
>  - Key features like portability (hence Python) will be some time so we
> should definitely not waste effort building that feature with Spark 1 in
> mind
>
> I think it makes sense to communicate with email to users@ and in the
> release notes of 2.2.0. That communication should be specific and indicate
> whether we are planning to merely not work on it anymore or actually remove
> it in 2.3.0.
>
> Kenn
>
> On Thu, Nov 9, 2017 at 6:35 AM, Amit Sela  wrote:
>
> > +1 for dropping Spark 1 support.
> > I don't think we have enough users to justify supporting both, and its
> been
> > a long time since this idea originally came-up (when Spark2 wasn't
> stable)
> > and now Spark 2 is standard in all Hadoop distros.
> > As for switching to the Dataframe API, as long as Spark 2 doesn't support
> > scanning through the state periodically (even if no data for a key),
> > watermarks won't fire keys that didn't see updates.
> >
> > On Thu, Nov 9, 2017 at 9:12 AM Thomas Weise  wrote:
> >
> > > +1 (non-binding) for dropping 1.x support
> > >
> > > I don't have the impression that there is significant adoption for Beam
> > on
> > > Spark 1.x ? A stronger Spark runner that works well on 2.x will be
> better
> > > for Beam adoption than a runner that has to compromise due to 1.x
> > baggage.
> > > Development efforts can go into improving the runner.
> > >
> > > Thanks,
> > > Thomas
> > >
> > >
> > > On Thu, Nov 9, 2017 at 4:08 AM, Srinivas Reddy <
> > srinivas96all...@gmail.com
> > > >
> > > wrote:
> > >
> > > > +1
> > > >
> > > >
> > > >
> > > > --
> > > > Srinivas Reddy
> > > >
> > > > http://mrsrinivas.com/
> > > >
> > > >
> > > > (Sent via gmail web)
> > > >
> > > > On 8 November 2017 at 14:27, Jean-Baptiste Onofré 
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > as you might know, we are working on Spark 2.x support in the Spark
> > > > runner.
> > > > >
> > > > > I'm working on a PR about that:
> > > > >
> > > > > https://github.com/apache/beam/pull/3808
> > > > >
> > > > > Today, we have something working with both Spark 1.x and 2.x from a
> > > code
> > > > > standpoint, but I have to deal with dependencies. It's the first
> step
> > > of
> > > > > the update as I'm still using RDD, the second step would be to
> > support
> > > > > dataframe (but for that, I would need PCollection elements with
> > > schemas,
> > > > > that's another topic on which Eugene, Reuven and I are discussing).
> > > > >
> > > > > However, as all major distributions now ship Spark 2.x, I don't
> think
> > > > it's
> > > > > required anymore to support Spark 1.x.
> > > > >
> > > > > If we agree, I will update and cleanup the PR to only support and
> > focus
> > > > on
> > > > > Spark 2.x.
> > > > >
> > > > > So, that's why I'm calling for a vote:
> > > > >
> > > > >   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > > > >   [ ] 0 (I don't care ;))
> > > > >   [ ] -1, I would like to still support Spark 1.x, and so having
> > > support
> > > > > of both Spark 1.x and 2.x (please provide specific comment)
> > > > >
> > > > > This vote is open for 48 hours (I have the commits ready, just
> > waiting
> > > > the
> > > > > end of the vote to push on the PR).
> > > > >
> > > > > Thanks !
> > > > > Regards
> > > > > JB
> > > > > --
> > > > > Jean-Baptiste Onofré
> > > > > jbono...@apache.org
> > > > > http://blog.nanthrax.net
> > > > > Talend - http://www.talend.com
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-09 Thread Robert Bradshaw
Our release notes look like nothing more than a query for the closed
jira issues. Do we have a top-level summary to highlight the big
ticket items in the release? And in particular somewhere to mention
that this is likely the last release to support Java 7 that'll get
widely read?

On Thu, Nov 9, 2017 at 3:39 PM, Reuven Lax  wrote:
> Thanks,
>
> This RC is currently failing on a number of validation steps, so we need to
> cut at least one more RC. Fingers crossed that it will be the last one.
>
> Reuven
>
> On Thu, Nov 9, 2017 at 3:36 PM, Konstantinos Katsiapis <
> katsia...@google.com.invalid> wrote:
>
>> Just a remark: Release of Tensorflow Transform
>>  0.4.0 depends on release of
>> Apache Beam 2.2.0 so upvoting for a release (the sooner the better).
>>
>> On Thu, Nov 9, 2017 at 3:33 PM, Reuven Lax 
>> wrote:
>>
>> > Are we waiting for any more validation of this candidate? If people are
>> > still running tests I'll hold off on RC4 (to reduce the chance of an
>> RC5),
>> > otherwise I'll cut RC4 once Valentyn's PR is merged.
>> >
>> > Reuven
>> >
>> > On Thu, Nov 9, 2017 at 2:26 PM, Valentyn Tymofieiev <
>> > valen...@google.com.invalid> wrote:
>> >
>> > > https://github.com/apache/beam/pull/4109 is out to address both
>> > findings I
>> > > reported earlier.
>> > >
>> > > On Thu, Nov 9, 2017 at 8:54 AM, Etienne Chauchot 
>> > > wrote:
>> > >
>> > > > Just as a remark, I compared (on my laptop though) queries execution
>> > > times
>> > > > on my previous run of 2.2.0-RC3 with release 2.1.0 and I did not see
>> > any
>> > > > performance regression.
>> > > >
>> > > > Best
>> > > >
>> > > > Etienne
>> > > >
>> > > >
>> > > > Le 09/11/2017 à 03:13, Valentyn Tymofieiev a écrit :
>> > > >
>> > > >> I looked at Python side of Dataflow & Direct runners on Linux. There
>> > are
>> > > >> two findings:
>> > > >>
>> > > >> 1. One of the mobile gaming examples did not pass for Dataflow
>> runner,
>> > > >> addressed in: https://github.com/apache/beam/pull/4102
>> > > >> > > > >> che%2Fbeam%2Fpull%2F4102=D=1=AFQjCNF3OS6Oo-MeNET
>> > > >> CCmOxJj5Gm2uH6g>
>> > > >>
>> > > >> .
>> > > >>
>> > > >> 2. Python streaming did not work for Dataflow runner, one PR is out
>> > > >> https://github.com/apache/beam/pull/4106, but follow up PRs may be
>> > > >> required
>> > > >> as we continue to investigate. If we had a PostCommit tests suite
>> > > running
>> > > >> against a release branch, this could have been caught earlier. Filed
>> > > >> https://issues.apache.org/jira/browse/BEAM-3163.
>> > > >>
>> > > >> On Wed, Nov 8, 2017 at 2:39 PM, Reuven Lax > >
>> > > >> wrote:
>> > > >>
>> > > >> Hi everyone,
>> > > >>>
>> > > >>> Please review and vote on the release candidate #3 for the version
>> > > 2.2.0,
>> > > >>> as follows:
>> > > >>>[ ] +1, Approve the release
>> > > >>>[ ] -1, Do not approve the release (please provide specific
>> > > comments)
>> > > >>>
>> > > >>>
>> > > >>> The complete staging area is available for your review, which
>> > includes:
>> > > >>>* JIRA release notes [1],
>> > > >>>* the official Apache source release to be deployed to
>> > > >>> dist.apache.org
>> > > >>> [2],
>> > > >>> which is signed with the key with fingerprint B98B7708 [3],
>> > > >>>* all artifacts to be deployed to the Maven Central Repository
>> > [4],
>> > > >>>* source code tag "v2.2.0-RC3" [5],
>> > > >>>* website pull request listing the release and publishing the
>> API
>> > > >>> reference manual [6].
>> > > >>>* Java artifacts were built with Maven 3.5.0 and OpenJDK/Oracle
>> > JDK
>> > > >>> 1.8.0_144.
>> > > >>>* Python artifacts are deployed along with the source release to
>> > the
>> > > >>> dist.apache.org [2].
>> > > >>>
>> > > >>> The vote will be open for at least 72 hours. It is adopted by
>> > majority
>> > > >>> approval, with at least 3 PMC affirmative votes.
>> > > >>>
>> > > >>> Thanks,
>> > > >>> Reuven
>> > > >>>
>> > > >>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?p
>> > > >>> rojectId=12319527=12341044
>> > > >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.2.0/
>> > > >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> > > >>> [4] https://repository.apache.org/content/repositories/orgapache
>> > > >>> beam-1023/
>> > > >>> [5] https://github.com/apache/beam/tree/v2.2.0-RC3
>> > > >>> 
>> > > >>> [6] https://github.com/apache/beam-site/pull/337
>> > > >>>
>> > > >>>
>> > > >
>> > >
>> >
>>
>>
>>
>> --
>> Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487
>>


Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-09 Thread Reuven Lax
Thanks,

This RC is currently failing on a number of validation steps, so we need to
cut at least one more RC. Fingers crossed that it will be the last one.

Reuven

On Thu, Nov 9, 2017 at 3:36 PM, Konstantinos Katsiapis <
katsia...@google.com.invalid> wrote:

> Just a remark: Release of Tensorflow Transform
>  0.4.0 depends on release of
> Apache Beam 2.2.0 so upvoting for a release (the sooner the better).
>
> On Thu, Nov 9, 2017 at 3:33 PM, Reuven Lax 
> wrote:
>
> > Are we waiting for any more validation of this candidate? If people are
> > still running tests I'll hold off on RC4 (to reduce the chance of an
> RC5),
> > otherwise I'll cut RC4 once Valentyn's PR is merged.
> >
> > Reuven
> >
> > On Thu, Nov 9, 2017 at 2:26 PM, Valentyn Tymofieiev <
> > valen...@google.com.invalid> wrote:
> >
> > > https://github.com/apache/beam/pull/4109 is out to address both
> > findings I
> > > reported earlier.
> > >
> > > On Thu, Nov 9, 2017 at 8:54 AM, Etienne Chauchot 
> > > wrote:
> > >
> > > > Just as a remark, I compared (on my laptop though) queries execution
> > > times
> > > > on my previous run of 2.2.0-RC3 with release 2.1.0 and I did not see
> > any
> > > > performance regression.
> > > >
> > > > Best
> > > >
> > > > Etienne
> > > >
> > > >
> > > > Le 09/11/2017 à 03:13, Valentyn Tymofieiev a écrit :
> > > >
> > > >> I looked at Python side of Dataflow & Direct runners on Linux. There
> > are
> > > >> two findings:
> > > >>
> > > >> 1. One of the mobile gaming examples did not pass for Dataflow
> runner,
> > > >> addressed in: https://github.com/apache/beam/pull/4102
> > > >>  > > >> che%2Fbeam%2Fpull%2F4102=D=1=AFQjCNF3OS6Oo-MeNET
> > > >> CCmOxJj5Gm2uH6g>
> > > >>
> > > >> .
> > > >>
> > > >> 2. Python streaming did not work for Dataflow runner, one PR is out
> > > >> https://github.com/apache/beam/pull/4106, but follow up PRs may be
> > > >> required
> > > >> as we continue to investigate. If we had a PostCommit tests suite
> > > running
> > > >> against a release branch, this could have been caught earlier. Filed
> > > >> https://issues.apache.org/jira/browse/BEAM-3163.
> > > >>
> > > >> On Wed, Nov 8, 2017 at 2:39 PM, Reuven Lax  >
> > > >> wrote:
> > > >>
> > > >> Hi everyone,
> > > >>>
> > > >>> Please review and vote on the release candidate #3 for the version
> > > 2.2.0,
> > > >>> as follows:
> > > >>>[ ] +1, Approve the release
> > > >>>[ ] -1, Do not approve the release (please provide specific
> > > comments)
> > > >>>
> > > >>>
> > > >>> The complete staging area is available for your review, which
> > includes:
> > > >>>* JIRA release notes [1],
> > > >>>* the official Apache source release to be deployed to
> > > >>> dist.apache.org
> > > >>> [2],
> > > >>> which is signed with the key with fingerprint B98B7708 [3],
> > > >>>* all artifacts to be deployed to the Maven Central Repository
> > [4],
> > > >>>* source code tag "v2.2.0-RC3" [5],
> > > >>>* website pull request listing the release and publishing the
> API
> > > >>> reference manual [6].
> > > >>>* Java artifacts were built with Maven 3.5.0 and OpenJDK/Oracle
> > JDK
> > > >>> 1.8.0_144.
> > > >>>* Python artifacts are deployed along with the source release to
> > the
> > > >>> dist.apache.org [2].
> > > >>>
> > > >>> The vote will be open for at least 72 hours. It is adopted by
> > majority
> > > >>> approval, with at least 3 PMC affirmative votes.
> > > >>>
> > > >>> Thanks,
> > > >>> Reuven
> > > >>>
> > > >>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?p
> > > >>> rojectId=12319527=12341044
> > > >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.2.0/
> > > >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > > >>> [4] https://repository.apache.org/content/repositories/orgapache
> > > >>> beam-1023/
> > > >>> [5] https://github.com/apache/beam/tree/v2.2.0-RC3
> > > >>> 
> > > >>> [6] https://github.com/apache/beam-site/pull/337
> > > >>>
> > > >>>
> > > >
> > >
> >
>
>
>
> --
> Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487
>


Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-09 Thread Konstantinos Katsiapis
Just a remark: Release of Tensorflow Transform
 0.4.0 depends on release of
Apache Beam 2.2.0 so upvoting for a release (the sooner the better).

On Thu, Nov 9, 2017 at 3:33 PM, Reuven Lax  wrote:

> Are we waiting for any more validation of this candidate? If people are
> still running tests I'll hold off on RC4 (to reduce the chance of an RC5),
> otherwise I'll cut RC4 once Valentyn's PR is merged.
>
> Reuven
>
> On Thu, Nov 9, 2017 at 2:26 PM, Valentyn Tymofieiev <
> valen...@google.com.invalid> wrote:
>
> > https://github.com/apache/beam/pull/4109 is out to address both
> findings I
> > reported earlier.
> >
> > On Thu, Nov 9, 2017 at 8:54 AM, Etienne Chauchot 
> > wrote:
> >
> > > Just as a remark, I compared (on my laptop though) queries execution
> > times
> > > on my previous run of 2.2.0-RC3 with release 2.1.0 and I did not see
> any
> > > performance regression.
> > >
> > > Best
> > >
> > > Etienne
> > >
> > >
> > > Le 09/11/2017 à 03:13, Valentyn Tymofieiev a écrit :
> > >
> > >> I looked at Python side of Dataflow & Direct runners on Linux. There
> are
> > >> two findings:
> > >>
> > >> 1. One of the mobile gaming examples did not pass for Dataflow runner,
> > >> addressed in: https://github.com/apache/beam/pull/4102
> > >>  > >> che%2Fbeam%2Fpull%2F4102=D=1=AFQjCNF3OS6Oo-MeNET
> > >> CCmOxJj5Gm2uH6g>
> > >>
> > >> .
> > >>
> > >> 2. Python streaming did not work for Dataflow runner, one PR is out
> > >> https://github.com/apache/beam/pull/4106, but follow up PRs may be
> > >> required
> > >> as we continue to investigate. If we had a PostCommit tests suite
> > running
> > >> against a release branch, this could have been caught earlier. Filed
> > >> https://issues.apache.org/jira/browse/BEAM-3163.
> > >>
> > >> On Wed, Nov 8, 2017 at 2:39 PM, Reuven Lax 
> > >> wrote:
> > >>
> > >> Hi everyone,
> > >>>
> > >>> Please review and vote on the release candidate #3 for the version
> > 2.2.0,
> > >>> as follows:
> > >>>[ ] +1, Approve the release
> > >>>[ ] -1, Do not approve the release (please provide specific
> > comments)
> > >>>
> > >>>
> > >>> The complete staging area is available for your review, which
> includes:
> > >>>* JIRA release notes [1],
> > >>>* the official Apache source release to be deployed to
> > >>> dist.apache.org
> > >>> [2],
> > >>> which is signed with the key with fingerprint B98B7708 [3],
> > >>>* all artifacts to be deployed to the Maven Central Repository
> [4],
> > >>>* source code tag "v2.2.0-RC3" [5],
> > >>>* website pull request listing the release and publishing the API
> > >>> reference manual [6].
> > >>>* Java artifacts were built with Maven 3.5.0 and OpenJDK/Oracle
> JDK
> > >>> 1.8.0_144.
> > >>>* Python artifacts are deployed along with the source release to
> the
> > >>> dist.apache.org [2].
> > >>>
> > >>> The vote will be open for at least 72 hours. It is adopted by
> majority
> > >>> approval, with at least 3 PMC affirmative votes.
> > >>>
> > >>> Thanks,
> > >>> Reuven
> > >>>
> > >>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?p
> > >>> rojectId=12319527=12341044
> > >>> [2] https://dist.apache.org/repos/dist/dev/beam/2.2.0/
> > >>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > >>> [4] https://repository.apache.org/content/repositories/orgapache
> > >>> beam-1023/
> > >>> [5] https://github.com/apache/beam/tree/v2.2.0-RC3
> > >>> 
> > >>> [6] https://github.com/apache/beam-site/pull/337
> > >>>
> > >>>
> > >
> >
>



-- 
Gus Katsiapis | Software Engineer | katsia...@google.com | 650-918-7487


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Lukasz Cwik
I wouldn't mind merging this change in so I could setup those Gradle
Jenkins precommits.

As per our contribution guidelines, any committer willing to sign off on
the PR?

On Thu, Nov 9, 2017 at 2:12 PM, Romain Manni-Bucau 
wrote:

> Le 9 nov. 2017 21:31, "Kenneth Knowles"  a écrit :
>
> Keep in mind that a clean build is unusual during development (it is common
> for mvn use and that is a bug) and also not necessary for precommits if the
> build tool is correct enough that caching is safe. So while this number
> matters, it is not the most important.
>
>
> Not sure, in dev you bypass the build tool most of the time anyway - thanks
> to IDE or other shortcuts - but not on PR and CI. Keep in mind that not
> doing a clean and killing gradle daemon makes the build not reproducible
> and therefore useful :(. Starting to build from a subpart of the reactor -
> with the mentionned mvn plugin for instance - can be nice on some CI like
> travis if the caching is well configured but still not a guarantee the
> build is "green".
>
> My trade off is to ensure an easy build and relevant result over the time
> criteria. Do you share it as well or prefer time over other criteria -
> which leads to other conclusions and options indeed and can make us not
> understanding each other?
>
>
> On Thu, Nov 9, 2017 at 11:30 AM, Romain Manni-Bucau  >
> wrote:
>
> > I will try next week yes but the 2 runs i did were 28mn vs 32mn from
> memory
> > - after having downloaded all deps once.
> >
> > Le 9 nov. 2017 19:45, "Lukasz Cwik"  a écrit :
> >
> > > If Gradle was slow, do you mind running the build with --profile and
> > > sharing that and also sharing the Maven build log?
> > >
> > > On Thu, Nov 9, 2017 at 10:43 AM, Lukasz Cwik  wrote:
> > >
> > > > Romain, I don't understand your last comment, were you trying to say
> > that
> > > > you had the same Gradle build times like I did and it was an
> > improvement
> > > > over Maven or that you did not and you experienced build times that
> > were
> > > > equivalent to Maven?
> > > >
> > > > On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau <
> > > rmannibu...@gmail.com>
> > > > wrote:
> > > >
> > > >> 2017-11-09 18:38 GMT+01:00 Kenneth Knowles  >:
> > > >> > On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <
> > > >> rmannibu...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> >> (this is another topic so we can maybe open another thread) issue
> > is
> > > >> >> not much about python but more about the fact the build is not
> self
> > > >> >> contained. it is a maven build and maven should be sufficient
> > without
> > > >> >> having to install python + dependencies.
> > > >> >
> > > >> >
> > > >> > Let's leave out the topic of whether our build should install
> things
> > > >> like
> > > >> > JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That
> issue
> > > is
> > > >> > somewhat independent of build tool, and the new build isn't worse
> > than
> > > >> the
> > > >> > old one as far as it goes.
> > > >>
> > > >>
> > > >> Yep, globally the same time with clean and killing the daemon.
> > > >>
> > > >> >
> > > >> > Kenn
> > > >> >
> > > >> >
> > > >> >
> > > >> >> I don't see any technical
> > > >> >> blockers to do it (except time ;)) but it is always a bit
> annoying
> > to
> > > >> >> git clone then not be able to build.
> > > >> >>
> > > >> >> Romain Manni-Bucau
> > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > >> >>
> > > >> >>
> > > >> >> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik  >:
> > > >> >> > Hmm, I have had good luck when following the Python quick start
> > > setup
> > > >> >> >  on
> multiple
> > > >> >> machines
> > > >> >> > by ensuring the installed version of setuptools, virtualenv and
> > pip
> > > >> are
> > > >> >> new
> > > >> >> > enough versions.
> > > >> >> >
> > > >> >> > You can always skip the Python portion of the build by
> excluding
> > > the
> > > >> >> build
> > > >> >> > task as so:
> > > >> >> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
> > > >> >> >
> > > >> >> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
> > > >> >> rmannibu...@gmail.com>
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> >> The 1.3.5 file is when i installed the python dependencies
> > > manually
> > > >> >> >> to make the build passing (the pip command never passed on my
> > > >> computer
> > > >> >> >> and therefore the build always has been broken until i
> installed
> > > it
> > > >> >> >> manually - independently from the build tool).
> > > >> >> >>
> > > >> >> >> Romain Manni-Bucau
> > > >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > > >> >> >>
> > > >> >> >>
> > > >> >> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik
>  > > >:
> > > >> >> >> > It turns out 

Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-09 Thread Valentyn Tymofieiev
https://github.com/apache/beam/pull/4109 is out to address both findings I
reported earlier.

On Thu, Nov 9, 2017 at 8:54 AM, Etienne Chauchot 
wrote:

> Just as a remark, I compared (on my laptop though) queries execution times
> on my previous run of 2.2.0-RC3 with release 2.1.0 and I did not see any
> performance regression.
>
> Best
>
> Etienne
>
>
> Le 09/11/2017 à 03:13, Valentyn Tymofieiev a écrit :
>
>> I looked at Python side of Dataflow & Direct runners on Linux. There are
>> two findings:
>>
>> 1. One of the mobile gaming examples did not pass for Dataflow runner,
>> addressed in: https://github.com/apache/beam/pull/4102
>> > che%2Fbeam%2Fpull%2F4102=D=1=AFQjCNF3OS6Oo-MeNET
>> CCmOxJj5Gm2uH6g>
>>
>> .
>>
>> 2. Python streaming did not work for Dataflow runner, one PR is out
>> https://github.com/apache/beam/pull/4106, but follow up PRs may be
>> required
>> as we continue to investigate. If we had a PostCommit tests suite running
>> against a release branch, this could have been caught earlier. Filed
>> https://issues.apache.org/jira/browse/BEAM-3163.
>>
>> On Wed, Nov 8, 2017 at 2:39 PM, Reuven Lax 
>> wrote:
>>
>> Hi everyone,
>>>
>>> Please review and vote on the release candidate #3 for the version 2.2.0,
>>> as follows:
>>>[ ] +1, Approve the release
>>>[ ] -1, Do not approve the release (please provide specific comments)
>>>
>>>
>>> The complete staging area is available for your review, which includes:
>>>* JIRA release notes [1],
>>>* the official Apache source release to be deployed to
>>> dist.apache.org
>>> [2],
>>> which is signed with the key with fingerprint B98B7708 [3],
>>>* all artifacts to be deployed to the Maven Central Repository [4],
>>>* source code tag "v2.2.0-RC3" [5],
>>>* website pull request listing the release and publishing the API
>>> reference manual [6].
>>>* Java artifacts were built with Maven 3.5.0 and OpenJDK/Oracle JDK
>>> 1.8.0_144.
>>>* Python artifacts are deployed along with the source release to the
>>> dist.apache.org [2].
>>>
>>> The vote will be open for at least 72 hours. It is adopted by majority
>>> approval, with at least 3 PMC affirmative votes.
>>>
>>> Thanks,
>>> Reuven
>>>
>>> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?p
>>> rojectId=12319527=12341044
>>> [2] https://dist.apache.org/repos/dist/dev/beam/2.2.0/
>>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>>> [4] https://repository.apache.org/content/repositories/orgapache
>>> beam-1023/
>>> [5] https://github.com/apache/beam/tree/v2.2.0-RC3
>>> 
>>> [6] https://github.com/apache/beam-site/pull/337
>>>
>>>
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
Le 9 nov. 2017 21:31, "Kenneth Knowles"  a écrit :

Keep in mind that a clean build is unusual during development (it is common
for mvn use and that is a bug) and also not necessary for precommits if the
build tool is correct enough that caching is safe. So while this number
matters, it is not the most important.


Not sure, in dev you bypass the build tool most of the time anyway - thanks
to IDE or other shortcuts - but not on PR and CI. Keep in mind that not
doing a clean and killing gradle daemon makes the build not reproducible
and therefore useful :(. Starting to build from a subpart of the reactor -
with the mentionned mvn plugin for instance - can be nice on some CI like
travis if the caching is well configured but still not a guarantee the
build is "green".

My trade off is to ensure an easy build and relevant result over the time
criteria. Do you share it as well or prefer time over other criteria -
which leads to other conclusions and options indeed and can make us not
understanding each other?


On Thu, Nov 9, 2017 at 11:30 AM, Romain Manni-Bucau 
wrote:

> I will try next week yes but the 2 runs i did were 28mn vs 32mn from
memory
> - after having downloaded all deps once.
>
> Le 9 nov. 2017 19:45, "Lukasz Cwik"  a écrit :
>
> > If Gradle was slow, do you mind running the build with --profile and
> > sharing that and also sharing the Maven build log?
> >
> > On Thu, Nov 9, 2017 at 10:43 AM, Lukasz Cwik  wrote:
> >
> > > Romain, I don't understand your last comment, were you trying to say
> that
> > > you had the same Gradle build times like I did and it was an
> improvement
> > > over Maven or that you did not and you experienced build times that
> were
> > > equivalent to Maven?
> > >
> > > On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau <
> > rmannibu...@gmail.com>
> > > wrote:
> > >
> > >> 2017-11-09 18:38 GMT+01:00 Kenneth Knowles :
> > >> > On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <
> > >> rmannibu...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> (this is another topic so we can maybe open another thread) issue
> is
> > >> >> not much about python but more about the fact the build is not
self
> > >> >> contained. it is a maven build and maven should be sufficient
> without
> > >> >> having to install python + dependencies.
> > >> >
> > >> >
> > >> > Let's leave out the topic of whether our build should install
things
> > >> like
> > >> > JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That
issue
> > is
> > >> > somewhat independent of build tool, and the new build isn't worse
> than
> > >> the
> > >> > old one as far as it goes.
> > >>
> > >>
> > >> Yep, globally the same time with clean and killing the daemon.
> > >>
> > >> >
> > >> > Kenn
> > >> >
> > >> >
> > >> >
> > >> >> I don't see any technical
> > >> >> blockers to do it (except time ;)) but it is always a bit annoying
> to
> > >> >> git clone then not be able to build.
> > >> >>
> > >> >> Romain Manni-Bucau
> > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >> >>
> > >> >>
> > >> >> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
> > >> >> > Hmm, I have had good luck when following the Python quick start
> > setup
> > >> >> >  on multiple
> > >> >> machines
> > >> >> > by ensuring the installed version of setuptools, virtualenv and
> pip
> > >> are
> > >> >> new
> > >> >> > enough versions.
> > >> >> >
> > >> >> > You can always skip the Python portion of the build by excluding
> > the
> > >> >> build
> > >> >> > task as so:
> > >> >> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
> > >> >> >
> > >> >> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
> > >> >> rmannibu...@gmail.com>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> The 1.3.5 file is when i installed the python dependencies
> > manually
> > >> >> >> to make the build passing (the pip command never passed on my
> > >> computer
> > >> >> >> and therefore the build always has been broken until i
installed
> > it
> > >> >> >> manually - independently from the build tool).
> > >> >> >>
> > >> >> >> Romain Manni-Bucau
> > >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >> >> >>
> > >> >> >>
> > >> >> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik
 > >:
> > >> >> >> > It turns out that the Apache Rat Ant task and the Apache Rat
> > Maven
> > >> >> plugin
> > >> >> >> > differ in that the plugin automatically excludes certain
files
> > by
> > >> >> default
> > >> >> >> > while the Ant task does not.
> > >> >> >> > See:
> > >> >> >> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
> > >> >> >> html#useDefaultExcludes
> > >> >> >> >
> > >> >> >> > I fixed the list to exclude ".idea/" instead of "idea/" since
> > >> there
> > >> >> was a
> > >> >> >> > typo.
> > >> >> >> >
> > >> >> 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Kenneth Knowles
Keep in mind that a clean build is unusual during development (it is common
for mvn use and that is a bug) and also not necessary for precommits if the
build tool is correct enough that caching is safe. So while this number
matters, it is not the most important.

On Thu, Nov 9, 2017 at 11:30 AM, Romain Manni-Bucau 
wrote:

> I will try next week yes but the 2 runs i did were 28mn vs 32mn from memory
> - after having downloaded all deps once.
>
> Le 9 nov. 2017 19:45, "Lukasz Cwik"  a écrit :
>
> > If Gradle was slow, do you mind running the build with --profile and
> > sharing that and also sharing the Maven build log?
> >
> > On Thu, Nov 9, 2017 at 10:43 AM, Lukasz Cwik  wrote:
> >
> > > Romain, I don't understand your last comment, were you trying to say
> that
> > > you had the same Gradle build times like I did and it was an
> improvement
> > > over Maven or that you did not and you experienced build times that
> were
> > > equivalent to Maven?
> > >
> > > On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau <
> > rmannibu...@gmail.com>
> > > wrote:
> > >
> > >> 2017-11-09 18:38 GMT+01:00 Kenneth Knowles :
> > >> > On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <
> > >> rmannibu...@gmail.com>
> > >> > wrote:
> > >> >
> > >> >> (this is another topic so we can maybe open another thread) issue
> is
> > >> >> not much about python but more about the fact the build is not self
> > >> >> contained. it is a maven build and maven should be sufficient
> without
> > >> >> having to install python + dependencies.
> > >> >
> > >> >
> > >> > Let's leave out the topic of whether our build should install things
> > >> like
> > >> > JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That issue
> > is
> > >> > somewhat independent of build tool, and the new build isn't worse
> than
> > >> the
> > >> > old one as far as it goes.
> > >>
> > >>
> > >> Yep, globally the same time with clean and killing the daemon.
> > >>
> > >> >
> > >> > Kenn
> > >> >
> > >> >
> > >> >
> > >> >> I don't see any technical
> > >> >> blockers to do it (except time ;)) but it is always a bit annoying
> to
> > >> >> git clone then not be able to build.
> > >> >>
> > >> >> Romain Manni-Bucau
> > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >> >>
> > >> >>
> > >> >> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
> > >> >> > Hmm, I have had good luck when following the Python quick start
> > setup
> > >> >> >  on multiple
> > >> >> machines
> > >> >> > by ensuring the installed version of setuptools, virtualenv and
> pip
> > >> are
> > >> >> new
> > >> >> > enough versions.
> > >> >> >
> > >> >> > You can always skip the Python portion of the build by excluding
> > the
> > >> >> build
> > >> >> > task as so:
> > >> >> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
> > >> >> >
> > >> >> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
> > >> >> rmannibu...@gmail.com>
> > >> >> > wrote:
> > >> >> >
> > >> >> >> The 1.3.5 file is when i installed the python dependencies
> > manually
> > >> >> >> to make the build passing (the pip command never passed on my
> > >> computer
> > >> >> >> and therefore the build always has been broken until i installed
> > it
> > >> >> >> manually - independently from the build tool).
> > >> >> >>
> > >> >> >> Romain Manni-Bucau
> > >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> > >> >> >>
> > >> >> >>
> > >> >> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik  > >:
> > >> >> >> > It turns out that the Apache Rat Ant task and the Apache Rat
> > Maven
> > >> >> plugin
> > >> >> >> > differ in that the plugin automatically excludes certain files
> > by
> > >> >> default
> > >> >> >> > while the Ant task does not.
> > >> >> >> > See:
> > >> >> >> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
> > >> >> >> html#useDefaultExcludes
> > >> >> >> >
> > >> >> >> > I fixed the list to exclude ".idea/" instead of "idea/" since
> > >> there
> > >> >> was a
> > >> >> >> > typo.
> > >> >> >> >
> > >> >> >> > I have no idea what the file "=1.3.5" is. Can you take a look
> at
> > >> the
> > >> >> >> > contents?
> > >> >> >> >
> > >> >> >> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
> > >> >> >> rmannibu...@gmail.com>
> > >> >> >> > wrote:
> > >> >> >> >
> > >> >> >> >> Ok, the rat issues I got were:
> > >> >> >> >>
> > >> >> >> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
> > >> >> >> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
> > >> >> >> >>
> > >> >> >> >> The first one could be in my default exclude - even if
> > >> eclipse/idea
> > >> >> >> >> files should be in the default exclude set of beam rat config
> > >> IMHO,
> > >> >> >> >> the last one is more a "?" can probably be exclude as well if
> > >> created
> > >> >> >> >> by the build 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
I will try next week yes but the 2 runs i did were 28mn vs 32mn from memory
- after having downloaded all deps once.

Le 9 nov. 2017 19:45, "Lukasz Cwik"  a écrit :

> If Gradle was slow, do you mind running the build with --profile and
> sharing that and also sharing the Maven build log?
>
> On Thu, Nov 9, 2017 at 10:43 AM, Lukasz Cwik  wrote:
>
> > Romain, I don't understand your last comment, were you trying to say that
> > you had the same Gradle build times like I did and it was an improvement
> > over Maven or that you did not and you experienced build times that were
> > equivalent to Maven?
> >
> > On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> 2017-11-09 18:38 GMT+01:00 Kenneth Knowles :
> >> > On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <
> >> rmannibu...@gmail.com>
> >> > wrote:
> >> >
> >> >> (this is another topic so we can maybe open another thread) issue is
> >> >> not much about python but more about the fact the build is not self
> >> >> contained. it is a maven build and maven should be sufficient without
> >> >> having to install python + dependencies.
> >> >
> >> >
> >> > Let's leave out the topic of whether our build should install things
> >> like
> >> > JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That issue
> is
> >> > somewhat independent of build tool, and the new build isn't worse than
> >> the
> >> > old one as far as it goes.
> >>
> >>
> >> Yep, globally the same time with clean and killing the daemon.
> >>
> >> >
> >> > Kenn
> >> >
> >> >
> >> >
> >> >> I don't see any technical
> >> >> blockers to do it (except time ;)) but it is always a bit annoying to
> >> >> git clone then not be able to build.
> >> >>
> >> >> Romain Manni-Bucau
> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >>
> >> >>
> >> >> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
> >> >> > Hmm, I have had good luck when following the Python quick start
> setup
> >> >> >  on multiple
> >> >> machines
> >> >> > by ensuring the installed version of setuptools, virtualenv and pip
> >> are
> >> >> new
> >> >> > enough versions.
> >> >> >
> >> >> > You can always skip the Python portion of the build by excluding
> the
> >> >> build
> >> >> > task as so:
> >> >> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
> >> >> >
> >> >> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
> >> >> rmannibu...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> The 1.3.5 file is when i installed the python dependencies
> manually
> >> >> >> to make the build passing (the pip command never passed on my
> >> computer
> >> >> >> and therefore the build always has been broken until i installed
> it
> >> >> >> manually - independently from the build tool).
> >> >> >>
> >> >> >> Romain Manni-Bucau
> >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >> >>
> >> >> >>
> >> >> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik  >:
> >> >> >> > It turns out that the Apache Rat Ant task and the Apache Rat
> Maven
> >> >> plugin
> >> >> >> > differ in that the plugin automatically excludes certain files
> by
> >> >> default
> >> >> >> > while the Ant task does not.
> >> >> >> > See:
> >> >> >> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
> >> >> >> html#useDefaultExcludes
> >> >> >> >
> >> >> >> > I fixed the list to exclude ".idea/" instead of "idea/" since
> >> there
> >> >> was a
> >> >> >> > typo.
> >> >> >> >
> >> >> >> > I have no idea what the file "=1.3.5" is. Can you take a look at
> >> the
> >> >> >> > contents?
> >> >> >> >
> >> >> >> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
> >> >> >> rmannibu...@gmail.com>
> >> >> >> > wrote:
> >> >> >> >
> >> >> >> >> Ok, the rat issues I got were:
> >> >> >> >>
> >> >> >> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
> >> >> >> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
> >> >> >> >>
> >> >> >> >> The first one could be in my default exclude - even if
> >> eclipse/idea
> >> >> >> >> files should be in the default exclude set of beam rat config
> >> IMHO,
> >> >> >> >> the last one is more a "?" can probably be exclude as well if
> >> created
> >> >> >> >> by the build at some point.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> Romain Manni-Bucau
> >> >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré <
> j...@nanthrax.net
> >> >:
> >> >> >> >> > Thanks for the update. I was swamped on some meetings. I'm
> >> back to
> >> >> >> test
> >> >> >> >> the latest changes.
> >> >> >> >> >
> >> >> >> >> > Regards
> >> >> >> >> > JB
> >> >> >> >> >
> >> >> >> >> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik
> >> >>  >> >> >> >
> >> >> >> >> wrote:
> >> >> 

Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Kenneth Knowles
+1 from me, with a friendly deprecation process

I am convinced by the following:

 - We don't have the resources to make both great, and anyhow it isn't
worth it
 - People keeping up with Beam releases are likely to be keeping up with
Spark as well
 - Spark 1 users already have a Spark 1 runner for Beam and can keep using
it (and we don't actually lose the ability to update it in a pinch)
 - Key features like portability (hence Python) will be some time so we
should definitely not waste effort building that feature with Spark 1 in
mind

I think it makes sense to communicate with email to users@ and in the
release notes of 2.2.0. That communication should be specific and indicate
whether we are planning to merely not work on it anymore or actually remove
it in 2.3.0.

Kenn

On Thu, Nov 9, 2017 at 6:35 AM, Amit Sela  wrote:

> +1 for dropping Spark 1 support.
> I don't think we have enough users to justify supporting both, and its been
> a long time since this idea originally came-up (when Spark2 wasn't stable)
> and now Spark 2 is standard in all Hadoop distros.
> As for switching to the Dataframe API, as long as Spark 2 doesn't support
> scanning through the state periodically (even if no data for a key),
> watermarks won't fire keys that didn't see updates.
>
> On Thu, Nov 9, 2017 at 9:12 AM Thomas Weise  wrote:
>
> > +1 (non-binding) for dropping 1.x support
> >
> > I don't have the impression that there is significant adoption for Beam
> on
> > Spark 1.x ? A stronger Spark runner that works well on 2.x will be better
> > for Beam adoption than a runner that has to compromise due to 1.x
> baggage.
> > Development efforts can go into improving the runner.
> >
> > Thanks,
> > Thomas
> >
> >
> > On Thu, Nov 9, 2017 at 4:08 AM, Srinivas Reddy <
> srinivas96all...@gmail.com
> > >
> > wrote:
> >
> > > +1
> > >
> > >
> > >
> > > --
> > > Srinivas Reddy
> > >
> > > http://mrsrinivas.com/
> > >
> > >
> > > (Sent via gmail web)
> > >
> > > On 8 November 2017 at 14:27, Jean-Baptiste Onofré 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > as you might know, we are working on Spark 2.x support in the Spark
> > > runner.
> > > >
> > > > I'm working on a PR about that:
> > > >
> > > > https://github.com/apache/beam/pull/3808
> > > >
> > > > Today, we have something working with both Spark 1.x and 2.x from a
> > code
> > > > standpoint, but I have to deal with dependencies. It's the first step
> > of
> > > > the update as I'm still using RDD, the second step would be to
> support
> > > > dataframe (but for that, I would need PCollection elements with
> > schemas,
> > > > that's another topic on which Eugene, Reuven and I are discussing).
> > > >
> > > > However, as all major distributions now ship Spark 2.x, I don't think
> > > it's
> > > > required anymore to support Spark 1.x.
> > > >
> > > > If we agree, I will update and cleanup the PR to only support and
> focus
> > > on
> > > > Spark 2.x.
> > > >
> > > > So, that's why I'm calling for a vote:
> > > >
> > > >   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > > >   [ ] 0 (I don't care ;))
> > > >   [ ] -1, I would like to still support Spark 1.x, and so having
> > support
> > > > of both Spark 1.x and 2.x (please provide specific comment)
> > > >
> > > > This vote is open for 48 hours (I have the commits ready, just
> waiting
> > > the
> > > > end of the vote to push on the PR).
> > > >
> > > > Thanks !
> > > > Regards
> > > > JB
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Lukasz Cwik
Romain, I don't understand your last comment, were you trying to say that
you had the same Gradle build times like I did and it was an improvement
over Maven or that you did not and you experienced build times that were
equivalent to Maven?

On Thu, Nov 9, 2017 at 9:51 AM, Romain Manni-Bucau 
wrote:

> 2017-11-09 18:38 GMT+01:00 Kenneth Knowles :
> > On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> (this is another topic so we can maybe open another thread) issue is
> >> not much about python but more about the fact the build is not self
> >> contained. it is a maven build and maven should be sufficient without
> >> having to install python + dependencies.
> >
> >
> > Let's leave out the topic of whether our build should install things like
> > JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That issue is
> > somewhat independent of build tool, and the new build isn't worse than
> the
> > old one as far as it goes.
>
>
> Yep, globally the same time with clean and killing the daemon.
>
> >
> > Kenn
> >
> >
> >
> >> I don't see any technical
> >> blockers to do it (except time ;)) but it is always a bit annoying to
> >> git clone then not be able to build.
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
> >> > Hmm, I have had good luck when following the Python quick start setup
> >> >  on multiple
> >> machines
> >> > by ensuring the installed version of setuptools, virtualenv and pip
> are
> >> new
> >> > enough versions.
> >> >
> >> > You can always skip the Python portion of the build by excluding the
> >> build
> >> > task as so:
> >> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
> >> >
> >> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
> >> rmannibu...@gmail.com>
> >> > wrote:
> >> >
> >> >> The 1.3.5 file is when i installed the python dependencies manually
> >> >> to make the build passing (the pip command never passed on my
> computer
> >> >> and therefore the build always has been broken until i installed it
> >> >> manually - independently from the build tool).
> >> >>
> >> >> Romain Manni-Bucau
> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >>
> >> >>
> >> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik :
> >> >> > It turns out that the Apache Rat Ant task and the Apache Rat Maven
> >> plugin
> >> >> > differ in that the plugin automatically excludes certain files by
> >> default
> >> >> > while the Ant task does not.
> >> >> > See:
> >> >> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
> >> >> html#useDefaultExcludes
> >> >> >
> >> >> > I fixed the list to exclude ".idea/" instead of "idea/" since there
> >> was a
> >> >> > typo.
> >> >> >
> >> >> > I have no idea what the file "=1.3.5" is. Can you take a look at
> the
> >> >> > contents?
> >> >> >
> >> >> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
> >> >> rmannibu...@gmail.com>
> >> >> > wrote:
> >> >> >
> >> >> >> Ok, the rat issues I got were:
> >> >> >>
> >> >> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
> >> >> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
> >> >> >>
> >> >> >> The first one could be in my default exclude - even if
> eclipse/idea
> >> >> >> files should be in the default exclude set of beam rat config
> IMHO,
> >> >> >> the last one is more a "?" can probably be exclude as well if
> created
> >> >> >> by the build at some point.
> >> >> >>
> >> >> >>
> >> >> >> Romain Manni-Bucau
> >> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >> >>
> >> >> >>
> >> >> >> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré  >:
> >> >> >> > Thanks for the update. I was swamped on some meetings. I'm back
> to
> >> >> test
> >> >> >> the latest changes.
> >> >> >> >
> >> >> >> > Regards
> >> >> >> > JB
> >> >> >> >
> >> >> >> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik
> >>  >> >> >
> >> >> >> wrote:
> >> >> >> >>Thanks everyone for trying this build out in different
> workspaces /
> >> >> >> >>configurations. This will help make sure the build works for
> more
> >> >> >> >>people
> >> >> >> >>and will get rid of any rough edges.
> >> >> >> >>
> >> >> >> >>Performance (All):
> >> >> >> >>Maven performs parallelization at the module level, an entire
> >> module
> >> >> >> >>needs
> >> >> >> >>to complete before any dependent modules can start, this means
> >> running
> >> >> >> >>all
> >> >> >> >>the checks like findbugs, checkstyle, tests need to finish.
> Gradle
> >> has
> >> >> >> >>task
> >> >> >> >>level parallelism between subprojects which means that as soon
> as
> >> the
> >> >> >> >>compile and shade steps are done for a project, and dependent
> >> >> >> >>subprojects
> >> >> >> >>can 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
2017-11-09 18:38 GMT+01:00 Kenneth Knowles :
> On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau 
> wrote:
>
>> (this is another topic so we can maybe open another thread) issue is
>> not much about python but more about the fact the build is not self
>> contained. it is a maven build and maven should be sufficient without
>> having to install python + dependencies.
>
>
> Let's leave out the topic of whether our build should install things like
> JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That issue is
> somewhat independent of build tool, and the new build isn't worse than the
> old one as far as it goes.


Yep, globally the same time with clean and killing the daemon.

>
> Kenn
>
>
>
>> I don't see any technical
>> blockers to do it (except time ;)) but it is always a bit annoying to
>> git clone then not be able to build.
>>
>> Romain Manni-Bucau
>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>
>>
>> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
>> > Hmm, I have had good luck when following the Python quick start setup
>> >  on multiple
>> machines
>> > by ensuring the installed version of setuptools, virtualenv and pip are
>> new
>> > enough versions.
>> >
>> > You can always skip the Python portion of the build by excluding the
>> build
>> > task as so:
>> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
>> >
>> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com>
>> > wrote:
>> >
>> >> The 1.3.5 file is when i installed the python dependencies manually
>> >> to make the build passing (the pip command never passed on my computer
>> >> and therefore the build always has been broken until i installed it
>> >> manually - independently from the build tool).
>> >>
>> >> Romain Manni-Bucau
>> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> >>
>> >>
>> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik :
>> >> > It turns out that the Apache Rat Ant task and the Apache Rat Maven
>> plugin
>> >> > differ in that the plugin automatically excludes certain files by
>> default
>> >> > while the Ant task does not.
>> >> > See:
>> >> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
>> >> html#useDefaultExcludes
>> >> >
>> >> > I fixed the list to exclude ".idea/" instead of "idea/" since there
>> was a
>> >> > typo.
>> >> >
>> >> > I have no idea what the file "=1.3.5" is. Can you take a look at the
>> >> > contents?
>> >> >
>> >> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
>> >> rmannibu...@gmail.com>
>> >> > wrote:
>> >> >
>> >> >> Ok, the rat issues I got were:
>> >> >>
>> >> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
>> >> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
>> >> >>
>> >> >> The first one could be in my default exclude - even if eclipse/idea
>> >> >> files should be in the default exclude set of beam rat config IMHO,
>> >> >> the last one is more a "?" can probably be exclude as well if created
>> >> >> by the build at some point.
>> >> >>
>> >> >>
>> >> >> Romain Manni-Bucau
>> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> >> >>
>> >> >>
>> >> >> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
>> >> >> > Thanks for the update. I was swamped on some meetings. I'm back to
>> >> test
>> >> >> the latest changes.
>> >> >> >
>> >> >> > Regards
>> >> >> > JB
>> >> >> >
>> >> >> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik
>> > >> >
>> >> >> wrote:
>> >> >> >>Thanks everyone for trying this build out in different workspaces /
>> >> >> >>configurations. This will help make sure the build works for more
>> >> >> >>people
>> >> >> >>and will get rid of any rough edges.
>> >> >> >>
>> >> >> >>Performance (All):
>> >> >> >>Maven performs parallelization at the module level, an entire
>> module
>> >> >> >>needs
>> >> >> >>to complete before any dependent modules can start, this means
>> running
>> >> >> >>all
>> >> >> >>the checks like findbugs, checkstyle, tests need to finish. Gradle
>> has
>> >> >> >>task
>> >> >> >>level parallelism between subprojects which means that as soon as
>> the
>> >> >> >>compile and shade steps are done for a project, and dependent
>> >> >> >>subprojects
>> >> >> >>can typically start. This means that we get increased parallelism
>> due
>> >> >> >>to
>> >> >> >>not needing to wait for findbugs, checkstyle, tests to run. I
>> >> typically
>> >> >> >>see
>> >> >> >>~20 tasks (at peak) running on my desktop in parallel.
>> >> >> >>
>> >> >> >>Apache Rat (JB / Romain):
>> >> >> >>What files are in the rat report that fail (its likely that I'm
>> >> missing
>> >> >> >>some exclusion for a build time artifact)? Also, please try the
>> build
>> >> >> >>again
>> >> >> >>after running `git clean -fdx` in your workspace.
>> >> >> >>
>> >> >> >>Python (JB):
>> >> >> >>As 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Kenneth Knowles
On Thu, Nov 9, 2017 at 9:11 AM, Romain Manni-Bucau 
wrote:

> (this is another topic so we can maybe open another thread) issue is
> not much about python but more about the fact the build is not self
> contained. it is a maven build and maven should be sufficient without
> having to install python + dependencies.


Let's leave out the topic of whether our build should install things like
JDKs, Python, Golang, Docker, protoc, findbugs, RAT, etc. That issue is
somewhat independent of build tool, and the new build isn't worse than the
old one as far as it goes.

Kenn



> I don't see any technical
> blockers to do it (except time ;)) but it is always a bit annoying to
> git clone then not be able to build.
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>
>
> 2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
> > Hmm, I have had good luck when following the Python quick start setup
> >  on multiple
> machines
> > by ensuring the installed version of setuptools, virtualenv and pip are
> new
> > enough versions.
> >
> > You can always skip the Python portion of the build by excluding the
> build
> > task as so:
> > ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
> >
> > On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> The 1.3.5 file is when i installed the python dependencies manually
> >> to make the build passing (the pip command never passed on my computer
> >> and therefore the build always has been broken until i installed it
> >> manually - independently from the build tool).
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik :
> >> > It turns out that the Apache Rat Ant task and the Apache Rat Maven
> plugin
> >> > differ in that the plugin automatically excludes certain files by
> default
> >> > while the Ant task does not.
> >> > See:
> >> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
> >> html#useDefaultExcludes
> >> >
> >> > I fixed the list to exclude ".idea/" instead of "idea/" since there
> was a
> >> > typo.
> >> >
> >> > I have no idea what the file "=1.3.5" is. Can you take a look at the
> >> > contents?
> >> >
> >> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
> >> rmannibu...@gmail.com>
> >> > wrote:
> >> >
> >> >> Ok, the rat issues I got were:
> >> >>
> >> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
> >> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
> >> >>
> >> >> The first one could be in my default exclude - even if eclipse/idea
> >> >> files should be in the default exclude set of beam rat config IMHO,
> >> >> the last one is more a "?" can probably be exclude as well if created
> >> >> by the build at some point.
> >> >>
> >> >>
> >> >> Romain Manni-Bucau
> >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >> >>
> >> >>
> >> >> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
> >> >> > Thanks for the update. I was swamped on some meetings. I'm back to
> >> test
> >> >> the latest changes.
> >> >> >
> >> >> > Regards
> >> >> > JB
> >> >> >
> >> >> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik
>  >> >
> >> >> wrote:
> >> >> >>Thanks everyone for trying this build out in different workspaces /
> >> >> >>configurations. This will help make sure the build works for more
> >> >> >>people
> >> >> >>and will get rid of any rough edges.
> >> >> >>
> >> >> >>Performance (All):
> >> >> >>Maven performs parallelization at the module level, an entire
> module
> >> >> >>needs
> >> >> >>to complete before any dependent modules can start, this means
> running
> >> >> >>all
> >> >> >>the checks like findbugs, checkstyle, tests need to finish. Gradle
> has
> >> >> >>task
> >> >> >>level parallelism between subprojects which means that as soon as
> the
> >> >> >>compile and shade steps are done for a project, and dependent
> >> >> >>subprojects
> >> >> >>can typically start. This means that we get increased parallelism
> due
> >> >> >>to
> >> >> >>not needing to wait for findbugs, checkstyle, tests to run. I
> >> typically
> >> >> >>see
> >> >> >>~20 tasks (at peak) running on my desktop in parallel.
> >> >> >>
> >> >> >>Apache Rat (JB / Romain):
> >> >> >>What files are in the rat report that fail (its likely that I'm
> >> missing
> >> >> >>some exclusion for a build time artifact)? Also, please try the
> build
> >> >> >>again
> >> >> >>after running `git clean -fdx` in your workspace.
> >> >> >>
> >> >> >>Python (JB):
> >> >> >>As for the Python SDK, you'll need to share more details about the
> >> >> >>failure.
> >> >> >>
> >> >> >>Gradle 4.3:
> >> >> >>I would like to defer the swap to Gradle 4.3 until after this PR
> since
> >> >> >>it
> >> >> >>will be a much smaller set of changes.
> >> >> >>
> >> >> 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
(this is another topic so we can maybe open another thread) issue is
not much about python but more about the fact the build is not self
contained. it is a maven build and maven should be sufficient without
having to install python + dependencies. I don't see any technical
blockers to do it (except time ;)) but it is always a bit annoying to
git clone then not be able to build.

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn


2017-11-09 18:07 GMT+01:00 Lukasz Cwik :
> Hmm, I have had good luck when following the Python quick start setup
>  on multiple machines
> by ensuring the installed version of setuptools, virtualenv and pip are new
> enough versions.
>
> You can always skip the Python portion of the build by excluding the build
> task as so:
> ./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"
>
> On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau 
> wrote:
>
>> The 1.3.5 file is when i installed the python dependencies manually
>> to make the build passing (the pip command never passed on my computer
>> and therefore the build always has been broken until i installed it
>> manually - independently from the build tool).
>>
>> Romain Manni-Bucau
>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>
>>
>> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik :
>> > It turns out that the Apache Rat Ant task and the Apache Rat Maven plugin
>> > differ in that the plugin automatically excludes certain files by default
>> > while the Ant task does not.
>> > See:
>> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
>> html#useDefaultExcludes
>> >
>> > I fixed the list to exclude ".idea/" instead of "idea/" since there was a
>> > typo.
>> >
>> > I have no idea what the file "=1.3.5" is. Can you take a look at the
>> > contents?
>> >
>> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
>> rmannibu...@gmail.com>
>> > wrote:
>> >
>> >> Ok, the rat issues I got were:
>> >>
>> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
>> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
>> >>
>> >> The first one could be in my default exclude - even if eclipse/idea
>> >> files should be in the default exclude set of beam rat config IMHO,
>> >> the last one is more a "?" can probably be exclude as well if created
>> >> by the build at some point.
>> >>
>> >>
>> >> Romain Manni-Bucau
>> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> >>
>> >>
>> >> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
>> >> > Thanks for the update. I was swamped on some meetings. I'm back to
>> test
>> >> the latest changes.
>> >> >
>> >> > Regards
>> >> > JB
>> >> >
>> >> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik > >
>> >> wrote:
>> >> >>Thanks everyone for trying this build out in different workspaces /
>> >> >>configurations. This will help make sure the build works for more
>> >> >>people
>> >> >>and will get rid of any rough edges.
>> >> >>
>> >> >>Performance (All):
>> >> >>Maven performs parallelization at the module level, an entire module
>> >> >>needs
>> >> >>to complete before any dependent modules can start, this means running
>> >> >>all
>> >> >>the checks like findbugs, checkstyle, tests need to finish. Gradle has
>> >> >>task
>> >> >>level parallelism between subprojects which means that as soon as the
>> >> >>compile and shade steps are done for a project, and dependent
>> >> >>subprojects
>> >> >>can typically start. This means that we get increased parallelism due
>> >> >>to
>> >> >>not needing to wait for findbugs, checkstyle, tests to run. I
>> typically
>> >> >>see
>> >> >>~20 tasks (at peak) running on my desktop in parallel.
>> >> >>
>> >> >>Apache Rat (JB / Romain):
>> >> >>What files are in the rat report that fail (its likely that I'm
>> missing
>> >> >>some exclusion for a build time artifact)? Also, please try the build
>> >> >>again
>> >> >>after running `git clean -fdx` in your workspace.
>> >> >>
>> >> >>Python (JB):
>> >> >>As for the Python SDK, you'll need to share more details about the
>> >> >>failure.
>> >> >>
>> >> >>Gradle 4.3:
>> >> >>I would like to defer the swap to Gradle 4.3 until after this PR since
>> >> >>it
>> >> >>will be a much smaller set of changes.
>> >> >>
>> >> >>On Wed, Nov 8, 2017 at 12:54 AM, Jean-Baptiste Onofré <
>> j...@nanthrax.net>
>> >> >>wrote:
>> >> >>
>> >> >>> Same for me for rat and python build too:
>> >> >>>
>> >> >>> FAILURE: Build completed with 2 failures.
>> >> >>>
>> >> >>> 1: Task failed with an exception.
>> >> >>> ---
>> >> >>> * What went wrong:
>> >> >>> Execution failed for task ':rat'.
>> >> >>> > Found 905 files with unapproved/unknown licenses. See
>> >> >>> file:/home/jbonofre/Workspace/beam/build/reports/rat/rat-report.txt
>> >> >>>
>> >> >>> * Try:
>> >> >>> Run with --stacktrace option to get the stack trace. Run 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Lukasz Cwik
Hmm, I have had good luck when following the Python quick start setup
 on multiple machines
by ensuring the installed version of setuptools, virtualenv and pip are new
enough versions.

You can always skip the Python portion of the build by excluding the build
task as so:
./gradlew build -x ":beam-sdks-parent:beam-sdks-python:build"

On Thu, Nov 9, 2017 at 8:58 AM, Romain Manni-Bucau 
wrote:

> The 1.3.5 file is when i installed the python dependencies manually
> to make the build passing (the pip command never passed on my computer
> and therefore the build always has been broken until i installed it
> manually - independently from the build tool).
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>
>
> 2017-11-09 17:51 GMT+01:00 Lukasz Cwik :
> > It turns out that the Apache Rat Ant task and the Apache Rat Maven plugin
> > differ in that the plugin automatically excludes certain files by default
> > while the Ant task does not.
> > See:
> > http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.
> html#useDefaultExcludes
> >
> > I fixed the list to exclude ".idea/" instead of "idea/" since there was a
> > typo.
> >
> > I have no idea what the file "=1.3.5" is. Can you take a look at the
> > contents?
> >
> > On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau <
> rmannibu...@gmail.com>
> > wrote:
> >
> >> Ok, the rat issues I got were:
> >>
> >> == File: /home/rmannibucau/1_dev/beam/.idea/*
> >> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
> >>
> >> The first one could be in my default exclude - even if eclipse/idea
> >> files should be in the default exclude set of beam rat config IMHO,
> >> the last one is more a "?" can probably be exclude as well if created
> >> by the build at some point.
> >>
> >>
> >> Romain Manni-Bucau
> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> >>
> >>
> >> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
> >> > Thanks for the update. I was swamped on some meetings. I'm back to
> test
> >> the latest changes.
> >> >
> >> > Regards
> >> > JB
> >> >
> >> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik  >
> >> wrote:
> >> >>Thanks everyone for trying this build out in different workspaces /
> >> >>configurations. This will help make sure the build works for more
> >> >>people
> >> >>and will get rid of any rough edges.
> >> >>
> >> >>Performance (All):
> >> >>Maven performs parallelization at the module level, an entire module
> >> >>needs
> >> >>to complete before any dependent modules can start, this means running
> >> >>all
> >> >>the checks like findbugs, checkstyle, tests need to finish. Gradle has
> >> >>task
> >> >>level parallelism between subprojects which means that as soon as the
> >> >>compile and shade steps are done for a project, and dependent
> >> >>subprojects
> >> >>can typically start. This means that we get increased parallelism due
> >> >>to
> >> >>not needing to wait for findbugs, checkstyle, tests to run. I
> typically
> >> >>see
> >> >>~20 tasks (at peak) running on my desktop in parallel.
> >> >>
> >> >>Apache Rat (JB / Romain):
> >> >>What files are in the rat report that fail (its likely that I'm
> missing
> >> >>some exclusion for a build time artifact)? Also, please try the build
> >> >>again
> >> >>after running `git clean -fdx` in your workspace.
> >> >>
> >> >>Python (JB):
> >> >>As for the Python SDK, you'll need to share more details about the
> >> >>failure.
> >> >>
> >> >>Gradle 4.3:
> >> >>I would like to defer the swap to Gradle 4.3 until after this PR since
> >> >>it
> >> >>will be a much smaller set of changes.
> >> >>
> >> >>On Wed, Nov 8, 2017 at 12:54 AM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> >> >>wrote:
> >> >>
> >> >>> Same for me for rat and python build too:
> >> >>>
> >> >>> FAILURE: Build completed with 2 failures.
> >> >>>
> >> >>> 1: Task failed with an exception.
> >> >>> ---
> >> >>> * What went wrong:
> >> >>> Execution failed for task ':rat'.
> >> >>> > Found 905 files with unapproved/unknown licenses. See
> >> >>> file:/home/jbonofre/Workspace/beam/build/reports/rat/rat-report.txt
> >> >>>
> >> >>> * Try:
> >> >>> Run with --stacktrace option to get the stack trace. Run with --info
> >> >>or
> >> >>> --debug option to get more log output.
> >> >>> 
> >> >>> ==
> >> >>>
> >> >>> 2: Task failed with an exception.
> >> >>> ---
> >> >>> * Where:
> >> >>> Build file '/home/jbonofre/Workspace/beam/sdks/python/build.gradle'
> >> >>line:
> >> >>> 64
> >> >>>
> >> >>> * What went wrong:
> >> >>> Execution failed for task ':beam-sdks-parent:beam-sdks-
> python:lint'.
> >> >>> > Process 'command 'tox'' finished with non-zero exit value 1
> >> >>>
> >> >>>
> >> >>>
> >> >>> On 11/08/2017 09:51 AM, Romain Manni-Bucau wrote:
> >> >>>
> >> 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
The 1.3.5 file is when i installed the python dependencies manually
to make the build passing (the pip command never passed on my computer
and therefore the build always has been broken until i installed it
manually - independently from the build tool).

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn


2017-11-09 17:51 GMT+01:00 Lukasz Cwik :
> It turns out that the Apache Rat Ant task and the Apache Rat Maven plugin
> differ in that the plugin automatically excludes certain files by default
> while the Ant task does not.
> See:
> http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.html#useDefaultExcludes
>
> I fixed the list to exclude ".idea/" instead of "idea/" since there was a
> typo.
>
> I have no idea what the file "=1.3.5" is. Can you take a look at the
> contents?
>
> On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau 
> wrote:
>
>> Ok, the rat issues I got were:
>>
>> == File: /home/rmannibucau/1_dev/beam/.idea/*
>> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
>>
>> The first one could be in my default exclude - even if eclipse/idea
>> files should be in the default exclude set of beam rat config IMHO,
>> the last one is more a "?" can probably be exclude as well if created
>> by the build at some point.
>>
>>
>> Romain Manni-Bucau
>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>
>>
>> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
>> > Thanks for the update. I was swamped on some meetings. I'm back to test
>> the latest changes.
>> >
>> > Regards
>> > JB
>> >
>> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik 
>> wrote:
>> >>Thanks everyone for trying this build out in different workspaces /
>> >>configurations. This will help make sure the build works for more
>> >>people
>> >>and will get rid of any rough edges.
>> >>
>> >>Performance (All):
>> >>Maven performs parallelization at the module level, an entire module
>> >>needs
>> >>to complete before any dependent modules can start, this means running
>> >>all
>> >>the checks like findbugs, checkstyle, tests need to finish. Gradle has
>> >>task
>> >>level parallelism between subprojects which means that as soon as the
>> >>compile and shade steps are done for a project, and dependent
>> >>subprojects
>> >>can typically start. This means that we get increased parallelism due
>> >>to
>> >>not needing to wait for findbugs, checkstyle, tests to run. I typically
>> >>see
>> >>~20 tasks (at peak) running on my desktop in parallel.
>> >>
>> >>Apache Rat (JB / Romain):
>> >>What files are in the rat report that fail (its likely that I'm missing
>> >>some exclusion for a build time artifact)? Also, please try the build
>> >>again
>> >>after running `git clean -fdx` in your workspace.
>> >>
>> >>Python (JB):
>> >>As for the Python SDK, you'll need to share more details about the
>> >>failure.
>> >>
>> >>Gradle 4.3:
>> >>I would like to defer the swap to Gradle 4.3 until after this PR since
>> >>it
>> >>will be a much smaller set of changes.
>> >>
>> >>On Wed, Nov 8, 2017 at 12:54 AM, Jean-Baptiste Onofré 
>> >>wrote:
>> >>
>> >>> Same for me for rat and python build too:
>> >>>
>> >>> FAILURE: Build completed with 2 failures.
>> >>>
>> >>> 1: Task failed with an exception.
>> >>> ---
>> >>> * What went wrong:
>> >>> Execution failed for task ':rat'.
>> >>> > Found 905 files with unapproved/unknown licenses. See
>> >>> file:/home/jbonofre/Workspace/beam/build/reports/rat/rat-report.txt
>> >>>
>> >>> * Try:
>> >>> Run with --stacktrace option to get the stack trace. Run with --info
>> >>or
>> >>> --debug option to get more log output.
>> >>> 
>> >>> ==
>> >>>
>> >>> 2: Task failed with an exception.
>> >>> ---
>> >>> * Where:
>> >>> Build file '/home/jbonofre/Workspace/beam/sdks/python/build.gradle'
>> >>line:
>> >>> 64
>> >>>
>> >>> * What went wrong:
>> >>> Execution failed for task ':beam-sdks-parent:beam-sdks-python:lint'.
>> >>> > Process 'command 'tox'' finished with non-zero exit value 1
>> >>>
>> >>>
>> >>>
>> >>> On 11/08/2017 09:51 AM, Romain Manni-Bucau wrote:
>> >>>
>>  gradle branch doesnt build for me (some rat issues)
>> 
>>  Romain Manni-Bucau
>>  @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> 
>> 
>>  2017-11-08 5:41 GMT+01:00 Jean-Baptiste Onofré :
>> 
>> > Great !
>> >
>> > What explain these difference ? I'm curious especially for the
>> >>clean
>> > build
>> > all Java modules: is it a question of parallel execution ?
>> >
>> > Regards
>> > JB
>> >
>> >
>> > On 11/08/2017 02:59 AM, Lukasz Cwik wrote:
>> >
>> >>
>> >> The Gradle POC has made significant advances since last week
>> >>(shading,
>> >> Python, Go, Docker builds, ...). I believe the current state is
>> 

Re: Jira Access

2017-11-09 Thread Lukasz Cwik
Most likely.

On Wed, Nov 8, 2017 at 8:21 PM, Paul Gerver  wrote:

> Oh, yes. When I registered pgerv12 my browser said it timed out. I tried
> to register again, said it already existed so I created pfgerver.
> Unfortunately, I didn't see a way to delete the pfgerver account.
>
> Do you know if a note to the Jira admins could handle it?
>
> On 2017-11-08 18:02, Lukasz Cwik  wrote:
> > I have added you and saw the Jira that you commented on and assigned it
> to>
> > you.>
> >
> > Curious note, I also saw a pfgerver which also seems to be you.>
> >
> > On Wed, Nov 8, 2017 at 2:08 PM, Paul Gerver  wrote:>
> >
> > > Hello,>
> > >>
> > > I'm part of the IBM Streams team and would like to contribute to the
> Apache>
> > > Beam community.>
> > > My ASF Jira ID is pgerv12.>
> > >>
> > > Thanks!>
> > >>
> > > -->
> > >>
> > > *Paul Gerver*>
> > >>
> >
>


Re: [VOTE] Release 2.2.0, release candidate #3

2017-11-09 Thread Etienne Chauchot
Just as a remark, I compared (on my laptop though) queries execution 
times on my previous run of 2.2.0-RC3 with release 2.1.0 and I did not 
see any performance regression.


Best

Etienne


Le 09/11/2017 à 03:13, Valentyn Tymofieiev a écrit :

I looked at Python side of Dataflow & Direct runners on Linux. There are
two findings:

1. One of the mobile gaming examples did not pass for Dataflow runner,
addressed in: https://github.com/apache/beam/pull/4102

.

2. Python streaming did not work for Dataflow runner, one PR is out
https://github.com/apache/beam/pull/4106, but follow up PRs may be required
as we continue to investigate. If we had a PostCommit tests suite running
against a release branch, this could have been caught earlier. Filed
https://issues.apache.org/jira/browse/BEAM-3163.

On Wed, Nov 8, 2017 at 2:39 PM, Reuven Lax  wrote:


Hi everyone,

Please review and vote on the release candidate #3 for the version 2.2.0,
as follows:
   [ ] +1, Approve the release
   [ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
   * JIRA release notes [1],
   * the official Apache source release to be deployed to dist.apache.org
[2],
which is signed with the key with fingerprint B98B7708 [3],
   * all artifacts to be deployed to the Maven Central Repository [4],
   * source code tag "v2.2.0-RC3" [5],
   * website pull request listing the release and publishing the API
reference manual [6].
   * Java artifacts were built with Maven 3.5.0 and OpenJDK/Oracle JDK
1.8.0_144.
   * Python artifacts are deployed along with the source release to the
dist.apache.org [2].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Reuven

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?p
rojectId=12319527=12341044
[2] https://dist.apache.org/repos/dist/dev/beam/2.2.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1023/
[5] https://github.com/apache/beam/tree/v2.2.0-RC3

[6] https://github.com/apache/beam-site/pull/337





Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Lukasz Cwik
It turns out that the Apache Rat Ant task and the Apache Rat Maven plugin
differ in that the plugin automatically excludes certain files by default
while the Ant task does not.
See:
http://creadur.apache.org/rat/apache-rat-plugin/check-mojo.html#useDefaultExcludes

I fixed the list to exclude ".idea/" instead of "idea/" since there was a
typo.

I have no idea what the file "=1.3.5" is. Can you take a look at the
contents?

On Thu, Nov 9, 2017 at 12:03 AM, Romain Manni-Bucau 
wrote:

> Ok, the rat issues I got were:
>
> == File: /home/rmannibucau/1_dev/beam/.idea/*
> == File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5
>
> The first one could be in my default exclude - even if eclipse/idea
> files should be in the default exclude set of beam rat config IMHO,
> the last one is more a "?" can probably be exclude as well if created
> by the build at some point.
>
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>
>
> 2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
> > Thanks for the update. I was swamped on some meetings. I'm back to test
> the latest changes.
> >
> > Regards
> > JB
> >
> > On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik 
> wrote:
> >>Thanks everyone for trying this build out in different workspaces /
> >>configurations. This will help make sure the build works for more
> >>people
> >>and will get rid of any rough edges.
> >>
> >>Performance (All):
> >>Maven performs parallelization at the module level, an entire module
> >>needs
> >>to complete before any dependent modules can start, this means running
> >>all
> >>the checks like findbugs, checkstyle, tests need to finish. Gradle has
> >>task
> >>level parallelism between subprojects which means that as soon as the
> >>compile and shade steps are done for a project, and dependent
> >>subprojects
> >>can typically start. This means that we get increased parallelism due
> >>to
> >>not needing to wait for findbugs, checkstyle, tests to run. I typically
> >>see
> >>~20 tasks (at peak) running on my desktop in parallel.
> >>
> >>Apache Rat (JB / Romain):
> >>What files are in the rat report that fail (its likely that I'm missing
> >>some exclusion for a build time artifact)? Also, please try the build
> >>again
> >>after running `git clean -fdx` in your workspace.
> >>
> >>Python (JB):
> >>As for the Python SDK, you'll need to share more details about the
> >>failure.
> >>
> >>Gradle 4.3:
> >>I would like to defer the swap to Gradle 4.3 until after this PR since
> >>it
> >>will be a much smaller set of changes.
> >>
> >>On Wed, Nov 8, 2017 at 12:54 AM, Jean-Baptiste Onofré 
> >>wrote:
> >>
> >>> Same for me for rat and python build too:
> >>>
> >>> FAILURE: Build completed with 2 failures.
> >>>
> >>> 1: Task failed with an exception.
> >>> ---
> >>> * What went wrong:
> >>> Execution failed for task ':rat'.
> >>> > Found 905 files with unapproved/unknown licenses. See
> >>> file:/home/jbonofre/Workspace/beam/build/reports/rat/rat-report.txt
> >>>
> >>> * Try:
> >>> Run with --stacktrace option to get the stack trace. Run with --info
> >>or
> >>> --debug option to get more log output.
> >>> 
> >>> ==
> >>>
> >>> 2: Task failed with an exception.
> >>> ---
> >>> * Where:
> >>> Build file '/home/jbonofre/Workspace/beam/sdks/python/build.gradle'
> >>line:
> >>> 64
> >>>
> >>> * What went wrong:
> >>> Execution failed for task ':beam-sdks-parent:beam-sdks-python:lint'.
> >>> > Process 'command 'tox'' finished with non-zero exit value 1
> >>>
> >>>
> >>>
> >>> On 11/08/2017 09:51 AM, Romain Manni-Bucau wrote:
> >>>
>  gradle branch doesnt build for me (some rat issues)
> 
>  Romain Manni-Bucau
>  @rmannibucau |  Blog | Old Blog | Github | LinkedIn
> 
> 
>  2017-11-08 5:41 GMT+01:00 Jean-Baptiste Onofré :
> 
> > Great !
> >
> > What explain these difference ? I'm curious especially for the
> >>clean
> > build
> > all Java modules: is it a question of parallel execution ?
> >
> > Regards
> > JB
> >
> >
> > On 11/08/2017 02:59 AM, Lukasz Cwik wrote:
> >
> >>
> >> The Gradle POC has made significant advances since last week
> >>(shading,
> >> Python, Go, Docker builds, ...). I believe the current state is
> >>close
> >> enough to the Maven build system to warrant a comparison.
> >>
> >> The largest build differences I noticed are:
> >> * Full build takes about ~22mins using Gradle (parallelizing the
> >>three
> >> rounds of Python tests would reduce this to ~17mins) compared to
> >>~38mins
> >> in
> >> Maven
> >> * Clean build all Java modules (skipping over Go/Python
> >> ) takes ~8mins in
> >> Gradle which takes ~36mins in Maven
> >> * Build output is cached 

Re: Portability overview webpage

2017-11-09 Thread Henning Rohde
Thanks Holden! Do you mean whether alternatives to gRPC/protobuf are being
discussed? If so, I'm not aware of any alternative proposals.

On Wed, Nov 8, 2017 at 9:30 PM, Holden Karau  wrote:

> Awesome! Out of interest is there any discussion around common formats for
> interchange going on?
>
> On Tue, Nov 7, 2017 at 9:15 AM, Henning Rohde 
> wrote:
>
> > Thanks everyone! The page is now live at:
> >
> >https://beam.apache.org/contribute/portability/
> >
> > Henning
> >
> > On Thu, Nov 2, 2017 at 8:22 AM, Kenneth Knowles 
> > wrote:
> >
> > > This is a superb high-level overview of the effort, understandable at a
> > > glance. I think it is the first time someone has made it clear what we
> > are
> > > actually doing!
> > >
> > > Kenn
> > >
> > > On Wed, Nov 1, 2017 at 10:23 AM, Jean-Baptiste Onofré  >
> > > wrote:
> > >
> > > > Thanks for the update. I will take a look.
> > > >
> > > > Regards
> > > > JB
> > > >
> > > > On Nov 1, 2017, 18:21, at 18:21, Henning Rohde
> > > 
> > > > wrote:
> > > > >Hi everyone,
> > > > >
> > > > >Although portability is a large and involved effort, it seems it
> > > > >doesn't
> > > > >have a high-level overview and plan written down anywhere. I added a
> > > > >proposed page with a 10,000 ft view and links to the webside under
> > > > >'Contribute (technical references)'. There is a page for ongoing
> > > > >projects,
> > > > >but portability is much more encompassing and seems to be more
> suited
> > > > >for
> > > > >it's own page.
> > > > >
> > > > >The PR is:
> > > > >
> > > > > https://github.com/apache/beam-site/pull/340
> > > > >
> > > > >I'm sending it out to the dev list for more visibility. Please let
> me
> > > > >know
> > > > >if you have any comments or objections, or if there is a better
> place
> > > > >for
> > > > >this content.
> > > > >
> > > > >Thanks,
> > > > > Henning
> > > >
> > >
> >
>
>
>
> --
> Twitter: https://twitter.com/holdenkarau
>


Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Amit Sela
+1 for dropping Spark 1 support.
I don't think we have enough users to justify supporting both, and its been
a long time since this idea originally came-up (when Spark2 wasn't stable)
and now Spark 2 is standard in all Hadoop distros.
As for switching to the Dataframe API, as long as Spark 2 doesn't support
scanning through the state periodically (even if no data for a key),
watermarks won't fire keys that didn't see updates.

On Thu, Nov 9, 2017 at 9:12 AM Thomas Weise  wrote:

> +1 (non-binding) for dropping 1.x support
>
> I don't have the impression that there is significant adoption for Beam on
> Spark 1.x ? A stronger Spark runner that works well on 2.x will be better
> for Beam adoption than a runner that has to compromise due to 1.x baggage.
> Development efforts can go into improving the runner.
>
> Thanks,
> Thomas
>
>
> On Thu, Nov 9, 2017 at 4:08 AM, Srinivas Reddy  >
> wrote:
>
> > +1
> >
> >
> >
> > --
> > Srinivas Reddy
> >
> > http://mrsrinivas.com/
> >
> >
> > (Sent via gmail web)
> >
> > On 8 November 2017 at 14:27, Jean-Baptiste Onofré 
> wrote:
> >
> > > Hi all,
> > >
> > > as you might know, we are working on Spark 2.x support in the Spark
> > runner.
> > >
> > > I'm working on a PR about that:
> > >
> > > https://github.com/apache/beam/pull/3808
> > >
> > > Today, we have something working with both Spark 1.x and 2.x from a
> code
> > > standpoint, but I have to deal with dependencies. It's the first step
> of
> > > the update as I'm still using RDD, the second step would be to support
> > > dataframe (but for that, I would need PCollection elements with
> schemas,
> > > that's another topic on which Eugene, Reuven and I are discussing).
> > >
> > > However, as all major distributions now ship Spark 2.x, I don't think
> > it's
> > > required anymore to support Spark 1.x.
> > >
> > > If we agree, I will update and cleanup the PR to only support and focus
> > on
> > > Spark 2.x.
> > >
> > > So, that's why I'm calling for a vote:
> > >
> > >   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
> > >   [ ] 0 (I don't care ;))
> > >   [ ] -1, I would like to still support Spark 1.x, and so having
> support
> > > of both Spark 1.x and 2.x (please provide specific comment)
> > >
> > > This vote is open for 48 hours (I have the commits ready, just waiting
> > the
> > > end of the vote to push on the PR).
> > >
> > > Thanks !
> > > Regards
> > > JB
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>


Re: [VOTE] Drop Spark 1.x support to focus on Spark 2.x

2017-11-09 Thread Ismaël Mejía
+1 for the move to Spark 2 modulo preventing users and deciding on support:

I agree that having compatibility for both versions of Spark is
desirable but I am not sure if is worth the effort. Apart of the
reasons mentioned by Holden and Pei, I will add that the burden of
simultaneous maintenance could be bigger than the return, and also
that most Big Data/Cloud distributions have moved already to Spark 2,
so it makes sense to prioritize the new users better than the legacy
ones, in particular if we consider that Beam is a ‘recent’ project.

We can announce the end of the support for Spark 1 in the release
notes of Beam 2.2 and decide if we will support it in maintenance
mode, in this case we will backport or fix any reported issue related
to the Spark 1 runner on the 2.2.x branch let’s say for a year, but we
won’t add new functionalities. Or we can just decide not to support it
anymore and encourage users to move to Spark 2.

On Thu, Nov 9, 2017 at 6:59 AM, Pei HE  wrote:
> +1 on moving forward with Spark 2.x only.
> Spark 1 users can still use already released Spark runners, and we can
> support them with minor version releases for future bug fixes.
>
> I don't see how important it is to make future Beam releases available to
> Spark 1 users. If they choose not to upgrade Spark clusters, maybe they
> don't need the newest Beam releases as well.
>
> I think it is more important to 1). be able to leverage new features in
> Spark 2.x, 2.) extend user base to Spark 2.
> --
> Pei
>
>
> On Thu, Nov 9, 2017 at 1:45 PM, Holden Karau  wrote:
>
>> That's a good point about Oozie does only supporting only Spark 1 or 2 at a
>> time on a cluster -- but do we know people using Oozie and Spark 1 that
>> would still be using Spark 1 by the time of the next BEAM release? The last
>> Spark 1 release was a year ago (and last non-maintenance release almost 20
>> months ago).
>>
>> On Wed, Nov 8, 2017 at 9:30 PM, NerdyNick  wrote:
>>
>> > I don't know if ditching Spark 1 out right right now would be a great
>> move
>> > given that a lot of the main support applications around spark haven't
>> yet
>> > fully moved to Spark 2 yet. Yet alone have support for having a cluster
>> > with both. Oozie for example is still pre stable release for their Spark
>> 1
>> > and can't support a cluster with mixed Spark version. I think maybe doing
>> > as suggested above with the common, spark1, spark2 packaging might be
>> best
>> > during this carry over phase. Maybe even just flag spark 1 as deprecated
>> > and just being maintained might be enough.
>> >
>> > On Wed, Nov 8, 2017 at 10:25 PM, Holden Karau 
>> > wrote:
>> >
>> > > Also, upgrading Spark 1 to 2 is generally easier than changing JVM
>> > > versions. For folks using YARN or the hosted environments it pretty
>> much
>> > > trivial since you can effectively have distinct Spark clusters for each
>> > > job.
>> > >
>> > > On Wed, Nov 8, 2017 at 9:19 PM, Holden Karau 
>> > wrote:
>> > >
>> > > > I'm +1 on dropping Spark 1. There are a lot of exciting improvements
>> in
>> > > > Spark 2, and trying to write efficient code that runs between Spark 1
>> > and
>> > > > Spark 2 is super painful in the long term. It would be one thing if
>> > there
>> > > > were a lot of people available to work on the Spark runners, but it
>> > seems
>> > > > like we'd be better spent focusing our energy on the future.
>> > > >
>> > > > I don't know a lot of folks who are stuck on Spark 1, and the few
>> that
>> > I
>> > > > know are planning to migrate in the next few months anyways.
>> > > >
>> > > > Note: this is a non-binding vote as I'm not a committer or PMC
>> member.
>> > > >
>> > > > On Wed, Nov 8, 2017 at 3:43 AM, Ted Yu  wrote:
>> > > >
>> > > >> Having both Spark1 and Spark2 modules would benefit wider user base.
>> > > >>
>> > > >> I would vote for that.
>> > > >>
>> > > >> Cheers
>> > > >>
>> > > >> On Wed, Nov 8, 2017 at 12:51 AM, Jean-Baptiste Onofré <
>> > j...@nanthrax.net>
>> > > >> wrote:
>> > > >>
>> > > >> > Hi Robert,
>> > > >> >
>> > > >> > Thanks for your feedback !
>> > > >> >
>> > > >> > From an user perspective, with the current state of the PR, the
>> same
>> > > >> > pipelines can run on both Spark 1.x and 2.x: the only difference
>> is
>> > > the
>> > > >> > dependencies set.
>> > > >> >
>> > > >> > I'm calling the vote to get suck kind of feedback: if we consider
>> > > Spark
>> > > >> > 1.x still need to be supported, no problem, I will improve the PR
>> to
>> > > >> have
>> > > >> > three modules (common, spark1, spark2) and let users pick the
>> > desired
>> > > >> > version.
>> > > >> >
>> > > >> > Let's wait a bit other feedbacks, I will update the PR
>> accordingly.
>> > > >> >
>> > > >> > Regards
>> > > >> > JB
>> > > >> >
>> > > >> >
>> > > >> > On 11/08/2017 09:47 AM, Robert Bradshaw wrote:
>> > > >> >
>> > > >> >> I'm generally a -0.5 on 

Jenkins build became unstable: beam_Release_NightlySnapshot #588

2017-11-09 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-09 Thread Romain Manni-Bucau
Ok, the rat issues I got were:

== File: /home/rmannibucau/1_dev/beam/.idea/*
== File: /home/rmannibucau/1_dev/beam/sdks/python/=1.3.5

The first one could be in my default exclude - even if eclipse/idea
files should be in the default exclude set of beam rat config IMHO,
the last one is more a "?" can probably be exclude as well if created
by the build at some point.


Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn


2017-11-08 19:17 GMT+01:00 Jean-Baptiste Onofré :
> Thanks for the update. I was swamped on some meetings. I'm back to test the 
> latest changes.
>
> Regards
> JB
>
> On Nov 8, 2017, 18:56, at 18:56, Lukasz Cwik  wrote:
>>Thanks everyone for trying this build out in different workspaces /
>>configurations. This will help make sure the build works for more
>>people
>>and will get rid of any rough edges.
>>
>>Performance (All):
>>Maven performs parallelization at the module level, an entire module
>>needs
>>to complete before any dependent modules can start, this means running
>>all
>>the checks like findbugs, checkstyle, tests need to finish. Gradle has
>>task
>>level parallelism between subprojects which means that as soon as the
>>compile and shade steps are done for a project, and dependent
>>subprojects
>>can typically start. This means that we get increased parallelism due
>>to
>>not needing to wait for findbugs, checkstyle, tests to run. I typically
>>see
>>~20 tasks (at peak) running on my desktop in parallel.
>>
>>Apache Rat (JB / Romain):
>>What files are in the rat report that fail (its likely that I'm missing
>>some exclusion for a build time artifact)? Also, please try the build
>>again
>>after running `git clean -fdx` in your workspace.
>>
>>Python (JB):
>>As for the Python SDK, you'll need to share more details about the
>>failure.
>>
>>Gradle 4.3:
>>I would like to defer the swap to Gradle 4.3 until after this PR since
>>it
>>will be a much smaller set of changes.
>>
>>On Wed, Nov 8, 2017 at 12:54 AM, Jean-Baptiste Onofré 
>>wrote:
>>
>>> Same for me for rat and python build too:
>>>
>>> FAILURE: Build completed with 2 failures.
>>>
>>> 1: Task failed with an exception.
>>> ---
>>> * What went wrong:
>>> Execution failed for task ':rat'.
>>> > Found 905 files with unapproved/unknown licenses. See
>>> file:/home/jbonofre/Workspace/beam/build/reports/rat/rat-report.txt
>>>
>>> * Try:
>>> Run with --stacktrace option to get the stack trace. Run with --info
>>or
>>> --debug option to get more log output.
>>> 
>>> ==
>>>
>>> 2: Task failed with an exception.
>>> ---
>>> * Where:
>>> Build file '/home/jbonofre/Workspace/beam/sdks/python/build.gradle'
>>line:
>>> 64
>>>
>>> * What went wrong:
>>> Execution failed for task ':beam-sdks-parent:beam-sdks-python:lint'.
>>> > Process 'command 'tox'' finished with non-zero exit value 1
>>>
>>>
>>>
>>> On 11/08/2017 09:51 AM, Romain Manni-Bucau wrote:
>>>
 gradle branch doesnt build for me (some rat issues)

 Romain Manni-Bucau
 @rmannibucau |  Blog | Old Blog | Github | LinkedIn


 2017-11-08 5:41 GMT+01:00 Jean-Baptiste Onofré :

> Great !
>
> What explain these difference ? I'm curious especially for the
>>clean
> build
> all Java modules: is it a question of parallel execution ?
>
> Regards
> JB
>
>
> On 11/08/2017 02:59 AM, Lukasz Cwik wrote:
>
>>
>> The Gradle POC has made significant advances since last week
>>(shading,
>> Python, Go, Docker builds, ...). I believe the current state is
>>close
>> enough to the Maven build system to warrant a comparison.
>>
>> The largest build differences I noticed are:
>> * Full build takes about ~22mins using Gradle (parallelizing the
>>three
>> rounds of Python tests would reduce this to ~17mins) compared to
>>~38mins
>> in
>> Maven
>> * Clean build all Java modules (skipping over Go/Python
>> ) takes ~8mins in
>> Gradle which takes ~36mins in Maven
>> * Build output is cached allowing for faster subsequent builds
>>with
>> "gradle
>> buildDependents" allowing for most single module changes taking
>>~2mins
>> to
>> build and test without needing to rely on "mvn install"
>>
>> I have opened PR 4096 
>>so
>> that
>> the Gradle build files merged and then follow up with new Jenkins
>> precommits which are powered by Gradle. This will allow the
>>community to
>> continuing contributing to the Gradle build and also allow for a
>> comparison
>> of the precommit times on the Jenkins executor when using
>>Maven/Gradle.
>> I
>> suggest that those who are interested try out the PR.
>>
>> On Fri, Nov 3, 2017 at 10:29 PM, Jean-Baptiste Onofré