Apache Beam Workshop in Guadalajara Mexico

2017-11-28 Thread Griselda Cuevas
Hi Everyone,

I wanted to share with you that on December 2nd, Wizeline Academy [1] will
host an Apache Beam workshop in Guadalajara, Mexico. The objective of this
workshop is to identify adoption barriers and improvement opportunities for
the project by observing and documenting the experience of
new Beam users. We hope that the findings of this workshop can provide
supporting information to shape the direction of our project, especially now
that we have started conversations about the next releases.

If you are in the area and are interested in joining us, please sign up
[2]. If you're interested in running similar efforts, reach out to me and
I'll be happy to share resources and connect with you. I'll report back
with findings after the workshop.

Cheers,
Gris

[1] https://academy.wizeline.com/about/
[2] https://academy.wizeline.com/apache-beam/


Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Romain Manni-Bucau
PS: I forgot another wish: make Beam SQL usable. Today you need to add a fn
before and after it because of the type breakage, which is not consistent
with the pipeline API. It would be nice to support POJOs (extracted from the
select fields or created from "views" like in Jackson), but just not having to
wrap the SQL usage in multiple UDFs would make it powerful and ready to use.


Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Romain Manni-Bucau
My user wishes - whatever version, it is just a number after all ;):

- make coder usage simpler and consistent (PCollection TypeDescriptor and
Coder are duplicated in terms of API)
- have a Beam API (split from the SDK, internals, and implementation)
- have SDF supported by runners
- have an SDFRunner allowing one to simulate the SDF lifecycle manually (same
for DoFn in the short term - see next point for the current issue)
- ensure classloader usage is consistent, i.e. any proxy is created in the
final artifact classloader (transform if custom, DoFn/source/SDF otherwise)
- have a test compatibility kit (TCK) for runners. It would be a jar any
runner implementation can import to run with Surefire
- make IO configuration reflection-friendly (get rid of the AutoValue
pattern, which is not industrializable, and allow POJO-like classes or
alternatively support reading the configuration from properties)
- support implicit pipeline options based on transform names to override
some attributes
- change runner implementations so that a pipeline option defines an upper
bound on the bundle size instead of hardcoding it arbitrarily - defaults
can stay the current ones
- better multi-input/output support (just PCollection-based and fully
wireable)
- a smoother pipeline API would be nice. I like the Hazelcast Jet one, for
instance


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Kenneth Knowles
Let's assume that when I say (a) the author has arranged commits to be
meaningful. That's what I meant to say in each of my descriptions of the
option. If they are noise, it doesn't apply.


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread James
Thanks Kenn for bringing up this expanded discussion. My vote is:

(a) -1 this preserves noisy log entries like 'fix review comments'
(b) +0 this keeps the commit log clean, but without a rebase
(c) -1 similar to option (a), it preserves noisy log entries like 'fix review
comments'

My ideal option is the current manual merge process: `rebase + squash`.
Maybe we should consider introducing mergebot?
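
Lukasz's question in the quoted thread below, whether a mergebot could
auto-squash `fixup!` commits, comes down to a simple rule: a commit whose
subject is `fixup! <subject>` is folded into the earlier commit with that
subject, which is what `git rebase -i --autosquash` arranges. A minimal Python
sketch of that folding rule, assuming commits are given as plain subject
strings in PR order. The `fold_fixups` helper and the "FooIO" commit subjects
are hypothetical; this is not mergebot's or git's actual implementation.

```python
def fold_fixups(subjects):
    """Group 'fixup! X' commits under the earlier commit whose subject is X.

    Returns a list of (subject, fixup_count) pairs in original order.
    A fixup with no matching earlier commit is kept as an ordinary commit.
    """
    prefix = "fixup! "
    folded = []  # list of [subject, fixup_count]
    index = {}   # subject -> position of its first occurrence in folded
    for subject in subjects:
        target = subject[len(prefix):] if subject.startswith(prefix) else None
        if target is not None and target in index:
            folded[index[target]][1] += 1
        else:
            index.setdefault(subject, len(folded))
            folded.append([subject, 0])
    return [tuple(item) for item in folded]


# Hypothetical PR history, oldest commit first.
pr_commits = [
    "Add FooIO read transform",
    "Add FooIO write transform",
    "fixup! Add FooIO read transform",
    "Fix review comments",
]
for subject, fixups in fold_fixups(pr_commits):
    print(f"{subject} (+{fixups} fixup)")
```

Note that the free-form "Fix review comments" commit is not folded anywhere:
only commits that follow the `fixup!` convention can be squashed
automatically, which is why the noisy-log concern in options (a) and (c)
still depends on how authors write their commits.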


On Wed, Nov 29, 2017 at 4:01 AM Raghu Angadi  wrote:

> On Tue, Nov 28, 2017 at 11:47 AM, Thomas Weise  wrote:
>
>>
>> (a) -0 due to extra noise in the commit log
>>
>
>
>> (b) -1 (as standard/default) this should be done by contributor as there
>> may be few situation where individual commits should be preserved
>>
>
> It is better to preserve the commit history of the PR at least in the
> committer branch (and PR).
> In addition having to force push squashed commit to remote git branch each
> time is quite painful. If we squash at all, final merge into master seems
> like the best place.
>
>
>> (c) +1 the rebase will also record the committer (which would be merge
>> commit author otherwise)
>>
>> In general the process should result in "merged" status for a merged PR
>> as opposed to "closed" as seen often currently.
>>
>> Thanks,
>> Thomas
>>
>>
>>
>> On Tue, Nov 28, 2017 at 11:23 AM, Kenneth Knowles  wrote:
>>
>>> On Tue, Nov 28, 2017 at 11:16 AM, Raghu Angadi 
>>> wrote:
>>>
>>>> -1 for (a): no need to see all the private branch commits from
>>>> contributor. It often makes me more conscious of local commits.
>>>>
>>>
>>> I want to note that on my PRs these are not private commits. Each one is
>>> a meaningful isolated change that can be rolled back and is useful to keep
>>> separate when looking at `git blame` or the history of a file. I would
>>> encourage every contributor to also do this. A PR is the unit of code
>>> review, but the unit of meaningful change to a repository is often much
>>> smaller.
>>>
>>> Kenn
>>>
>>>
>>>> +1 for (b): with committer replacing the squashed commit messages with
>>>> '[BEAM-jira or PRID]: Brief cut-n-paste (or longer if it contributor
>>>> provided one)'.
>>>> -1 for (c): This is quite painful for contributors to work with if
>>>> there has been merge conflict with master. Especially for larger PRs with
>>>> multiple updates.
>>>>
>>>> On Tue, Nov 28, 2017 at 10:24 AM, Lukasz Cwik wrote:
>>>>
>>>>> Is it possible for mergebot to auto squash any fixup! and perform the
>>>>> merge commit as described in (a), if so then I would vote for mergebot.
>>>>>
>>>>> Without mergebot, I vote:
>>>>> (a) 0 I like squashing fixup!
>>>>> (b) -1
>>>>> (c) +1 Most of our PRs are for focused singular changes which is why I
>>>>> would rather squash everything over not squashing anything
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Nov 28, 2017 at 9:57 AM, Kenneth Knowles wrote:
>>>>>
>>>>>> On Tue, Nov 28, 2017 at 9:51 AM, Ben Chambers wrote:
>>>>>>
>>>>>>> One risk to "squash and merge" is that it may lead to commits that
>>>>>>> don't have clean descriptions -- for instance, commits like "Fixing
>>>>>>> review comments" will show up. If we use (a) these would also show up
>>>>>>> as separate commits. It seems like there are two cases of multiple
>>>>>>> commits in a PR:
>>>>>>>
>>>>>>> 1. Multiple commits in a PR that have semantic meaning (eg., a PR
>>>>>>> performed N steps, split across N commits). In this case, keeping the
>>>>>>> descriptions and performing either a merge (if the commits are
>>>>>>> separately valid) or squash (if we want the commits to become a single
>>>>>>> commit in master) probably makes sense.
>>>>>>>
>>>>>>
>>>>>> Keep 'em
>>>>>>
>>>>>>
>>>>>>> 2. Multiple commits in a PR that just reflect the review history. In
>>>>>>> this case, we should probably ask the PR author to explicitly rebase
>>>>>>> their PR to have semantically meaningful commits prior to merging.
>>>>>>> (Eg., do a rebase -i).
>>>>>>>
>>>>>>
>>>>>> Ask the author to squash 'em.
>>>>>>
>>>>>> Kenn
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> On Tue, Nov 28, 2017 at 9:46 AM Kenneth Knowles wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> James brought up a great question in Slack, which was how should we
>>>>>>>> use the merge button, illustrated [1]
>>>>>>>>
>>>>>>>> I want to broaden the discussion to talk about all the new
>>>>>>>> capabilities:
>>>>>>>>
>>>>>>>> 1. Whether & how to use the "reviewer" field
>>>>>>>> 2. Whether & how to use the "assignee" field
>>>>>>>> 3. Whether & how to use the merge button
>>>>>>>>
>>>>>>>> My preferences are:
>>>>>>>>
>>>>>>>> 1. Use the reviewer field instead of "R:" comments.
>>>>>>>> 2. Use the assignee field to keep track of who the review is
>>>>>>>> blocked on (either the reviewer for more comments or the author for
>>>>>>>> fixes)
>>>>>>>> 3. Use merge commits, but editing the commit subject

Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Robert Bradshaw
On Tue, Nov 28, 2017 at 9:48 AM, Reuven Lax  wrote:
>
> On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré 
> wrote:
>>
>> Hi Reuven,
>>
>> Yes, I remember that we agreed on a release per month. However, we didn't
>> do it before. I think the most important is not the period, it's more a
>> stable pace. I think it's more interesting for our community to have
>> "always" a release every two months, more than a tentative of a release
>> every month that end later than that. Of course, if we can do both, it's
>> perfect ;)
>
> Agree. A stable pace is the most important thing.

+1, and I think everyone who's done a release is in favor of making it
easier and more frequent. Someone should put together a proposal of
easy things we can do to automate, etc.

>> For Beam 3.x, I wasn't talking about breaking change, but more about
>> "marketing" announcement. I think that, even if we don't break API, some
>> features are "strong enough" to be "qualified" in a major version.
>
> Ah, good point. This doesn't stop us from checking in these new features
> into 2.x possibly tagged with an @Experimental flag. We can then use 3.0 to
> announce all these features more broadly, and remove @Experimental tags.
>
> I would also like to see enterprise-ready BeamSQL and Java 7 deprecation on
> the list for Beam 3.0
>
>>
>> I think that any major idea & feature (breaking or not the API) are
>> valuables for Beam 3.x (and it's a good sign for our community again ;)).

I'm generally not a fan of bumping the major version number just
because enough time has passed, or enough new features have gone in
(and am mostly opposed to holding features back just because we want
to announce them (simultaneously?) in a big release)--instead I find
that the need for a new major version arises out of a realization that
the model has sufficiently changed and we need to cut ties with the
old way of doing things (that's perhaps holding us back). That being
said, it could be that some of these features are large enough to
merit this.

Regardless of the naming, I think it's a great time to have a
discussion of where we want to go in 2018.

Top of my list is first class support for Schema'd PCollections (and
with it SQL support, etc.) and full support of the portability
framework realizing the possibility of every runner running every SDK
(and, ideally, even cross-SDK/language pipelines). I would also like
to see explorations into interactive/incremental (for Python at least,
but probably Java as well).

- Robert


>> On 11/28/2017 06:09 PM, Reuven Lax wrote:
>>>
>>>
>>>
>>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:
>>>
>>> Hi guys,
>>>
>>> Even if there's no rush, I think it would be great for the community
>>> to have a better view on our roadmap and where we are going in terms
>>> of schedule.
>>>
>>> I would like to discuss the following:
>>> - a best effort to maintain a good release pace or at least provide a
>>> rough schedule. For instance, in Apache Karaf, I have a release
>>> schedule (http://karaf.apache.org/download.html#container-schedule).
>>> I think a release ~ every quarter would be great.
>>>
>>>
>>> Originally we had stated that we wanted monthly releases of Beam. So far
>>> the releases have been painful enough that monthly hasn't happened. I think
>>> we should address these issues and go to monthly releases as originally
>>> stated.
>>>
>>> - if I see new Beam 2.x releases for sure (according to the previous
>>> point), it would be great to have a discussion about Beam 3.x. I think
>>> that one of the interesting new features that Beam 3.x can provide is
>>> around PCollection with Schemas. It's something that we started to
>>> discuss with Reuven and Eugene.
>>> In terms of schedule,
>>>
>>>
>>> I don't think schemas require Beam 3.0 - I think we can introduce them
>>> without making breaking changes. However there are many other features that
>>> would be very interesting for Beam 3.x, and we should start putting together
>>> a list of them. I
>>>
>>>
>>> I would love to see your thoughts & ideas about releases schedule and
>>> Beam 3.x.
>>>
>>> Regards
>>> JB
>>> -- Jean-Baptiste Onofré
>>> jbono...@apache.org 
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>
>


Re: Azure(ADLS) compatibility on Beam with Spark runner

2017-11-28 Thread Udi Meiri
Hi JB,
I'm working on adding HDFS support to the Python runner.
We're planning on using libhdfs3, which doesn't seem to support anything
other than HDFS.


On Mon, Nov 27, 2017 at 12:44 PM Lukasz Cwik 
wrote:

> Out of curiosity, does using the DirectRunner with ADL work for you?
> If not, then you'll be able to debug locally why it's failing.
>
> On Fri, Nov 24, 2017 at 8:09 PM, Milan Chandna <
> milan.chan...@microsoft.com.invalid> wrote:
>
> > Hi JB,
> >
> > Thanks for the updates.
> > BTW I am myself in Microsoft but I am trying this out of my interest.
> > And it's good to know that someone else is also working on this.
> >
> > -Milan.
> >
> > -Original Message-
> > From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
> > Sent: Thursday, November 23, 2017 1:47 PM
> > To: dev@beam.apache.org
> > Subject: Re: Azure(ADLS) compatibility on Beam with Spark runner
> >
> > The Azure guys tried to use ADLS via Beam HDFS filesystem, but it seems
> > they didn't succeed.
> > The new approach we plan is to directly use the ADLS API.
> >
> > I keep you posted.
> >
> > Regards
> > JB
> >
> > On 11/23/2017 07:42 AM, Milan Chandna wrote:
> > > I tried both the ways.
> > > Passed ADL specific configuration in --hdfsConfiguration as well and
> > have setup the core-site.xml/hdfs-site.xml as well.
> > > As I mentioned it's a HDI + Spark cluster, those things are already
> > setup.
> > > Spark job(without Beam) is also able to read and write to ADLS on same
> > machine.
> > >
> > > BTW if the authentication or understanding ADL was a problem, it would
> > have thrown error like ADLFileSystem missing or probably access failed or
> > something. Thoughts?
> > >
> > > -Milan.
> > >
> > > -Original Message-
> > > From: Lukasz Cwik [mailto:lc...@google.com.INVALID]
> > > Sent: Thursday, November 23, 2017 5:05 AM
> > > To: dev@beam.apache.org
> > > Subject: Re: Azure(ADLS) compatibility on Beam with Spark runner
> > >
> > > In your example it seems as though your HDFS configuration doesn't
> > > contain any ADL specific configuration:
> > > "--hdfsConfiguration='[{\"fs.defaultFS\": \"hdfs://home/sample.txt\"]'"
> > > Do you have a core-site.xml or hdfs-site.xml configured as per the
> > > page below?
> > > https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html
> > >
> > > From the documentation for --hdfsConfiguration:
> > > A list of Hadoop configurations used to configure zero or more Hadoop
> > > filesystems. By default, Hadoop configuration is loaded from
> > > 'core-site.xml' and 'hdfs-site.xml' based upon the HADOOP_CONF_DIR and
> > > YARN_CONF_DIR environment variables. To specify configuration on the
> > > command-line, represent the value as a JSON list of JSON maps, where each
> > > map represents the entire configuration for a single Hadoop filesystem.
> > > For example --hdfsConfiguration='[{\"fs.default.name\":
> > > \"hdfs://localhost:9998\", ...},{\"fs.default.name\": \"s3a://\",
> > > ...},...]'
> > > From:
> > > https://github.com/apache/beam/blob/9f81fd299bd32e0d6056a7da9fa994cf74db0ed9/sdks/java/io/hadoop-file-system/src/main/java/org/apache/beam/sdk/io/hdfs/HadoopFileSystemOptions.java#L45
> > >
> > > On Wed, Nov 22, 2017 at 1:12 AM, Jean-Baptiste Onofré
> > > 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> FYI, I'm in touch with Microsoft Azure team about that.
> > >>
> > >> We are testing the ADLS support via HDFS.
> > >>
> > >> I keep you posted.
> > >>
> > >> Regards
> > >> JB
> > >>
> > >> On 11/22/2017 09:12 AM, Milan Chandna wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Has anyone tried IO from(to) ADLS account on Beam with Spark runner?
> > >>> I was trying recently to do this but was unable to make it work.
> > >>>
> > >>> Steps that I tried:
> > >>>
> > >>> 1.  Took HDI + Spark 1.6 cluster with default storage as ADLS account.
> > >>> 2.  Built Apache Beam on that. Built to include Beam-2790
> > >>> <https://issues.apache.org/jira/browse/BEAM-2790> fix which
> > >>> earlier I was facing for ADL as well.
> > >>> 3.  Modified WordCount.java example to use
> 
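
The `--hdfsConfiguration` value quoted in this thread is hand-escaped JSON,
and the sample as quoted opens a `{` that it never closes. One way to avoid
that class of mistake is to generate the value with a JSON serializer instead
of typing it. A minimal Python sketch, assuming only what the quoted
documentation states (the flag takes a JSON list of JSON maps, one map per
filesystem). The `hdfs_configuration_flag` helper and the `adl://` account URI
are made-up placeholders, and nothing here implies that ADLS works through
this flag.

```python
import json
import shlex


def hdfs_configuration_flag(filesystems):
    """Render --hdfsConfiguration from a list of dicts (one per filesystem)."""
    if not isinstance(filesystems, list) or not all(
        isinstance(fs, dict) for fs in filesystems
    ):
        raise TypeError("expected a list of dicts, one per Hadoop filesystem")
    value = json.dumps(filesystems)
    # shlex.quote adds shell quoting so the JSON is passed as one argument.
    return "--hdfsConfiguration=" + shlex.quote(value)


flag = hdfs_configuration_flag([
    # Placeholder account URI, for illustration only.
    {"fs.defaultFS": "adl://example.azuredatalakestore.net"},
])
print(flag)
```

Because the serializer balances brackets and escapes quotes itself, the
resulting argument always parses as a JSON list of maps.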

Performance tests - Spark and Flink current state of knowledge.

2017-11-28 Thread Łukasz Gajowy
Hello!

Part of the work on the performance test infrastructure is being able to
run the tests on Spark and Flink, which has proven problematic. We have
written a short description and a Proof of Concept showing our current
state of knowledge and the only way we were able to actually run the test
code. The document may change in the coming days as we are still looking for
better ways to run the tests.

Here's the doc:
https://docs.google.com/a/polidea.com/document/d/1a_SKXYndh1CMovxxdUdpkdHRpd5uzF-IhzBMiaOh3N4/edit?usp=sharing

Please feel free to contribute if you have any thoughts or suggestions or
spot errors of any kind.

Łukasz


Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Manu Zhang
+1 (binding)
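
For a quick read of the build-time summary quoted further down (Gradle vs.
Maven, in minutes), the ratios can be computed directly from the posted
figures. A small Python sketch using only the numbers from the vote email;
the dictionary layout and the `speedup` helper are illustrative.

```python
# Build times in minutes, copied from the summary in the vote email.
gradle = {"min": 25.04, "max": 160.14, "median": 45.78, "average": 52.19}
maven = {"min": 56.86, "max": 216.55, "median": 87.93, "average": 109.10}


def speedup(metric):
    """How many times longer the Maven build took for the given metric."""
    return maven[metric] / gradle[metric]


for metric in ("median", "average"):
    print(f"{metric}: Maven took {speedup(metric):.2f}x as long as Gradle")
```

On these figures the median Maven run took a little under twice as long as
the median Gradle run. The Maven maximum understates the gap, since the email
notes that timed-out runs were excluded from the data.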


On Wed, Nov 29, 2017 at 5:08 AM Reuven Lax  wrote:

> +1 (binding)
>
> One caveat to the second part of this vote. I think we need to elaborate a
> clear list of criteria that Gradle must clear before any processes are
> migrated off of Maven.
>
> On Tue, Nov 28, 2017 at 12:51 PM, Romain Manni-Bucau <
> rmannibu...@gmail.com> wrote:
>
>> -1 (non-binding). Gradle discourages contributions, which is a big pitfall
>> for an ASF project, and the Maven/Gradle comparison is unfair due to the
>> threading setup of Maven (hardcoded thread count and no parallel builder
>> usage).
>>
>>
>> On Nov 28, 2017 19:38, "Jason Kuster" wrote:
>>
>>> +1
>>>
>>> From the perspective of Beam's infrastructure, I've found that Gradle
>>> provides us a good amount more flexibility to do the kinds of builds we
>>> want. Additionally, the shorter run times (while not the only factor here)
>>> will allow us to stretch our finite executor resources further, leading to
>>> fewer instances where people are waiting for other builds to finish for
>>> their presubmits to start.
>>>
>>> On Tue, Nov 28, 2017 at 10:22 AM, Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>>
 +1

 And thanks Luke for clearly mentioning the migration process. Let's
 make sure that all major use cases of Maven are properly addressed before
 removing Maven support.

 Thanks,
 Cham


 On Tue, Nov 28, 2017 at 10:09 AM Wesley Tanaka <
 wtanaka+b...@wtanaka.com> wrote:

> +1
>
>
> On 11/28/2017 07:55 AM, Lukasz Cwik wrote:
>
> This is a procedural vote for migrating to use Gradle for all our
> development related processes (building, testing, and releasing). A
> majority vote will signal that:
> * Gradle build files will be supported and maintained alongside any
> remaining Maven files.
> * Once Gradle is able to replace Maven in a specific process (or
> portion thereof), Maven will no longer be maintained for said process (or
> portion thereof) and will be removed.
>
> +1 I support the process change
> 0 I am indifferent to the process change
> -1 I would like to remain with our current processes
>
>
> 
>
> Below is a summary of information contained in the discussion thread
> comparing Gradle and Maven:
> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
>
> Gradle (mins)
> min: 25.04
> max: 160.14
> median: 45.78
> average: 52.19
> stdev: 30.80
>
> Maven (mins)
> min: 56.86
> max: 216.55 (actually > 240 mins because this data does not include
> timeouts)
> median: 87.93
> average: 109.10
> stdev: 48.01
>
> Maven
> Java Support: Mature
> Python Support: None (via mvn exec plugin)
> Go Support: Rudimentary (via mvn plugin)
> Protobuf Support: Rudimentary (via mvn plugin)
> Docker Support: Rudimentary (via mvn plugin)
> ASF Release Automation: Mature
> Jenkins Support: Mature
> Configuration Language: XML
> Multiple Java Versions: Yes
> Static Analysis Tools: Some
> ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
> longstanding but poor)
> IntelliJ Integration: Mature
> Eclipse Integration: Mature
> Extensibility: Mature (updated per JB from discuss thread)
> Number of GitHub Projects Using It: 146k
> Continuous build daemon: None
> Incremental build support: None (note that this is not the same as
> incremental compile support offered by the compiler plugin)
> Intra-module dependencies: Rudimentary (requires the use of many
> profiles to get per runner dependencies)
>
> Gradle
> Java Support: Mature
> Python Support: Rudimentary (pygradle, lacks pypi support)
> Go Support: Rudimentary (gogradle plugin)
> Protobuf Support: Rudimentary (via protobuf plugin)
> Docker Support: Rudimentary (via docker plugin)
> ASF Release Automation: ?
> Jenkins Support: Mature
> Configuration Language: Groovy
> Multiple Java Versions: Yes
> Static Analysis Tools: Some
> ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache
> Maven ANT plugin)
> IntelliJ Integration: Mature
> Eclipse Integration: Mature
> Extensibility: Mature
> Number of GitHub Projects Using It: 122k
> Continuous build daemon: Mature
> Incremental build support: Mature
> Intra-module dependencies: Mature (via configurations)
>
>
> --
> Wesley Tanaka https://wtanaka.com/
>
>
>>>
>>>
>>> --
>>> ---
>>> Jason Kuster
>>> Apache Beam / Google Cloud Dataflow
>>>
>>
>


Re: SerializableCoder Structured Value

2017-11-28 Thread Eugene Kirpichov
Kenn - I agree that consistentWithEquals() is redundant w.r.t.
structuralValue(), and should be deprecated. I think our mutation detectors
are already using structuralValue(), so the work here would be to simply
mark the method deprecated, remove all remaining overrides in the SDK, and
document that overriding the method is a no-op.

Also agree with Luke that we should document that SerializableCoder should
be used only for objects that have a proper equals(), because we need
*some* structural value for mutation detection, and since this coder is
non-deterministic, it can't provide any structural value other than the
object itself. In this case, I suppose, the work would involve just
documentation.

Not sure if we need anything extra in the direct runner for verification:
won't existing mutation detectors already fire false positives in case
these properties are violated?
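
As a rough illustration of the structural-value point (a Python sketch, not
Beam code; `encode_set`, `structural_value`, and `looks_mutated` are
hypothetical names, and the toy coder only imitates a non-deterministic coder
such as the SetCoder mentioned in the quoted thread): two encodings of an
unchanged value may differ while the structural values still compare equal,
so a detector that compares structural values would not report a mutation.

```python
def encode_set(values, reverse=False):
    """Toy non-deterministic coder: the byte layout depends on iteration order."""
    ordered = sorted(values, reverse=reverse)
    return ",".join(ordered).encode("utf-8")


def structural_value(values):
    """Fall back to the value itself, which requires a proper equality check."""
    return frozenset(values)


def looks_mutated(before, after):
    """Mutation check based on structural values instead of re-encoded bytes."""
    return structural_value(before) != structural_value(after)


element = {"a", "b", "c"}
first_bytes = encode_set(element)
second_bytes = encode_set(element, reverse=True)  # same value, different bytes

bytes_differ = first_bytes != second_bytes
false_positive = looks_mutated(element, set(element))
real_mutation = looks_mutated(element, element | {"d"})
print(bytes_differ, false_positive, real_mutation)
```

A byte-comparing detector would flag the unchanged element here, which is
the false positive discussed in this thread; the structural comparison only
fires when the contents actually change.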

On Tue, Nov 28, 2017 at 11:03 AM Lukasz Cwik  wrote:

> I think that at least we should be clear in the documentation for
> SerializableCoder and also make sure that the DirectRunner validates the
> consistentWithEquals property.
>
> Optionally one of:
> 1) Make a version of SerializableCoder that can be constructed where it
> says it is consistentWithEquals and have users register each type with the
> CoderRegistry.
> 2) Document that users subclass SerializableCoder for all types which are
> consistentWithEquals and also register them with the CoderRegistry.
>
>
> On Mon, Nov 27, 2017 at 5:39 PM, Kenneth Knowles  wrote:
>
>> What I said is not quite right - there are accidental collisions allowed.
>> The "all coders" spec for structural value only requires that encode(a) ==
>> encode(b) implies sv(a).equals(sv(b)). The converse is not required. For
>> example, the nondeterministic SetCoder can use the Set objects themselves
>> as structural values, but their encoding may differ. So for determinism it
>> is actually a.equals(b) implies encode(a) == encode(b) which in turn
>> implies sv(a).equals(sv(b)). Either way, for deterministic coders they all
>> coincide.
>>
>> On Mon, Nov 27, 2017 at 5:23 PM, Kenneth Knowles  wrote:
>>
>>> To add some flavor,
>>>
>>> *All coders:* structuralValue(a).equals(structuralValue(b)) if and only
>>> if encode(a) == encode(b)
>>>
>>> *"Consistent with equals" aka injective:* encode(a) == encode(b)
>>> implies a.equals(b)
>>>
>>> *Deterministic:* a.equals(b) implies
>>> structuralValue(a).equals(structuralValue(b)) (hence encode(a) == encode(b))
>>>
>>> The structural value must always be a legitimate substitute for encoding
>>> to allow in-memory GBK to be faster than encoding.
>>>
>>> IMO we should deprecate and retire "consistent with equals" since
>>> overriding it to return `true` is no simpler than overriding
>>> structuralValue itself, and it has no purpose other than governing
>>> structuralValue. The two obvious choices - encoding or return directly -
>>> are trivial, and getting fancy is optional. The check Luke suggests would
>>> then just be a test that structuralValue is correct. The mutation detector
>>> should perhaps just use the structural value and let the coder itself
>>> decide whether or not it needs to encode.
>>>
>>> Also worth considering the dual perspective that highlights portability:
>>> To a portable runner, the elements are (with a couple exceptions) just
>>> bytes, and the coders are a way for the SDK to interpret them in order to
>>> do its computation. The implied spec that the mutation detector relies on
>>> is that serialize(deserialize(x)) == x for these bytes, so if the
>>> re-serialized bytes have changed, it assumes the object was mutated. In a
>>> sense, if an SDK implements "the identity function" yet returns different
>>> bytes, that is a broken identity function because the bytes *are* the
>>> element. It is a bit of a strict interpretation, and maybe not so useful
>>> when the elements are only really interpreted by a single SDK, as in the
>>> case of SerializableCoder. But I'm not sure what other spec is available.
>>>
>>> Kenn
>>>
>>>
>>> On Mon, Nov 27, 2017 at 4:37 PM, Mairbek Khadikov 
>>> wrote:
>>>
 I'm open to renaming *consistentWithEquals*.

 If I understand the code correctly, when consistentWithEquals returns
 true, org.apache.beam.sdk.util.MutationDetectors expects
 *a.equals(deserialize(serialize(a)))* which I think is reasonable for
 SerializableCoder (assuming objects implement equals). Right now,
 *serialize(a).equals(serialize(deserialize(serialize(a))))* is expected
 and that contradicts *"does not guarantee a deterministic encoding"*.

 On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik  wrote:

> I think the idea is that SerializableCoder should be updated to expect
> that all values it encodes do implement equals() since this seems to be the
> much more common case than classes that don't 

Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Reuven Lax
+1 (binding)

One caveat to the second part of this vote. I think we need to elaborate a
clear list of criteria that Gradle must clear before any processes are
migrated off of Maven.

On Tue, Nov 28, 2017 at 12:51 PM, Romain Manni-Bucau 
wrote:

> -1 (non-binding). Gradle discourages contributions, which is a big pitfall
> for an ASF project, and the Maven/Gradle comparison is unfair due to the
> threading setup of Maven (hardcoded thread count and no parallel builder
> usage).
>
>
> On Nov 28, 2017 at 19:38, "Jason Kuster" wrote:
>
>> +1
>>
>> From the perspective of Beam's infrastructure, I've found that Gradle
>> provides us a good amount more flexibility to do the kinds of builds we
>> want. Additionally, the shorter run times (while not the only factor here)
>> will allow us to stretch our finite executor resources further, leading to
>> fewer instances where people are waiting for other builds to finish for
>> their presubmits to start.
>>
>> On Tue, Nov 28, 2017 at 10:22 AM, Chamikara Jayalath <
>> chamik...@google.com> wrote:
>>
>>> +1
>>>
>>> And thanks Luke for clearly mentioning the migration process. Let's make
>>> sure that all major use cases of Maven are properly addressed before
>>> removing Maven support.
>>>
>>> Thanks,
>>> Cham
>>>
>>>
>>> On Tue, Nov 28, 2017 at 10:09 AM Wesley Tanaka 
>>> wrote:
>>>
 +1


 On 11/28/2017 07:55 AM, Lukasz Cwik wrote:

 This is a procedural vote for migrating to use Gradle for all our
 development related processes (building, testing, and releasing). A
 majority vote will signal that:
 * Gradle build files will be supported and maintained alongside any
 remaining Maven files.
 * Once Gradle is able to replace Maven in a specific process (or
 portion thereof), Maven will no longer be maintained for said process (or
 portion thereof) and will be removed.

 +1 I support the process change
 0 I am indifferent to the process change
 -1 I would like to remain with our current processes

 
 

 Below is a summary of information contained in the discussion thread
 comparing Gradle and Maven:
 https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E

 Gradle (mins)
 min: 25.04
 max: 160.14
 median: 45.78
 average: 52.19
 stdev: 30.80

 Maven (mins)
 min: 56.86
 max: 216.55 (actually > 240 mins because this data does not include
 timeouts)
 median: 87.93
 average: 109.10
 stdev: 48.01

 Maven
 Java Support: Mature
 Python Support: None (via mvn exec plugin)
 Go Support: Rudimentary (via mvn plugin)
 Protobuf Support: Rudimentary (via mvn plugin)
 Docker Support: Rudimentary (via mvn plugin)
 ASF Release Automation: Mature
 Jenkins Support: Mature
 Configuration Language: XML
 Multiple Java Versions: Yes
 Static Analysis Tools: Some
 ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
 longstanding but poor)
 IntelliJ Integration: Mature
 Eclipse Integration: Mature
 Extensibility: Mature (updated per JB from discuss thread)
 Number of GitHub Projects Using It: 146k
 Continuous build daemon: None
 Incremental build support: None (note that this is not the same as
 incremental compile support offered by the compiler plugin)
 Intra-module dependencies: Rudimentary (requires the use of many
 profiles to get per runner dependencies)

 Gradle
 Java Support: Mature
 Python Support: Rudimentary (pygradle, lacks pypi support)
 Go Support: Rudimentary (gogradle plugin)
 Protobuf Support: Rudimentary (via protobuf plugin)
 Docker Support: Rudimentary (via docker plugin)
 ASF Release Automation: ?
 Jenkins Support: Mature
 Configuration Language: Groovy
 Multiple Java Versions: Yes
 Static Analysis Tools: Some
 ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache
 Maven ANT plugin)
 IntelliJ Integration: Mature
 Eclipse Integration: Mature
 Extensibility: Mature
 Number of GitHub Projects Using It: 122k
 Continuous build daemon: Mature
 Incremental build support: Mature
 Intra-module dependencies: Mature (via configurations)


 --
 Wesley Tanaka https://wtanaka.com/


>>
>>
>> --
>> ---
>> Jason Kuster
>> Apache Beam / Google Cloud Dataflow
>>
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Romain Manni-Bucau
@Lukasz: was it tested in the current setup? I know Groovy does it but I never
checked it myself. If not, the vote must be conditional on that IMHO.

On Nov 28, 2017 at 19:19, "Robert Bradshaw" wrote:

> I also did an apache github query
>
> select count(*) as apache_projects, sum(uses_maven=true) as
> uses_maven, sum(uses_gradle=true) as uses_gradle from (
> select
> repo_name,
> max(path contains 'pom.xml') as uses_maven,
> max(path contains 'gradle') as uses_gradle
> from [bigquery-public-data:github_repos.files]
> where instr(repo_name, 'apache/') == 1
> group by repo_name);
>
> Of 425 total Apache projects on GitHub, just over half (249) use maven,
> and only 25 use gradle. So we'd be in the minority, but certainly not
> alone.
>
> I don't think we need to use the most common tool, rather we should
> use what fits the project well, and the popularity criterion is simply
> that we don't want to choose a tool where obscurity would be a
> hindrance. Both gradle and maven seem to clear this bar (as do a host
> of others that are even more popular, but would be unsuitable for
> other reasons, e.g. plain old make).
>
> We would certainly not switch over to gradle if we couldn't do a
> release. IIRC, there's still some work to be done to push this
> through, but at this point it doesn't seem like there's any reason to
> expect it couldn't be done.
>
> Is there any more data that should be gathered before a vote? (Or
> should the vote perhaps have a "+/-0, need more information [please
> provide details]" option.)
>
>
> On Tue, Nov 28, 2017 at 9:45 AM, Scott Wegner  wrote:
> > To add one more data point measuring general adoption of gradle vs. maven,
> > we can look at Stackoverflow trends comparing the two tags [1]. This shows
> > the percentage of new SO questions in a given month by tag. 'gradle'
> > represents ~0.25% of questions, while maven is ~0.45%. So, maven is more
> > dominant in the Stackoverflow community, but they are at least similar
> > orders of magnitude. Also, the data is a bit noisy to draw a trendline, but
> > it seems that maven's growth has flattened while gradle is still increasing.
> >
> > [1] https://insights.stackoverflow.com/trends?tags=maven%2Cgradle
> >
> > On Tue, Nov 28, 2017 at 9:14 AM Kenneth Knowles  wrote:
> >>
> >> Yea, I think voting is the next step. Luke - I think you are obviously the
> >> right person to set up the email of what exactly we are voting on, since
> >> you've driven this improvement.
> >>
> >> On Tue, Nov 28, 2017 at 12:08 AM, Robert Bradshaw 
> >> wrote:
> >>>
> >>> It's great to see all the discussion going on here.
> >>>
> >>> I think it's important to point out that merging a parallel set of
> >>> gradle build scripts is a separate (and much less disruptive) step
> >>> than, say, switching over the default (or even recommended)
> >>> build/release process to use them, let alone removing the maven build
> >>> files entirely. The latter two should definitely be gated by a formal
> >>> vote (each, probably), with the current state the gradle files can
> >>> mostly be ignored by most people. In particular, this is the kind of
> >>> change that needs to be in master to be evaluated--if it's on a branch
> >>> we can't very well see how it impacts presubmits, and most importantly
> >>> people can't try it out for real development.
> >>>
> >>> I agree that the choice of build tool may attract some contributors
> >>> and discourage others. Having builds that are fast, correct, and
> >>> reproducible will probably matter more to potential contributors than
> >>> the particular command to run. While maven can surely be improved, I
> >>> doubt a 2x improvement (and many more times that for incremental
> >>> builds) is low-hanging fruit, and many of the issues seem quite
> >>> fundamental (e.g. all the special treatment we need for NeedsRunner
> >>> tests, and having to do a (global-by-default) mvn install to skip
> >>> tests of dependencies when testing a leaf module).
> >>>
> >>> Getting data on what other apache projects use could be interesting,
> >>> but unless we gather why such choices were made I don't know that it'd
> >>> be that influential once we've established that both tools are widely
> >>> supported generally.
> >>>
> >>> To re-emphasize, we'll continue to produce and publish maven
> >>> artifacts, so our choice of build system won't matter for those only
> >>> using Beam as a dependency.
> >>>
> >>>
> >>>
> >>> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré <
> j...@nanthrax.net>
> >>> wrote:
> >>> > Yeah, especially, I think it would have been great to have a vote before
> >>> > merging on master.
> >>> >
> >>> > Not a big deal, however, I'm really community focused ;)
> >>> >
> >>> > Regards
> >>> > JB
> >>> >
> >>> > On 11/28/2017 07:36 AM, Reuven Lax wrote:
> >>> >>
> >>> >> Agreed. I thinking having a formal vote before Luke had 

Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Romain Manni-Bucau
-1 (non-binding). Gradle discourages contributions, which is a big pitfall
for an ASF project, and the Maven/Gradle comparison is unfair due to the
threading setup of Maven (hardcoded thread count and no parallel builder
usage).


On Nov 28, 2017 at 19:38, "Jason Kuster" wrote:

> +1
>
> From the perspective of Beam's infrastructure, I've found that Gradle
> provides us a good amount more flexibility to do the kinds of builds we
> want. Additionally, the shorter run times (while not the only factor here)
> will allow us to stretch our finite executor resources further, leading to
> fewer instances where people are waiting for other builds to finish for
> their presubmits to start.
>
> On Tue, Nov 28, 2017 at 10:22 AM, Chamikara Jayalath wrote:
>
>> +1
>>
>> And thanks Luke for clearly mentioning the migration process. Let's make
>> sure that all major use cases of Maven are properly addressed before
>> removing Maven support.
>>
>> Thanks,
>> Cham
>>
>>
>> On Tue, Nov 28, 2017 at 10:09 AM Wesley Tanaka 
>> wrote:
>>
>>> +1
>>>
>>>
>>> On 11/28/2017 07:55 AM, Lukasz Cwik wrote:
>>>
>>> This is a procedural vote for migrating to use Gradle for all our
>>> development related processes (building, testing, and releasing). A
>>> majority vote will signal that:
>>> * Gradle build files will be supported and maintained alongside any
>>> remaining Maven files.
>>> * Once Gradle is able to replace Maven in a specific process (or portion
>>> thereof), Maven will no longer be maintained for said process (or portion
>>> thereof) and will be removed.
>>>
>>> +1 I support the process change
>>> 0 I am indifferent to the process change
>>> -1 I would like to remain with our current processes
>>>
>>> 
>>> 
>>>
>>> Below is a summary of information contained in the discussion thread
>>> comparing Gradle and Maven:
>>> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
>>>
>>> Gradle (mins)
>>> min: 25.04
>>> max: 160.14
>>> median: 45.78
>>> average: 52.19
>>> stdev: 30.80
>>>
>>> Maven (mins)
>>> min: 56.86
>>> max: 216.55 (actually > 240 mins because this data does not include
>>> timeouts)
>>> median: 87.93
>>> average: 109.10
>>> stdev: 48.01
>>>
>>> Maven
>>> Java Support: Mature
>>> Python Support: None (via mvn exec plugin)
>>> Go Support: Rudimentary (via mvn plugin)
>>> Protobuf Support: Rudimentary (via mvn plugin)
>>> Docker Support: Rudimentary (via mvn plugin)
>>> ASF Release Automation: Mature
>>> Jenkins Support: Mature
>>> Configuration Language: XML
>>> Multiple Java Versions: Yes
>>> Static Analysis Tools: Some
>>> ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
>>> longstanding but poor)
>>> IntelliJ Integration: Mature
>>> Eclipse Integration: Mature
>>> Extensibility: Mature (updated per JB from discuss thread)
>>> Number of GitHub Projects Using It: 146k
>>> Continuous build daemon: None
>>> Incremental build support: None (note that this is not the same as
>>> incremental compile support offered by the compiler plugin)
>>> Intra-module dependencies: Rudimentary (requires the use of many
>>> profiles to get per runner dependencies)
>>>
>>> Gradle
>>> Java Support: Mature
>>> Python Support: Rudimentary (pygradle, lacks pypi support)
>>> Go Support: Rudimentary (gogradle plugin)
>>> Protobuf Support: Rudimentary (via protobuf plugin)
>>> Docker Support: Rudimentary (via docker plugin)
>>> ASF Release Automation: ?
>>> Jenkins Support: Mature
>>> Configuration Language: Groovy
>>> Multiple Java Versions: Yes
>>> Static Analysis Tools: Some
>>> ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache
>>> Maven ANT plugin)
>>> IntelliJ Integration: Mature
>>> Eclipse Integration: Mature
>>> Extensibility: Mature
>>> Number of GitHub Projects Using It: 122k
>>> Continuous build daemon: Mature
>>> Incremental build support: Mature
>>> Intra-module dependencies: Mature (via configurations)
>>>
>>>
>>> --
>>> Wesley Tanaka https://wtanaka.com/
>>>
>>>
>
>
> --
> ---
> Jason Kuster
> Apache Beam / Google Cloud Dataflow
>


Re: [discuss] java profile

2017-11-28 Thread Romain Manni-Bucau
Lukasz: only for an isolated "system" which is a module, assuming you
still want to be able to build a submodule without building and
revalidating the whole tree, which is important in dev IMO. This means you
shouldn't handle inputs outside the "current" module, and therefore you can
easily miss some (typically test) execution/coverage; incremental
compilation, style checks, etc. are not that costly compared to tests. So
this is far from being as straightforward as it looks if you want to keep
the same guarantees. Once again, for checkstyle/findbugs/javac/rat etc. this
is trivial (even with Maven, actually) but doesn't save a significant amount
of time in dev, whereas tests do.

On Nov 28, 2017 at 20:11, "Kenneth Knowles" wrote:

I seem to remember a tool called `make` that was pretty good at this.

On Tue, Nov 28, 2017 at 10:47 AM, Lukasz Cwik  wrote:

> It's been well shown that a build system that uses input/output set change
> detection can correctly implement incremental builds. Build systems are not
> tied to knowing the internal details of how Java compiles things. Knowing
> that there are some inputs, a process, and some outputs is enough to know
> when the process needs to be rerun.
>
> On Mon, Nov 27, 2017 at 9:53 PM, Romain Manni-Bucau wrote:
>
>> Hmm, no.
>>
>> Incremental build is never correctly implemented because there is just no
>> way to detect some dependencies statically with java code - or any dynamic
>> language.
>>
>> Side note: same applies for gradle daemon usage BTW.
>>
>> Beyond that, if the list is not maintained it is a bug on the same level as
>> coding a toString() with "null.toString()". It is not very hard to maintain
>> the list of modules, and worst case a Maven extension can encode it if you
>> feel more comfortable with this kind of solution.
>>
>> On Nov 27, 2017 at 23:12, "Lukasz Cwik" wrote:
>>
>>> Manually whitelisting/blacklisting sub-modules is error prone, since it
>>> hides issues caused by incorrectly maintaining that list; that is the same
>>> argument as saying the build process doesn't correctly invoke an
>>> incremental build.
>>>
>>> On Mon, Nov 27, 2017 at 1:45 PM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> wrote:
>>>
>>> > Well, for validation builds (pre-PR), incremental support is pointless since
>>> > it easily hides issues due to caching, so a solution saving half of the
>>> > build without losing anything would still be good IMHO.
>>> >
>>> > On Nov 27, 2017 at 21:12, "Lukasz Cwik" wrote:
>>> >
>>> > > Incremental builds aren't correctly set up right now, so you're likely to
>>> > > see Python/Go rebuild even if there were no changes. See
>>> > > https://issues.apache.org/jira/browse/BEAM-3253
>>> > >
>>> > > On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau <
>>> > > rmannibu...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > that was the goal: validate there was no side effect of the changes on
>>> > > > the whole project. Now the "not java" part of the build will not be
>>> > > > impacted by java changes, so this is the part I want to skip since it
>>> > > > takes a lot of time and I have guarantees it is safe to skip it.
>>> > > >
>>> > > > Romain Manni-Bucau
>>> > > > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> > > >
>>> > > >
>>> > > > 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
>>> > > > > Romain, that will build the entire project. I think you want to execute
>>> > > > > (from the root of the project):
>>> > > > > ./gradlew :beam-sdks-parent:beam-sdks-python:build
>>> > > > >
>>> > > > > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
>>> > > > rmannibu...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> gradle build --no-daemon
>>> > > > >>
>>> > > > >> (with gradle 4.2)
>>> > > > >>
>>> > > > >> Romain Manni-Bucau
>>> > > > >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> > > > >>
>>> > > > >>
>>> > > > >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles
>>> >> > >:
>>> > > > >> > What is the gradle command you are using to build just the Python
>>> > > > >> > SDK?
>>> > > > >> >
>>> > > > >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
>>> > > > >> rmannibu...@gmail.com>
>>> > > > >> > wrote:
>>> > > > >> >
>>> > > > >> >> Hmm,
>>> > > > >> >>
>>> > > > >> >> the issue is the same with gradle (locally the python build takes 15 min
>>> > > > >> >> alone, which is as much as the java build, and it is not parallelized I
>>> > > > >> >> think)
>>> > > > >> >>
>>> > > > >> >> pl is not as smooth since it means doing it on each command, whereas
>>> > > > >> >> the proposal is automatically activated through settings.xml
>>> > > > >> >> Romain Manni-Bucau
>>> > > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> > > > >> >>
>>> > > > >> >>
>>> > > > >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles
>>> > 

Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Kenneth Knowles
On Tue, Nov 28, 2017 at 11:16 AM, Raghu Angadi  wrote:

> -1 for (a): no need to see all the private branch commits from the
> contributor. It often makes me more conscious of local commits.
>

I want to note that on my PRs these are not private commits. Each one is a
meaningful isolated change that can be rolled back and is useful to keep
separate when looking at `git blame` or the history of a file. I would
encourage every contributor to also do this. A PR is the unit of code
review, but the unit of meaningful change to a repository is often much
smaller.

Kenn


> +1 for (b): with the committer replacing the squashed commit messages with
> '[BEAM-jira or PRID]: Brief cut-n-paste (or longer if the contributor
> provided one)'.
> -1 for (c): This is quite painful for contributors to work with if there
> has been a merge conflict with master, especially for larger PRs with
> multiple updates.
>
> On Tue, Nov 28, 2017 at 10:24 AM, Lukasz Cwik  wrote:
>
>> Is it possible for mergebot to auto squash any fixup! and perform the
>> merge commit as described in (a), if so then I would vote for mergebot.
>>
>> Without mergebot, I vote:
>> (a) 0 I like squashing fixup!
>> (b) -1
>> (c) +1 Most of our PRs are for focused singular changes which is why I
>> would rather squash everything over not squashing anything
>>
>>
>>
>> On Tue, Nov 28, 2017 at 9:57 AM, Kenneth Knowles  wrote:
>>
>>> On Tue, Nov 28, 2017 at 9:51 AM, Ben Chambers 
>>> wrote:
>>>
 One risk to "squash and merge" is that it may lead to commits that
 don't have clean descriptions -- for instance, commits like "Fixing review
 comments" will show up. If we use (a) these would also show up as separate
 commits. It seems like there are two cases of multiple commits in a PR:

 1. Multiple commits in a PR that have semantic meaning (eg., a PR
 performed N steps, split across N commits). In this case, keeping the
 descriptions and performing either a merge (if the commits are separately
 valid) or squash (if we want the commits to become a single commit in
 master) probably makes sense.

>>>
>>> Keep 'em
>>>
>>>
 2. Multiple commits in a PR that just reflect the review history. In
 this case, we should probably ask the PR author to explicitly rebase their
 PR to have semantically meaningful commits prior to merging. (Eg., do a
 rebase -i).

>>>
>>> Ask the author to squash 'em.
>>>
>>> Kenn
>>>
>>>

 On Tue, Nov 28, 2017 at 9:46 AM Kenneth Knowles  wrote:

> Hi all,
>
> James brought up a great question in Slack, which was how should we
> use the merge button, illustrated [1]
>
> I want to broaden the discussion to talk about all the new
> capabilities:
>
> 1. Whether & how to use the "reviewer" field
> 2. Whether & how to use the "assignee" field
> 3. Whether & how to use the merge button
>
> My preferences are:
>
> 1. Use the reviewer field instead of "R:" comments.
> 2. Use the assignee field to keep track of who the review is blocked
> on (either the reviewer for more comments or the author for fixes)
> 3. Use merge commits, but editing the commit subject line
>
> To expand on part 3, GitHub's merge button has three options [1]. They
> are not described accurately in the UI, as they all say "merge" when only
> one of them performs a merge. They do the following:
>
> (a) Merge the branch with a merge commit
> (b) Squash all the commits, rebase and push
> (c) Rebase and push without squash
>
> Unlike our current guide, all of these result in a "merged" status for
> the PR, so we can correctly distinguish those PRs that were actually merged.
>
> My votes on these options are:
>
> (a) +1 this preserves the most information
> (b) -1 this erases the most information
> (c) -0 this is just sort of a middle ground; it breaks commit hashes,
> does not have a clear merge commit, but preserves other info
>
> Kenn
>
> [1] https://apachebeam.slack.com/messages/C1AAFJYMP/
>
>
>
>
>
> Kenn
>

>>>
>>
>


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Raghu Angadi
-1 for (a): no need to see all the private branch commits from the contributor.
It often makes me more conscious of local commits.
+1 for (b): with the committer replacing the squashed commit messages with
'[BEAM-jira or PRID]: Brief cut-n-paste (or longer if the contributor
provided one)'.
-1 for (c): This is quite painful for contributors to work with if there
has been a merge conflict with master, especially for larger PRs with
multiple updates.

On Tue, Nov 28, 2017 at 10:24 AM, Lukasz Cwik  wrote:

> Is it possible for mergebot to auto squash any fixup! and perform the
> merge commit as described in (a), if so then I would vote for mergebot.
>
> Without mergebot, I vote:
> (a) 0 I like squashing fixup!
> (b) -1
> (c) +1 Most of our PRs are for focused singular changes which is why I
> would rather squash everything over not squashing anything
>
>
>
> On Tue, Nov 28, 2017 at 9:57 AM, Kenneth Knowles  wrote:
>
>> On Tue, Nov 28, 2017 at 9:51 AM, Ben Chambers 
>> wrote:
>>
>>> One risk to "squash and merge" is that it may lead to commits that don't
>>> have clean descriptions -- for instance, commits like "Fixing review
>>> comments" will show up. If we use (a) these would also show up as separate
>>> commits. It seems like there are two cases of multiple commits in a PR:
>>>
>>> 1. Multiple commits in a PR that have semantic meaning (eg., a PR
>>> performed N steps, split across N commits). In this case, keeping the
>>> descriptions and performing either a merge (if the commits are separately
>>> valid) or squash (if we want the commits to become a single commit in
>>> master) probably makes sense.
>>>
>>
>> Keep 'em
>>
>>
>>> 2. Multiple commits in a PR that just reflect the review history. In
>>> this case, we should probably ask the PR author to explicitly rebase their
>>> PR to have semantically meaningful commits prior to merging. (Eg., do a
>>> rebase -i).
>>>
>>
>> Ask the author to squash 'em.
>>
>> Kenn
>>
>>
>>>
>>> On Tue, Nov 28, 2017 at 9:46 AM Kenneth Knowles  wrote:
>>>
>>>> Hi all,
>>>>
>>>> James brought up a great question in Slack, which was how should we use
>>>> the merge button, illustrated [1]
>>>>
>>>> I want to broaden the discussion to talk about all the new capabilities:
>>>>
>>>> 1. Whether & how to use the "reviewer" field
>>>> 2. Whether & how to use the "assignee" field
>>>> 3. Whether & how to use the merge button
>>>>
>>>> My preferences are:
>>>>
>>>> 1. Use the reviewer field instead of "R:" comments.
>>>> 2. Use the assignee field to keep track of who the review is blocked on
>>>> (either the reviewer for more comments or the author for fixes)
>>>> 3. Use merge commits, but editing the commit subject line
>>>>
>>>> To expand on part 3, GitHub's merge button has three options [1]. They
>>>> are not described accurately in the UI, as they all say "merge" when only
>>>> one of them performs a merge. They do the following:
>>>>
>>>> (a) Merge the branch with a merge commit
>>>> (b) Squash all the commits, rebase and push
>>>> (c) Rebase and push without squash
>>>>
>>>> Unlike our current guide, all of these result in a "merged" status for
>>>> the PR, so we can correctly distinguish those PRs that were actually
>>>> merged.
>>>>
>>>> My votes on these options are:
>>>>
>>>> (a) +1 this preserves the most information
>>>> (b) -1 this erases the most information
>>>> (c) -0 this is just sort of a middle ground; it breaks commit hashes,
>>>> does not have a clear merge commit, but preserves other info
>>>>
>>>> Kenn
>>>>
>>>> [1] https://apachebeam.slack.com/messages/C1AAFJYMP/
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Kenn

>>>
>>
>


Re: [discuss] java profile

2017-11-28 Thread Kenneth Knowles
I seem to remember a tool called `make` that was pretty good at this.

On Tue, Nov 28, 2017 at 10:47 AM, Lukasz Cwik  wrote:

> It's been well shown that a build system that uses input/output set change
> detection can correctly implement incremental builds. Build systems are not
> tied to knowing the internal details of how Java compiles things. Knowing
> that there are some inputs, a process, and some outputs is enough to know
> when the process needs to be rerun.
>
> On Mon, Nov 27, 2017 at 9:53 PM, Romain Manni-Bucau wrote:
>
>> Hmm, no.
>>
>> Incremental build is never correctly implemented cause there is just no
>> way to detect some dependencies statically with java code - or any dynamic
>> language.
>>
>> Side note: same applies for gradle daemon usage BTW.
>>
>> Besides, if the list is not maintained, that is a bug on the same level as
>> coding a toString() with "null.toString()". The list of modules is not very
>> hard to maintain, and in the worst case a Maven extension can encode it if
>> you feel more comfortable with that kind of solution.
>>
>> Le 27 nov. 2017 23:12, "Lukasz Cwik"  a écrit :
>>
>>> Manually whitelisting/blacklisting sub-modules is error prone, since it
>>> hides issues when that list is incorrectly maintained. That is the same
>>> argument as saying the build process doesn't correctly invoke an
>>> incremental build.
>>>
>>> On Mon, Nov 27, 2017 at 1:45 PM, Romain Manni-Bucau <
>>> rmannibu...@gmail.com>
>>> wrote:
>>>
>>> > Well, for validation builds (pre-PR), incremental support is pointless
>>> > since it easily hides issues due to caching, so a solution saving half
>>> > of the build without losing anything would still be good IMHO.
>>> >
>>> > Le 27 nov. 2017 21:12, "Lukasz Cwik"  a
>>> écrit :
>>> >
>>> > > Incremental builds aren't correctly set up right now, so you're likely
>>> > > to see Python/Go rebuild even if there were no changes. See
>>> > > https://issues.apache.org/jira/browse/BEAM-3253
>>> > >
>>> > > On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau <
>>> > > rmannibu...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > that was the goal: validate there was no side effect of the
>>> changes on
>>> > > > the whole project. Now the "not java" part of the build will not be
>>> > > > impacted by java changed so this is the part i want to skip since
>>> it
>>> > > > takes a lot of time and I have guarantees it is safe to skip them.
>>> > > >
>>> > > > Romain Manni-Bucau
>>> > > > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> > > >
>>> > > >
>>> > > > 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
>>> > > > > Romain, that will build the entire project. I think you want to
>>> > execute
>>> > > > > (from the root of the project):
>>> > > > > ./gradlew :beam-sdks-parent:beam-sdks-python:build
>>> > > > >
>>> > > > > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
>>> > > > rmannibu...@gmail.com>
>>> > > > > wrote:
>>> > > > >
>>> > > > >> gradle build --no-daemon
>>> > > > >>
>>> > > > >> (with gradle 4.2)
>>> > > > >>
>>> > > > >> Romain Manni-Bucau
>>> > > > >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> > > > >>
>>> > > > >>
>>> > > > >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles
>>> >> > >:
>>> > > > >> > What is the gradle command you are using to build just the
>>> Python
>>> > > SDK?
>>> > > > >> >
>>> > > > >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
>>> > > > >> rmannibu...@gmail.com>
>>> > > > >> > wrote:
>>> > > > >> >
>>> > > > >> >> Hmm,
>>> > > > >> >>
>>> > > > >> >> issue is the same with gradle (locally python build takes
>>> 15mn
>>> > > alone
>>> > > > >> >> which is as much as the java build and it is not
>>> parallelized I
>>> > > > think)
>>> > > > >> >>
>>> > > > >> >> pl is not as smooth since it means doing it on each command
>>> > whereas
>>> > > > >> >> the proposal is automatically activated through settings.xml
>>> > > > >> >>
>>> > > > >> >> Romain Manni-Bucau
>>> > > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>> > > > >> >>
>>> > > > >> >>
>>> > > > >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles
>>> > >> > > >:
>>> > > > >> >> > I think you can already mostly do this with mvn -pl
>>> sdks/XYZ
>>> > -am
>>> > > > >> -amd. I
>>> > > > >> >> > think that we have other work (gradle support) underway
>>> that
>>> > will
>>> > > > make
>>> > > > >> >> this
>>> > > > >> >> > a non-issue since gradle automatically does even better
>>> than
>>> > the
>>> > > > >> profile
>>> > > > >> >> or
>>> > > > >> >> > -am -amd.
>>> > > > >> >> >
>>> > > > >> >> > On Mon, Nov 27, 2017 at 11:01 AM, Romain Manni-Bucau <
>>> > > > >> >> rmannibu...@gmail.com>
>>> > > > >> >> > wrote:
>>> > > > >> >> >
>>> > > > >> >> >> Hi guys,
>>> > > > >> >> >>
>>> > > > >> >> >> java/python/go/xxx support is great but as a developer you
>>> > > 

Re: SerializableCoder Structured Value

2017-11-28 Thread Lukasz Cwik
I think that at least we should be clear in the documentation for
SerializableCoder and also make sure that the DirectRunner validates the
consistentWithEquals property.

Optionally one of:
1) Make a version of SerializableCoder that can be constructed where it
says it is consistentWithEquals and have users register each type with the
CoderRegistry.
2) Document that users should subclass SerializableCoder for all types which
are consistentWithEquals and also register them with the CoderRegistry.
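
To make the two properties under discussion concrete, here is a minimal sketch of the kind of check a runner could apply to sample values. It is written in Python for brevity and is not Beam's API: `encode`, `Point`, `Opaque`, and both checker functions are hypothetical names, and the JSON-based `encode` is only a stand-in for a serialization-based coder.

```python
import json
from itertools import product


class Point:
    """Value type with a useful __eq__ (the common case assumed in this thread)."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


class Opaque:
    """Value type without __eq__, so equality falls back to object identity."""


def encode(value):
    # Hypothetical stand-in for a coder: encode the object's fields as bytes.
    return json.dumps(vars(value), sort_keys=True).encode()


def consistent_with_equals(samples):
    """Check that encode(a) == encode(b) implies a == b over the samples."""
    return all(a == b
               for a, b in product(samples, repeat=2)
               if encode(a) == encode(b))


def deterministic(samples):
    """Check that a == b implies encode(a) == encode(b) over the samples."""
    return all(encode(a) == encode(b)
               for a, b in product(samples, repeat=2)
               if a == b)
```

With `Point` both checks pass on sample data, while two distinct `Opaque()` instances encode to the same bytes yet are not equal, which is the case where claiming consistentWithEquals would be wrong. A check like this can only refute the property on the samples it sees; it cannot prove it in general.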


On Mon, Nov 27, 2017 at 5:39 PM, Kenneth Knowles  wrote:

> What I said is not quite right - there are accidental collisions allowed.
> The "all coders" spec for structural value only requires that encode(a) ==
> encode(b) implies sv(a).equals(sv(b)). The converse is not required. For
> example, the nondeterministic SetCoder can use the Set objects themselves
> as structural values, but their encoding may differ. So for determinism it
> is actually a.equals(b) implies encode(a) == encode(b) which in turn
> implies sv(a).equals(sv(b)). Either way, for deterministic coders they all
> coincide.
>
> On Mon, Nov 27, 2017 at 5:23 PM, Kenneth Knowles  wrote:
>
>> To add some flavor,
>>
>> *All coders:* structuralValue(a).equals(structuralValue(b)) if and only
>> if encode(a) == encode(b)
>>
>> *"Consistent with equals" aka injective:* encode(a) == encode(b) implies
>> a.equals(b)
>>
>> *Deterministic:* a.equals(b) implies 
>> structuralValue(a).equals(structuralValue(b))
>> (hence encode(a) == encode(b))
>>
>> The structural value must always be a legitimate substitute for encoding
>> to allow in-memory GBK to be faster than encoding.
>>
>> IMO we should deprecate and retire "consistent with equals" since
>> overriding it to return `true` is no simpler than overriding
>> structuralValue itself, and it has no purpose other than governing
>> structuralValue. The two obvious choices - encoding or return directly -
>> are trivial, and getting fancy is optional. The check Luke suggests would
>> then just be a test that structuralValue is correct. The mutation detector
>> should perhaps just use the structural value and let the coder itself
>> decide whether or not it needs to encode.
>>
>> Also worth considering the dual perspective that highlights portability:
>> To a portable runner, the elements are (with a couple exceptions) just
>> bytes, and the coders are a way for the SDK to interpret them in order to
>> do its computation. The implied spec that the mutation detector relies on
>> is that serialize(deserialize(x)) == x for these bytes, so if the
>> re-serialized bytes have changed, it assumes the object was mutated. In a
>> sense, if an SDK implements "the identity function" yet returns different
>> bytes, that is a broken identity function because the bytes *are* the
>> element. It is a bit of a strict interpretation, and maybe not so useful
>> when the elements are only really interpreted by a single SDK, as in the
>> case of SerializableCoder. But I'm not sure what other spec is available.
>>
>> Kenn
>>
>>
>> On Mon, Nov 27, 2017 at 4:37 PM, Mairbek Khadikov 
>> wrote:
>>
>>> I'm open to renaming *consistentWithEquals*.
>>>
>>> If I understand the code correctly, when consistentWithEquals returns
>>> true, org.apache.beam.sdk.util.MutationDetectors expects
>>> *a.equals(deserialize(serialize(a)))*, which I think is reasonable for
>>> SerializableCoder (assuming objects implement equals). Right now,
>>> *serialize(a).equals(serialize(deserialize(serialize(a))))* is expected
>>> and that contradicts *"does not guarantee a deterministic encoding"*.
>>>
>>> On Mon, Nov 27, 2017 at 4:07 PM, Lukasz Cwik  wrote:
>>>
>>>> I think the idea is that SerializableCoder should be updated to expect
>>>> that all values it encodes do implement equals(), since this seems to be
>>>> the much more common case than classes that don't implement a useful
>>>> equals. It would be possible to add a useful check to DirectRunner that
>>>> any value that says it's consistent with equals actually obeys its
>>>> contract.
>>>>
>>>> On Mon, Nov 27, 2017 at 4:03 PM, Eugene Kirpichov  wrote:
>>>>
>>>>> Not sure where you see the contradiction? consistentWithEquals says
>>>>> "Whenever the encoded bytes of two values are equal, then the original
>>>>> values are equal according to {@code Objects.equals()}." - which is
>>>>> clearly false for Serializable's in general: it's possible that the
>>>>> serialized form of "a" and "b" is the same bytes, but !a.equals(b),
>>>>> e.g. if this class does not implement equals() or if it uses reference
>>>>> equality.
>>>>>
>>>>> On Mon, Nov 27, 2017 at 3:55 PM Mairbek Khadikov
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Currently SerializableCoder#consistentWithEquals returns false,
>>>>>> which contradicts its own documentation "{@link SerializableCoder} does
>>>>>> not guarantee a deterministic

Re: [discuss] java profile

2017-11-28 Thread Lukasz Cwik
It's been well shown that a build system that uses input/output set change
detection can correctly implement incremental builds. Build systems are not
tied to knowing the internal details of how Java compiles things. Knowing
that there are some inputs, a process, and some outputs is enough to know
when the process needs to be rerun.
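
As a rough illustration of input/output change detection, the sketch below reruns a task only when the fingerprint of its declared inputs changes or a declared output is missing. This is a toy model in Python, not how Gradle or Maven is implemented; `Task`, `fingerprint`, and `demo` are hypothetical names. It is only as correct as the declared input set, which is exactly the point being debated in this thread.

```python
import hashlib
import os
import tempfile


def fingerprint(paths):
    """Hash the names and contents of a set of input files."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode())
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()


class Task:
    """A process described only by its inputs, its outputs, and an action."""

    def __init__(self, inputs, outputs, action):
        self.inputs, self.outputs, self.action = inputs, outputs, action
        self.last_fingerprint = None

    def run_if_needed(self):
        current = fingerprint(self.inputs)
        outputs_present = all(os.path.exists(p) for p in self.outputs)
        if current == self.last_fingerprint and outputs_present:
            return False  # inputs unchanged and outputs exist: up to date
        self.action()
        self.last_fingerprint = current
        return True


def demo():
    """Run a one-file 'build' four times and report which runs did work."""
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "in.txt")
        out = os.path.join(workdir, "out.txt")
        with open(src, "w") as handle:
            handle.write("v1")

        def build():
            with open(src) as reader, open(out, "w") as writer:
                writer.write(reader.read().upper())

        task = Task([src], [out], build)
        results = [task.run_if_needed(), task.run_if_needed()]
        with open(src, "w") as handle:
            handle.write("v2")  # changed input forces a rerun
        results.append(task.run_if_needed())
        os.remove(out)  # missing output forces a rerun
        results.append(task.run_if_needed())
        return results
```

The task never inspects what the action does internally; an undeclared input (for example, a class loaded reflectively at runtime) would not be seen by `fingerprint`, so the declared inputs have to be complete for the result to be trusted.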

On Mon, Nov 27, 2017 at 9:53 PM, Romain Manni-Bucau 
wrote:

> Hmm, no.
>
> Incremental build is never correctly implemented cause there is just no
> way to detect some dependencies statically with java code - or any dynamic
> language.
>
> Side note: same applies for gradle daemon usage BTW.
>
> Besides, if the list is not maintained, that is a bug on the same level as
> coding a toString() with "null.toString()". The list of modules is not very
> hard to maintain, and in the worst case a Maven extension can encode it if
> you feel more comfortable with that kind of solution.
>
> Le 27 nov. 2017 23:12, "Lukasz Cwik"  a écrit :
>
>> Manually whitelisting/blacklisting sub-modules is error prone, since it
>> hides issues when that list is incorrectly maintained. That is the same
>> argument as saying the build process doesn't correctly invoke an
>> incremental build.
>>
>> On Mon, Nov 27, 2017 at 1:45 PM, Romain Manni-Bucau <
>> rmannibu...@gmail.com>
>> wrote:
>>
>> > Well, for validation builds (pre-PR), incremental support is pointless
>> > since it easily hides issues due to caching, so a solution saving half
>> > of the build without losing anything would still be good IMHO.
>> >
>> > Le 27 nov. 2017 21:12, "Lukasz Cwik"  a
>> écrit :
>> >
>> > > Incremental builds aren't correctly set up right now, so you're likely
>> > > to see Python/Go rebuild even if there were no changes. See
>> > > https://issues.apache.org/jira/browse/BEAM-3253
>> > >
>> > > On Mon, Nov 27, 2017 at 11:46 AM, Romain Manni-Bucau <
>> > > rmannibu...@gmail.com>
>> > > wrote:
>> > >
>> > > > that was the goal: validate there was no side effect of the changes
>> on
>> > > > the whole project. Now the "not java" part of the build will not be
>> > > > impacted by java changed so this is the part i want to skip since it
>> > > > takes a lot of time and I have guarantees it is safe to skip them.
>> > > >
>> > > > Romain Manni-Bucau
>> > > > @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> > > >
>> > > >
>> > > > 2017-11-27 20:28 GMT+01:00 Lukasz Cwik :
>> > > > > Romain, that will build the entire project. I think you want to
>> > execute
>> > > > > (from the root of the project):
>> > > > > ./gradlew :beam-sdks-parent:beam-sdks-python:build
>> > > > >
>> > > > > On Mon, Nov 27, 2017 at 11:25 AM, Romain Manni-Bucau <
>> > > > rmannibu...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > >> gradle build --no-daemon
>> > > > >>
>> > > > >> (with gradle 4.2)
>> > > > >>
>> > > > >> Romain Manni-Bucau
>> > > > >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> > > > >>
>> > > > >>
>> > > > >> 2017-11-27 20:21 GMT+01:00 Kenneth Knowles
>> > > >:
>> > > > >> > What is the gradle command you are using to build just the
>> Python
>> > > SDK?
>> > > > >> >
>> > > > >> > On Mon, Nov 27, 2017 at 11:19 AM, Romain Manni-Bucau <
>> > > > >> rmannibu...@gmail.com>
>> > > > >> > wrote:
>> > > > >> >
>> > > > >> >> Hmm,
>> > > > >> >>
>> > > > >> >> issue is the same with gradle (locally python build takes 15mn
>> > > alone
>> > > > >> >> which is as much as the java build and it is not parallelized
>> I
>> > > > think)
>> > > > >> >>
>> > > > >> >> pl is not as smooth since it means doing it on each command
>> > whereas
>> > > > >> >> the proposal is automatically activated through settings.xml
>> > > > >> >>
>> > > > >> >> Romain Manni-Bucau
>> > > > >> >> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>> > > > >> >>
>> > > > >> >>
>> > > > >> >> 2017-11-27 20:07 GMT+01:00 Kenneth Knowles
>> > > > > >:
>> > > > >> >> > I think you can already mostly do this with mvn -pl sdks/XYZ
>> > -am
>> > > > >> -amd. I
>> > > > >> >> > think that we have other work (gradle support) underway that
>> > will
>> > > > make
>> > > > >> >> this
>> > > > >> >> > a non-issue since gradle automatically does even better than
>> > the
>> > > > >> profile
>> > > > >> >> or
>> > > > >> >> > -am -amd.
>> > > > >> >> >
>> > > > >> >> > On Mon, Nov 27, 2017 at 11:01 AM, Romain Manni-Bucau <
>> > > > >> >> rmannibu...@gmail.com>
>> > > > >> >> > wrote:
>> > > > >> >> >
>> > > > >> >> >> Hi guys,
>> > > > >> >> >>
>> > > > >> >> >> java/python/go/xxx support is great but as a developer you
>> > > rarely
>> > > > >> hack
>> > > > >> >> >> on them all.
>> > > > >> >> >>
>> > > > >> >> >> For that reason I opened https://github.com/apache/
>> > > beam/pull/4173
>> > > > .
>> > > > >> >> >>
>> > > > >> >> >> Goal is to give each developer a way to build the whole
>> > project
>> > > > and
>> > > > >> 

Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Jason Kuster
+1

From the perspective of Beam's infrastructure, I've found that Gradle
provides us a good amount more flexibility to do the kinds of builds we
want. Additionally, the shorter run times (while not the only factor here)
will allow us to stretch our finite executor resources further, leading to
fewer instances where people are waiting for other builds to finish for
their presubmits to start.
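
For a sense of scale, the run-time figures from the vote summary quoted below can be turned into ratios. This is only arithmetic on the reported numbers (in minutes); the `speedup` helper is a hypothetical name, and the Maven maximum is understated in the source data because timeouts were excluded.

```python
# Build times in minutes, copied from the summary in the vote email.
gradle = {"min": 25.04, "max": 160.14, "median": 45.78, "average": 52.19}
maven = {"min": 56.86, "max": 216.55, "median": 87.93, "average": 109.10}


def speedup(stat):
    """Ratio of Maven time to Gradle time for one summary statistic."""
    return maven[stat] / gradle[stat]


for stat in ("min", "median", "average"):
    print(f"{stat}: Maven takes {speedup(stat):.2f}x as long as Gradle")
```

On the median and the average, the Maven runs took roughly twice as long as the Gradle runs.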

On Tue, Nov 28, 2017 at 10:22 AM, Chamikara Jayalath 
wrote:

> +1
>
> And thanks Luke for clearly mentioning the migration process. Let's make
> sure that all major use cases of Maven are properly addressed before
> removing Maven support.
>
> Thanks,
> Cham
>
>
> On Tue, Nov 28, 2017 at 10:09 AM Wesley Tanaka 
> wrote:
>
>> +1
>>
>>
>> On 11/28/2017 07:55 AM, Lukasz Cwik wrote:
>>
>> This is a procedural vote for migrating to use Gradle for all our
>> development related processes (building, testing, and releasing). A
>> majority vote will signal that:
>> * Gradle build files will be supported and maintained alongside any
>> remaining Maven files.
>> * Once Gradle is able to replace Maven in a specific process (or portion
>> thereof), Maven will no longer be maintained for said process (or portion
>> thereof) and will be removed.
>>
>> +1 I support the process change
>> 0 I am indifferent to the process change
>> -1 I would like to remain with our current processes
>>
>> 
>> 
>>
>> Below is a summary of information contained in the discussion thread
>> comparing Gradle and Maven:
>> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
>>
>> Gradle (mins)
>> min: 25.04
>> max: 160.14
>> median: 45.78
>> average: 52.19
>> stdev: 30.80
>>
>> Maven (mins)
>> min: 56.86
>> max: 216.55 (actually > 240 mins because this data does not include
>> timeouts)
>> median: 87.93
>> average: 109.10
>> stdev: 48.01
>>
>> Maven
>> Java Support: Mature
>> Python Support: None (via mvn exec plugin)
>> Go Support: Rudimentary (via mvn plugin)
>> Protobuf Support: Rudimentary (via mvn plugin)
>> Docker Support: Rudimentary (via mvn plugin)
>> ASF Release Automation: Mature
>> Jenkins Support: Mature
>> Configuration Language: XML
>> Multiple Java Versions: Yes
>> Static Analysis Tools: Some
>> ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
>> longstanding but poor)
>> IntelliJ Integration: Mature
>> Eclipse Integration: Mature
>> Extensibility: Mature (updated per JB from discuss thread)
>> Number of GitHub Projects Using It: 146k
>> Continuous build daemon: None
>> Incremental build support: None (note that this is not the same as
>> incremental compile support offered by the compiler plugin)
>> Intra-module dependencies: Rudimentary (requires the use of many profiles
>> to get per runner dependencies)
>>
>> Gradle
>> Java Support: Mature
>> Python Support: Rudimentary (pygradle, lacks pypi support)
>> Go Support: Rudimentary (gogradle plugin)
>> Protobuf Support: Rudimentary (via protobuf plugin)
>> Docker Support: Rudimentary (via docker plugin)
>> ASF Release Automation: ?
>> Jenkins Support: Mature
>> Configuration Language: Groovy
>> Multiple Java Versions: Yes
>> Static Analysis Tools: Some
>> ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache Maven
>> ANT plugin)
>> IntelliJ Integration: Mature
>> Eclipse Integration: Mature
>> Extensibility: Mature
>> Number of GitHub Projects Using It: 122k
>> Continuous build daemon: Mature
>> Incremental build support: Mature
>> Intra-module dependencies: Mature (via configurations)
>>
>>
>> --
>> Wesley Tanaka
>> https://wtanaka.com/
>>
>>


-- 
---
Jason Kuster
Apache Beam / Google Cloud Dataflow


Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Lukasz Cwik
I would suggest that for 3.x we target portability so that more runners can
execute an Apache Beam python pipeline.

We should start targeting JIRAs which we know are backwards incompatible as
well since we know there are rough corners around some APIs.


On Tue, Nov 28, 2017 at 9:48 AM, Reuven Lax  wrote:

>
>
> On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Reuven,
>>
>> Yes, I remember that we agreed on a release per month. However, we didn't
>> do it before. I think the most important is not the period, it's more a
>> stable pace. I think it's more interesting for our community to have
>> "always" a release every two months, more than a tentative of a release
>> every month that end later than that. Of course, if we can do both, it's
>> perfect ;)
>>
>
> Agree. A stable pace is the most important thing.
>
>
>>
>> For Beam 3.x, I wasn't talking about breaking change, but more about
>> "marketing" announcement. I think that, even if we don't break API, some
>> features are "strong enough" to be "qualified" in a major version.
>>
>
> Ah, good point. This doesn't stop us from checking in these new features
> into 2.x possibly tagged with an @Experimental flag. We can then use 3.0 to
> announce all these features more broadly, and remove @Experimental tags.
>
> I would also like to see enterprise-ready BeamSQL and Java 7 deprecation
> on the list for Beam 3.0
>
>
>> I think that any major ideas & features (breaking the API or not) are
>> valuable for Beam 3.x (and it's a good sign for our community again ;)).
>>
>> Thanks !
>> Regards
>> JB
>>
>> On 11/28/2017 06:09 PM, Reuven Lax wrote:
>>
>>>
>>>
>>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:
>>>
>>> Hi guys,
>>>
>>> Even if there's no rush, I think it would be great for the community
>>> to have a better view on our roadmap and where we are going in terms of
>>> schedule.
>>>
>>> I would like to discuss the following:
>>> - a best effort to maintain a good release pace or at least provide
>>> a rough schedule. For instance, in Apache Karaf, I have a release
>>> schedule (http://karaf.apache.org/download.html#container-schedule). I
>>> think a release ~ every quarter would be great.
>>>
>>>
>>> Originally we had stated that we wanted monthly releases of Beam. So far
>>> the releases have been painful enough that monthly hasn't happened. I think
>>> we should address these issues and go to monthly releases as originally
>>> stated.
>>>
>>> - if I see new Beam 2.x releases for sure (according to the previous
>>> point), it would be great to have a discussion about Beam 3.x. I think
>>> that one interesting new feature that Beam 3.x can provide is
>>> PCollection with Schemas. It's something that we started to discuss
>>> with Reuven and Eugene.
>>> In terms of schedule,
>>>
>>>
>>> I don't think schemas require Beam 3.0 - I think we can introduce them
>>> without making breaking changes. However there are many other features that
>>> would be very interesting for Beam 3.x, and we should start putting
>>> together a list of them. I
>>>
>>>
>>> I would love to see your thoughts & ideas about releases schedule
>>> and Beam 3.x.
>>>
>>> Regards
>>> JB
>>> -- Jean-Baptiste Onofré
>>> jbono...@apache.org 
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Lukasz Cwik
Is it possible for mergebot to auto-squash any fixup! commits and perform the
merge commit as described in (a)? If so, then I would vote for mergebot.

Without mergebot, I vote:
(a) 0 I like squashing fixup!
(b) -1
(c) +1 Most of our PRs are for focused singular changes which is why I
would rather squash everything over not squashing anything
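
For readers unfamiliar with the fixup! convention: `git commit --fixup=<commit>` creates a commit whose subject is "fixup! " followed by the target's subject, and `git rebase -i --autosquash` folds such commits into their targets. The sketch below is a toy model of that folding over a list of commit subjects, written in Python. It is not git's or mergebot's implementation, and `autosquash` is a hypothetical helper name.

```python
def autosquash(subjects):
    """Fold 'fixup! <subject>' commits into the earlier commit they name.

    Returns the surviving commit subjects in order, plus a count of how
    many fixups were folded into each surviving commit.
    """
    prefix = "fixup! "
    kept = []    # subjects of surviving commits, in original order
    folded = {}  # surviving subject -> number of fixups folded into it
    for subject in subjects:
        if subject.startswith(prefix):
            target = subject[len(prefix):]
            if target in folded:
                folded[target] += 1
                continue  # the fixup disappears into its target
        # Ordinary commit, or a fixup whose target was not found: keep it.
        kept.append(subject)
        folded.setdefault(subject, 0)
    return kept, folded


history = [
    "Add FooIO",
    "Add FooIO tests",
    "fixup! Add FooIO",
    "fixup! Add FooIO tests",
]
print(autosquash(history)[0])
```

After this step, the semantically meaningful commits remain and the review-history commits are gone, so a merge commit as in (a) would preserve the former without the latter.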



On Tue, Nov 28, 2017 at 9:57 AM, Kenneth Knowles  wrote:

> On Tue, Nov 28, 2017 at 9:51 AM, Ben Chambers 
> wrote:
>
>> One risk to "squash and merge" is that it may lead to commits that don't
>> have clean descriptions -- for instance, commits like "Fixing review
>> comments" will show up. If we use (a) these would also show up as separate
>> commits. It seems like there are two cases of multiple commits in a PR:
>>
>> 1. Multiple commits in a PR that have semantic meaning (eg., a PR
>> performed N steps, split across N commits). In this case, keeping the
>> descriptions and performing either a merge (if the commits are separately
>> valid) or squash (if we want the commits to become a single commit in
>> master) probably makes sense.
>>
>
> Keep 'em
>
>
>> 2. Multiple commits in a PR that just reflect the review history. In this
>> case, we should probably ask the PR author to explicitly rebase their PR to
>> have semantically meaningful commits prior to merging. (Eg., do a rebase
>> -i).
>>
>
> Ask the author to squash 'em.
>
> Kenn
>
>
>>
>> On Tue, Nov 28, 2017 at 9:46 AM Kenneth Knowles  wrote:
>>
>>> Hi all,
>>>
>>> James brought up a great question in Slack, which was how should we use
>>> the merge button, illustrated [1]
>>>
>>> I want to broaden the discussion to talk about all the new capabilities:
>>>
>>> 1. Whether & how to use the "reviewer" field
>>> 2. Whether & how to use the "assignee" field
>>> 3. Whether & how to use the merge button
>>>
>>> My preferences are:
>>>
>>> 1. Use the reviewer field instead of "R:" comments.
>>> 2. Use the assignee field to keep track of who the review is blocked on
>>> (either the reviewer for more comments or the author for fixes)
>>> 3. Use merge commits, but editing the commit subject line
>>>
>>> To expand on part 3, GitHub's merge button has three options [1]. They
>>> are not described accurately in the UI, as they all say "merge" when only
>>> one of them performs a merge. They do the following:
>>>
>>> (a) Merge the branch with a merge commit
>>> (b) Squash all the commits, rebase and push
>>> (c) Rebase and push without squash
>>>
>>> Unlike our current guide, all of these result in a "merged" status for
>>> the PR, so we can correctly distinguish those PRs that were actually merged.
>>>
>>> My votes on these options are:
>>>
>>> (a) +1 this preserves the most information
>>> (b) -1 this erases the most information
>>> (c) -0 this is just sort of a middle ground; it breaks commit hashes,
>>> does not have a clear merge commit, but preserves other info
>>>
>>> Kenn
>>>
>>> [1] https://apachebeam.slack.com/messages/C1AAFJYMP/
>>>
>>>
>>>
>>>
>>>
>>> Kenn
>>>
>>
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Robert Bradshaw
I also did an apache github query

select count(*) as apache_projects, sum(uses_maven=true) as
uses_maven, sum(uses_gradle=true) as uses_gradle from (
select
repo_name,
max(path contains 'pom.xml') as uses_maven,
max(path contains 'gradle') as uses_gradle
from [bigquery-public-data:github_repos.files]
where instr(repo_name, 'apache/') == 1
group by repo_name);

Of 425 total Apache projects on GitHub, just over half (249) use maven,
and only 25 use gradle. So we'd be in the minority, but certainly not
alone.
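
The aggregation in the query above can be sketched over an in-memory list of (repo_name, path) rows. This is an illustrative Python equivalent under the assumption that the query means "a repo uses a tool if any of its paths contains the marker"; `count_build_tools` and the sample rows are hypothetical and are not the BigQuery data.

```python
from collections import defaultdict


def count_build_tools(files):
    """Count apache/ repos, and how many contain Maven or Gradle files."""
    uses = defaultdict(lambda: {"maven": False, "gradle": False})
    for repo_name, path in files:
        if not repo_name.startswith("apache/"):
            continue  # mirrors instr(repo_name, 'apache/') == 1
        uses[repo_name]["maven"] |= "pom.xml" in path
        uses[repo_name]["gradle"] |= "gradle" in path
    return {
        "apache_projects": len(uses),
        "uses_maven": sum(flags["maven"] for flags in uses.values()),
        "uses_gradle": sum(flags["gradle"] for flags in uses.values()),
    }


# Hypothetical sample rows, for illustration only.
sample = [
    ("apache/a", "pom.xml"),
    ("apache/a", "src/Main.java"),
    ("apache/b", "build.gradle"),
    ("apache/c", "README.md"),
    ("other/d", "pom.xml"),
]
print(count_build_tools(sample))
```

Applied to the reported totals, 249 of 425 is about 59% and 25 of 425 is about 6%.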

I don't think we need to use the most common tool, rather we should
use what fits the project well, and the popularity criterion is simply
that we don't want to choose a tool where obscurity would be a
hindrance. Both gradle and maven seem to clear this bar (as do a host
of others that are even more popular, but would be unsuitable for
other reasons, e.g. plain old make).

We would certainly not switch over to gradle if we couldn't do a
release. IIRC, there's still some work to be done to push this
through, but at this point it doesn't seem like there's any reason to
expect it couldn't be done.

Is there any more data that should be gathered before a vote? (Or
should the vote perhaps have a "+/-0, need more information [please
provide details]" option.)


On Tue, Nov 28, 2017 at 9:45 AM, Scott Wegner  wrote:
> To add one more data point measuring general adoption of gradle vs. maven,
> we can look at Stackoverflow trends comparing the two tags [1]. This shows
> the percentage of new SO questions in a given month by tag. 'gradle'
> represents ~0.25% of questions, while maven is ~0.45%. So, maven is more
> dominant in the Stackoverflow community, but they are at least similar
> orders of magnitude. Also, the data is a bit noisy to draw a trendline, but
> it seems that maven's growth has flattened while gradle is still increasing.
>
> [1] https://insights.stackoverflow.com/trends?tags=maven%2Cgradle
>
> On Tue, Nov 28, 2017 at 9:14 AM Kenneth Knowles  wrote:
>>
>> Yea, I think voting is the next step. Luke - I think you are obviously the
>> right person to set up the email of what exactly we are voting on, since
>> you've driven this improvement.
>>
>> On Tue, Nov 28, 2017 at 12:08 AM, Robert Bradshaw 
>> wrote:
>>>
>>> It's great to see all the discussion going on here.
>>>
>>> I think it's important to point out that merging a parallel set of
>>> gradle build scripts is a separate (and much less disruptive) step
>>> than, say, switching over the default (or even recommended)
>>> build/release process to use them, let alone removing the maven build
>>> files entirely. The latter two should definitely be gated by a formal
>>> vote (each, probably), with the current state the gradle files can
>>> mostly be ignored by most people. In particular, this is the kind of
>>> change that needs to be in master to be evaluated--if it's on a branch
>>> we can't very well see how it impacts presubmits, and most importantly
>>> people can't try it out for real development.
>>>
>>> I agree that the choice of build tool may attract some contributors
>>> and discourage others. Having builds that are fast, correct, and
>>> reproducible will probably matter more to potential contributors than
>>> the particular command to run. While maven can surely be improved, I
>>> doubt a 2x improvement (and many more times that for incremental
>>> builds) is low-hanging fruit, and many of the issues seem quite
>>> fundamental (e.g. all the special treatment we need for NeedsRunner
>>> tests, and having to do a (global-by-default) mvn install to skip
>>> tests of dependencies when testing a leaf module).
>>>
>>> Getting data on what other apache projects use could be interesting,
>>> but unless we gather why such choices were made I don't know that it'd
>>> be that influential once we've established that both tools are widely
>>> supported generally.
>>>
>>> To re-emphasize, we'll continue to produce and publish maven
>>> artifacts, so our choice of build system won't matter for those only
>>> using Beam as a dependency.
>>>
>>>
>>>
>>> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré 
>>> wrote:
>>> > Yeah, especially, I think it would have been great to have a vote
>>> > before
>>> > merging on master.
>>> >
>>> > Not a big deal, however, I'm really community focused ;)
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 11/28/2017 07:36 AM, Reuven Lax wrote:
>>> >>
>>> >> Agreed. I think having a formal vote before Luke had numbers and
>>> >> results would have been too early. However now that we have such
>>> >> numbers, we
>>> >> should think about having a vote.
>>> >>
>>> >> Also, while I disagree with Romain that Gradle is not "enterprise
>>> >> ready"
>>> >> (it's heavily used by Netflix, LinkedIn, Siemens, and is the default
>>> >> build
>>> >> framework for Android apps), it would be interesting to see if any
>>> >> other ASF
>>> >> 

Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Wesley Tanaka

+1

On 11/28/2017 07:55 AM, Lukasz Cwik wrote:
This is a procedural vote for migrating to use Gradle for all our 
development related processes (building, testing, and releasing). A 
majority vote will signal that:
* Gradle build files will be supported and maintained alongside any 
remaining Maven files.
* Once Gradle is able to replace Maven in a specific process (or 
portion thereof), Maven will no longer be maintained for said process 
(or portion thereof) and will be removed.


+1 I support the process change
0 I am indifferent to the process change
-1 I would like to remain with our current processes



Below is a summary of information contained in the discussion thread
comparing Gradle and Maven: 
https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E


Gradle (mins)
min: 25.04
max: 160.14
median: 45.78
average: 52.19
stdev: 30.80

Maven (mins)
min: 56.86
max: 216.55 (actually > 240 mins because this data does not include 
timeouts)
median: 87.93
average: 109.10
stdev: 48.01

Maven
Java Support: Mature
Python Support: None (via mvn exec plugin)
Go Support: Rudimentary (via mvn plugin)
Protobuf Support: Rudimentary (via mvn plugin)
Docker Support: Rudimentary (via mvn plugin)
ASF Release Automation: Mature
Jenkins Support: Mature
Configuration Language: XML
Multiple Java Versions: Yes
Static Analysis Tools: Some
ASF Release Audit Tool (RAT): Rudimentary (plugin complete and 
longstanding but poor)
IntelliJ Integration: Mature
Eclipse Integration: Mature
Extensibility: Mature (updated per JB from discuss thread)
Number of GitHub Projects Using It: 146k
Continuous build daemon: None
Incremental build support: None (note that this is not the same as 
incremental compile support offered by the compiler plugin)
Intra-module dependencies: Rudimentary (requires the use of many 
profiles to get per runner dependencies)


Gradle
Java Support: Mature
Python Support: Rudimentary (pygradle, lacks pypi support)
Go Support: Rudimentary (gogradle plugin)
Protobuf Support: Rudimentary (via protobuf plugin)
Docker Support: Rudimentary (via docker plugin)
ASF Release Automation: ?
Jenkins Support: Mature
Configuration Language: Groovy
Multiple Java Versions: Yes
Static Analysis Tools: Some
ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache 
Maven ANT plugin)
IntelliJ Integration: Mature
Eclipse Integration: Mature
Extensibility: Mature
Number of GitHub Projects Using It: 122k
Continuous build daemon: Mature
Incremental build support: Mature
Intra-module dependencies: Mature (via configurations)



--
Wesley Tanaka
https://wtanaka.com/



Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Thomas Groh
+1

On Tue, Nov 28, 2017 at 10:04 AM, Valentyn Tymofieiev 
wrote:

> +1 I support the process change
>
>
> On Tue, Nov 28, 2017 at 9:56 AM, Kenneth Knowles  wrote:
>
>> +1 (binding)
>>
>> On Tue, Nov 28, 2017 at 9:55 AM, Lukasz Cwik  wrote:
>>
>>> This is a procedural vote for migrating to use Gradle for all our
>>> development related processes (building, testing, and releasing). A
>>> majority vote will signal that:
>>> * Gradle build files will be supported and maintained alongside any
>>> remaining Maven files.
>>> * Once Gradle is able to replace Maven in a specific process (or portion
>>> thereof), Maven will no longer be maintained for said process (or portion
>>> thereof) and will be removed.
>>>
>>> +1 I support the process change
>>> 0 I am indifferent to the process change
>>> -1 I would like to remain with our current processes
>>>
>>> 
>>> 
>>>
>>> Below is a summary of information contained in the discussion thread
>>> comparing Gradle and Maven:
>>> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
>>>
>>> Gradle (mins)
>>> min: 25.04
>>> max: 160.14
>>> median: 45.78
>>> average: 52.19
>>> stdev: 30.80
>>>
>>> Maven (mins)
>>> min: 56.86
>>> max: 216.55 (actually > 240 mins because this data does not include
>>> timeouts)
>>> median: 87.93
>>> average: 109.10
>>> stdev: 48.01
>>>
>>> Maven
>>> Java Support: Mature
>>> Python Support: None (via mvn exec plugin)
>>> Go Support: Rudimentary (via mvn plugin)
>>> Protobuf Support: Rudimentary (via mvn plugin)
>>> Docker Support: Rudimentary (via mvn plugin)
>>> ASF Release Automation: Mature
>>> Jenkins Support: Mature
>>> Configuration Language: XML
>>> Multiple Java Versions: Yes
>>> Static Analysis Tools: Some
>>> ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
>>> longstanding but poor)
>>> IntelliJ Integration: Mature
>>> Eclipse Integration: Mature
>>> Extensibility: Mature (updated per JB from discuss thread)
>>> Number of GitHub Projects Using It: 146k
>>> Continuous build daemon: None
>>> Incremental build support: None (note that this is not the same as
>>> incremental compile support offered by the compiler plugin)
>>> Intra-module dependencies: Rudimentary (requires the use of many
>>> profiles to get per runner dependencies)
>>>
>>> Gradle
>>> Java Support: Mature
>>> Python Support: Rudimentary (pygradle, lacks pypi support)
>>> Go Support: Rudimentary (gogradle plugin)
>>> Protobuf Support: Rudimentary (via protobuf plugin)
>>> Docker Support: Rudimentary (via docker plugin)
>>> ASF Release Automation: ?
>>> Jenkins Support: Mature
>>> Configuration Language: Groovy
>>> Multiple Java Versions: Yes
>>> Static Analysis Tools: Some
>>> ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache
>>> Maven ANT plugin)
>>> IntelliJ Integration: Mature
>>> Eclipse Integration: Mature
>>> Extensibility: Mature
>>> Number of GitHub Projects Using It: 122k
>>> Continuous build daemon: Mature
>>> Incremental build support: Mature
>>> Intra-module dependencies: Mature (via configurations)
>>>
>>>
>>
>


Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Valentyn Tymofieiev
+1 I support the process change


On Tue, Nov 28, 2017 at 9:56 AM, Kenneth Knowles  wrote:

> +1 (binding)
>
> On Tue, Nov 28, 2017 at 9:55 AM, Lukasz Cwik  wrote:
>
>> This is a procedural vote for migrating to use Gradle for all our
>> development related processes (building, testing, and releasing). A
>> majority vote will signal that:
>> * Gradle build files will be supported and maintained alongside any
>> remaining Maven files.
>> * Once Gradle is able to replace Maven in a specific process (or portion
>> thereof), Maven will no longer be maintained for said process (or portion
>> thereof) and will be removed.
>>
>> +1 I support the process change
>> 0 I am indifferent to the process change
>> -1 I would like to remain with our current processes
>>
>> 
>> 
>>
>> Below is a summary of information contained in the discussion thread
>> comparing Gradle and Maven:
>> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
>>
>> Gradle (mins)
>> min: 25.04
>> max: 160.14
>> median: 45.78
>> average: 52.19
>> stdev: 30.80
>>
>> Maven (mins)
>> min: 56.86
>> max: 216.55 (actually > 240 mins because this data does not include
>> timeouts)
>> median: 87.93
>> average: 109.10
>> stdev: 48.01
>>
>> Maven
>> Java Support: Mature
>> Python Support: None (via mvn exec plugin)
>> Go Support: Rudimentary (via mvn plugin)
>> Protobuf Support: Rudimentary (via mvn plugin)
>> Docker Support: Rudimentary (via mvn plugin)
>> ASF Release Automation: Mature
>> Jenkins Support: Mature
>> Configuration Language: XML
>> Multiple Java Versions: Yes
>> Static Analysis Tools: Some
>> ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
>> longstanding but poor)
>> IntelliJ Integration: Mature
>> Eclipse Integration: Mature
>> Extensibility: Mature (updated per JB from discuss thread)
>> Number of GitHub Projects Using It: 146k
>> Continuous build daemon: None
>> Incremental build support: None (note that this is not the same as
>> incremental compile support offered by the compiler plugin)
>> Intra-module dependencies: Rudimentary (requires the use of many profiles
>> to get per runner dependencies)
>>
>> Gradle
>> Java Support: Mature
>> Python Support: Rudimentary (pygradle, lacks pypi support)
>> Go Support: Rudimentary (gogradle plugin)
>> Protobuf Support: Rudimentary (via protobuf plugin)
>> Docker Support: Rudimentary (via docker plugin)
>> ASF Release Automation: ?
>> Jenkins Support: Mature
>> Configuration Language: Groovy
>> Multiple Java Versions: Yes
>> Static Analysis Tools: Some
>> ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache Maven
>> ANT plugin)
>> IntelliJ Integration: Mature
>> Eclipse Integration: Mature
>> Extensibility: Mature
>> Number of GitHub Projects Using It: 122k
>> Continuous build daemon: Mature
>> Incremental build support: Mature
>> Intra-module dependencies: Mature (via configurations)
>>
>>
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Lukasz Cwik
Romain, Gradle has a Nexus plugin[1] which can sign and publish artifacts.
Gradle also has excellent support for running Ant tasks, which matters
because Ant can perform the entire release process for an ASF project.

1: https://github.com/bmuschko/gradle-nexus-plugin

On Tue, Nov 28, 2017 at 9:45 AM, Scott Wegner  wrote:

> To add one more data point measuring general adoption of gradle vs. maven,
> we can look at Stackoverflow trends comparing the two tags [1]. This shows
> the percentage of new SO questions in a given month by tag. 'gradle'
> represents ~0.25% of questions, while maven is ~0.45%. So, maven is more
> dominant in the Stackoverflow community, but they are at least similar
> orders of magnitude. Also, the data is a bit noisy to draw a trendline, but
> it seems that maven's growth has flattened while gradle is still increasing.
>
> [1] https://insights.stackoverflow.com/trends?tags=maven%2Cgradle
>
> On Tue, Nov 28, 2017 at 9:14 AM Kenneth Knowles  wrote:
>
>> Yea, I think voting is the next step. Luke - I think you are obviously
>> the right person to set up the email of what exactly we are voting on,
>> since you've driven this improvement.
>>
>> On Tue, Nov 28, 2017 at 12:08 AM, Robert Bradshaw 
>> wrote:
>>
>>> It's great to see all the discussion going on here.
>>>
>>> I think it's important to point out that merging a parallel set of
>>> gradle build scripts is a separate (and much less disruptive) step
>>> than, say, switching over the default (or even recommended)
>>> build/release process to use them, let alone removing the maven build
>>> files entirely. The latter two should definitely be gated by a formal
>>> vote (each, probably), with the current state the gradle files can
>>> mostly be ignored by most people. In particular, this is the kind of
>>> change that needs to be in master to be evaluated--if it's on a branch
>>> we can't very well see how it impacts presubmits, and most importantly
>>> people can't try it out for real development.
>>>
>>> I agree that the choice of build tool may attract some contributors
>>> and discourage others. Having builds that are fast, correct, and
>>> reproducible will probably matter more to potential contributors than
>>> the particular command to run. While maven can surely be improved, I
>>> doubt a 2x improvement (and many more times that for incremental
>>> builds) is low-hanging fruit, and many of the issues seem quite
>>> fundamental (e.g. all the special treatment we need for NeedsRunner
>>> tests, and having to do a (global-by-default) mvn install to skip
>>> tests of dependencies when testing a leaf module).
>>>
>>> Getting data on what other apache projects use could be interesting,
>>> but unless we gather why such choices were made I don't know that it'd
>>> be that influential once we've established that both tools are widely
>>> supported generally.
>>>
>>> To re-emphasize, we'll continue to produce and publish maven
>>> artifacts, so our choice of build system won't matter for those only
>>> using Beam as a dependency.
>>>
>>>
>>>
>>> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré 
>>> wrote:
>>> > Yeah, especially, I think it would have been great to have a vote
>>> > before merging on master.
>>> >
>>> > Not a big deal, however, I'm really community focused ;)
>>> >
>>> > Regards
>>> > JB
>>> >
>>> > On 11/28/2017 07:36 AM, Reuven Lax wrote:
>>> >>
>>> >> Agreed. I think having a formal vote before Luke had numbers and
>>> >> results would have been too early. However, now that we have such
>>> >> numbers, we should think about having a vote.
>>> >>
>>> >> Also, while I disagree with Romain that Gradle is not "enterprise
>>> >> ready" (it's heavily used by Netflix, LinkedIn, Siemens, and is the
>>> >> default build framework for Android apps), it would be interesting to
>>> >> see if any other ASF projects are using it. I don't think that should
>>> >> make or break the decision - we should do what's best for the Beam
>>> >> project, and "everyone else is doing something" is rarely a good
>>> >> argument - but it will provide good data points for us to evaluate.
>>> >>
>>> >> Reuven
>>> >>
>>> >> On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré
>>> >> <j...@nanthrax.net> wrote:
>>> >>
>>> >> Hi Luke,
>>> >>
>>> >> just curious (and maybe I missed it): did we do a formal vote to
>>> >> merge the gradle build ?
>>> >> Gradle is now on master, we have some Jira to update the release
>>> >> guide with gradle. It's fine, but I remember only a discussion, not
>>> >> a vote.
>>> >>
>>> >> In order to embrace the community and avoid having some contributors
>>> >> "frustrated" (meaning that "this project doesn't care about
>>> >> contributors, they just do whatever they want"), I would have loved
>>> >> to see a formal vote about

Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Kenneth Knowles
On Tue, Nov 28, 2017 at 9:51 AM, Ben Chambers  wrote:

> One risk to "squash and merge" is that it may lead to commits that don't
> have clean descriptions -- for instance, commits like "Fixing review
> comments" will show up. If we use (a) these would also show up as separate
> commits. It seems like there are two cases of multiple commits in a PR:
>
> 1. Multiple commits in a PR that have semantic meaning (eg., a PR
> performed N steps, split across N commits). In this case, keeping the
> descriptions and performing either a merge (if the commits are separately
> valid) or squash (if we want the commits to become a single commit in
> master) probably makes sense.
>

Keep 'em


> 2. Multiple commits in a PR that just reflect the review history. In this
> case, we should probably ask the PR author to explicitly rebase their PR to
> have semantically meaningful commits prior to merging. (Eg., do a rebase
> -i).
>

Ask the author to squash 'em.

Kenn


>
> On Tue, Nov 28, 2017 at 9:46 AM Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> James brought up a great question in Slack, which was how should we use
>> the merge button, illustrated [1]
>>
>> I want to broaden the discussion to talk about all the new capabilities:
>>
>> 1. Whether & how to use the "reviewer" field
>> 2. Whether & how to use the "assignee" field
>> 3. Whether & how to use the merge button
>>
>> My preferences are:
>>
>> 1. Use the reviewer field instead of "R:" comments.
>> 2. Use the assignee field to keep track of who the review is blocked on
>> (either the reviewer for more comments or the author for fixes)
>> 3. Use merge commits, but edit the commit subject line
>>
>> To expand on part 3, GitHub's merge button has three options [1]. They
>> are not described accurately in the UI, as they all say "merge" when only
>> one of them performs a merge. They do the following:
>>
>> (a) Merge the branch with a merge commit
>> (b) Squash all the commits, rebase and push
>> (c) Rebase and push without squash
>>
>> Unlike our current guide, all of these result in a "merged" status for
>> the PR, so we can correctly distinguish those PRs that were actually merged.
>>
>> My votes on these options are:
>>
>> (a) +1 this preserves the most information
>> (b) -1 this erases the most information
>> (c) -0 this is just sort of a middle ground; it breaks commit hashes,
>> does not have a clear merge commit, but preserves other info
>>
>> Kenn
>>
>> [1] https://apachebeam.slack.com/messages/C1AAFJYMP/
>>
>>
>>
>>
>>
>> Kenn
>>
>


Re: [VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Kenneth Knowles
+1 (binding)

On Tue, Nov 28, 2017 at 9:55 AM, Lukasz Cwik  wrote:

> This is a procedural vote for migrating to use Gradle for all our
> development related processes (building, testing, and releasing). A
> majority vote will signal that:
> * Gradle build files will be supported and maintained alongside any
> remaining Maven files.
> * Once Gradle is able to replace Maven in a specific process (or portion
> thereof), Maven will no longer be maintained for said process (or portion
> thereof) and will be removed.
>
> +1 I support the process change
> 0 I am indifferent to the process change
> -1 I would like to remain with our current processes
>
> 
> 
>
> Below is a summary of information contained in the discussion thread
> comparing Gradle and Maven:
> https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E
>
> Gradle (mins)
> min: 25.04
> max: 160.14
> median: 45.78
> average: 52.19
> stdev: 30.80
>
> Maven (mins)
> min: 56.86
> max: 216.55 (actually > 240 mins because this data does not include
> timeouts)
> median: 87.93
> average: 109.10
> stdev: 48.01
>
> Maven
> Java Support: Mature
> Python Support: None (via mvn exec plugin)
> Go Support: Rudimentary (via mvn plugin)
> Protobuf Support: Rudimentary (via mvn plugin)
> Docker Support: Rudimentary (via mvn plugin)
> ASF Release Automation: Mature
> Jenkins Support: Mature
> Configuration Language: XML
> Multiple Java Versions: Yes
> Static Analysis Tools: Some
> ASF Release Audit Tool (RAT): Rudimentary (plugin complete and
> longstanding but poor)
> IntelliJ Integration: Mature
> Eclipse Integration: Mature
> Extensibility: Mature (updated per JB from discuss thread)
> Number of GitHub Projects Using It: 146k
> Continuous build daemon: None
> Incremental build support: None (note that this is not the same as
> incremental compile support offered by the compiler plugin)
> Intra-module dependencies: Rudimentary (requires the use of many profiles
> to get per runner dependencies)
>
> Gradle
> Java Support: Mature
> Python Support: Rudimentary (pygradle, lacks pypi support)
> Go Support: Rudimentary (gogradle plugin)
> Protobuf Support: Rudimentary (via protobuf plugin)
> Docker Support: Rudimentary (via docker plugin)
> ASF Release Automation: ?
> Jenkins Support: Mature
> Configuration Language: Groovy
> Multiple Java Versions: Yes
> Static Analysis Tools: Some
> ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache Maven
> ANT plugin)
> IntelliJ Integration: Mature
> Eclipse Integration: Mature
> Extensibility: Mature
> Number of GitHub Projects Using It: 122k
> Continuous build daemon: Mature
> Incremental build support: Mature
> Intra-module dependencies: Mature (via configurations)
>
>


[VOTE] Use Gradle for Apache Beam developmental processes

2017-11-28 Thread Lukasz Cwik
This is a procedural vote for migrating to use Gradle for all our
development related processes (building, testing, and releasing). A
majority vote will signal that:
* Gradle build files will be supported and maintained alongside any
remaining Maven files.
* Once Gradle is able to replace Maven in a specific process (or portion
thereof), Maven will no longer be maintained for said process (or portion
thereof) and will be removed.

+1 I support the process change
0 I am indifferent to the process change
-1 I would like to remain with our current processes



Below is a summary of information contained in the discussion thread
comparing Gradle and Maven:
https://lists.apache.org/thread.html/225dddcfc78f39bbb296a0d2bbef1caf37e17677c7e5573f0b6fe253@%3Cdev.beam.apache.org%3E

Gradle (mins)
min: 25.04
max: 160.14
median: 45.78
average: 52.19
stdev: 30.80

Maven (mins)
min: 56.86
max: 216.55 (actually > 240 mins because this data does not include
timeouts)
median: 87.93
average: 109.10
stdev: 48.01

Maven
Java Support: Mature
Python Support: None (via mvn exec plugin)
Go Support: Rudimentary (via mvn plugin)
Protobuf Support: Rudimentary (via mvn plugin)
Docker Support: Rudimentary (via mvn plugin)
ASF Release Automation: Mature
Jenkins Support: Mature
Configuration Language: XML
Multiple Java Versions: Yes
Static Analysis Tools: Some
ASF Release Audit Tool (RAT): Rudimentary (plugin complete and longstanding
but poor)
IntelliJ Integration: Mature
Eclipse Integration: Mature
Extensibility: Mature (updated per JB from discuss thread)
Number of GitHub Projects Using It: 146k
Continuous build daemon: None
Incremental build support: None (note that this is not the same as
incremental compile support offered by the compiler plugin)
Intra-module dependencies: Rudimentary (requires the use of many profiles
to get per runner dependencies)

Gradle
Java Support: Mature
Python Support: Rudimentary (pygradle, lacks pypi support)
Go Support: Rudimentary (gogradle plugin)
Protobuf Support: Rudimentary (via protobuf plugin)
Docker Support: Rudimentary (via docker plugin)
ASF Release Automation: ?
Jenkins Support: Mature
Configuration Language: Groovy
Multiple Java Versions: Yes
Static Analysis Tools: Some
ASF Release Audit Tool (RAT): Rudimentary (plugin just calls Apache Maven
ANT plugin)
IntelliJ Integration: Mature
Eclipse Integration: Mature
Extensibility: Mature
Number of GitHub Projects Using It: 122k
Continuous build daemon: Mature
Incremental build support: Mature
Intra-module dependencies: Mature (via configurations)
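
As a rough way to read the timing summary above, the ratios between the Maven
and Gradle figures can be computed directly. This is only a sketch: the numbers
are copied from this email, while the dictionary and function names are made up
for illustration and are not part of any Beam tooling. The Maven max is
understated, since the data does not include timeouts.

```python
# Build-time summary statistics in minutes, copied from the vote email.
gradle = {"min": 25.04, "max": 160.14, "median": 45.78, "average": 52.19}
maven = {"min": 56.86, "max": 216.55, "median": 87.93, "average": 109.10}


def speedup(stat):
    """Return how many times longer the Maven figure is than the Gradle one."""
    return maven[stat] / gradle[stat]


# Print the Maven/Gradle ratio for each statistic.
for stat in ("min", "median", "average", "max"):
    print(f"{stat}: Maven/Gradle = {speedup(stat):.2f}x")
```

On these figures the median and average ratios both come out close to 2, which
is consistent with the "2x improvement" mentioned in the discuss thread.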


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Thomas Groh
I am strongly in favor of (1); I have no strong feelings about (2); I agree
on (3), but generally am not hugely concerned, so long as back-references
to the original PR are maintained, which is where most of the context
lives. It is nice to have the change broken up into as many individually
useful parts as possible, so I wouldn't really choose (b) or (c).

Of note, (1) will not be possible if you would like another contributor to
review and they have not set up their gitbox account. Notably this is
always going to be the case for contributors who are not committers - we
should maintain use of the "R: @reviewer" comments in those cases.

On Tue, Nov 28, 2017 at 9:45 AM, Kenneth Knowles  wrote:

> Hi all,
>
> James brought up a great question in Slack, which was how should we use
> the merge button, illustrated [1]
>
> I want to broaden the discussion to talk about all the new capabilities:
>
> 1. Whether & how to use the "reviewer" field
> 2. Whether & how to use the "assignee" field
> 3. Whether & how to use the merge button
>
> My preferences are:
>
> 1. Use the reviewer field instead of "R:" comments.
> 2. Use the assignee field to keep track of who the review is blocked on
> (either the reviewer for more comments or the author for fixes)
> 3. Use merge commits, but edit the commit subject line
>
> To expand on part 3, GitHub's merge button has three options [1]. They are
> not described accurately in the UI, as they all say "merge" when only one
> of them performs a merge. They do the following:
>
> (a) Merge the branch with a merge commit
> (b) Squash all the commits, rebase and push
> (c) Rebase and push without squash
>
> Unlike our current guide, all of these result in a "merged" status for the
> PR, so we can correctly distinguish those PRs that were actually merged.
>
> My votes on these options are:
>
> (a) +1 this preserves the most information
> (b) -1 this erases the most information
> (c) -0 this is just sort of a middle ground; it breaks commit hashes, does
> not have a clear merge commit, but preserves other info
>
> Kenn
>
> [1] https://apachebeam.slack.com/messages/C1AAFJYMP/
>
>
>
>
>
> Kenn
>


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Ben Chambers
One risk to "squash and merge" is that it may lead to commits that don't
have clean descriptions -- for instance, commits like "Fixing review
comments" will show up. If we use (a) these would also show up as separate
commits. It seems like there are two cases of multiple commits in a PR:

1. Multiple commits in a PR that have semantic meaning (eg., a PR performed
N steps, split across N commits). In this case, keeping the descriptions
and performing either a merge (if the commits are separately valid) or
squash (if we want the commits to become a single commit in master)
probably makes sense.
2. Multiple commits in a PR that just reflect the review history. In this
case, we should probably ask the PR author to explicitly rebase their PR to
have semantically meaningful commits prior to merging. (Eg., do a rebase
-i).

On Tue, Nov 28, 2017 at 9:46 AM Kenneth Knowles  wrote:

> Hi all,
>
> James brought up a great question in Slack, which was how should we use
> the merge button, illustrated [1]
>
> I want to broaden the discussion to talk about all the new capabilities:
>
> 1. Whether & how to use the "reviewer" field
> 2. Whether & how to use the "assignee" field
> 3. Whether & how to use the merge button
>
> My preferences are:
>
> 1. Use the reviewer field instead of "R:" comments.
> 2. Use the assignee field to keep track of who the review is blocked on
> (either the reviewer for more comments or the author for fixes)
> 3. Use merge commits, but edit the commit subject line
>
> To expand on part 3, GitHub's merge button has three options [1]. They are
> not described accurately in the UI, as they all say "merge" when only one
> of them performs a merge. They do the following:
>
> (a) Merge the branch with a merge commit
> (b) Squash all the commits, rebase and push
> (c) Rebase and push without squash
>
> Unlike our current guide, all of these result in a "merged" status for the
> PR, so we can correctly distinguish those PRs that were actually merged.
>
> My votes on these options are:
>
> (a) +1 this preserves the most information
> (b) -1 this erases the most information
> (c) -0 this is just sort of a middle ground; it breaks commit hashes, does
> not have a clear merge commit, but preserves other info
>
> Kenn
>
> [1] https://apachebeam.slack.com/messages/C1AAFJYMP/
>
>
>
>
>
> Kenn
>


Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Reuven Lax
On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré 
wrote:

> Hi Reuven,
>
> Yes, I remember that we agreed on a release per month. However, we haven't
> managed it so far. I think the most important thing is not the period but
> a stable pace. I think it's more valuable for our community to "always"
> have a release every two months than to attempt a monthly release that
> ends up later than that. Of course, if we can do both, it's perfect ;)
>

Agree. A stable pace is the most important thing.


>
> For Beam 3.x, I wasn't talking about breaking changes, but more about a
> "marketing" announcement. I think that, even if we don't break the API,
> some features are "strong enough" to "qualify" for a major version.
>

Ah, good point. This doesn't stop us from checking in these new features
into 2.x possibly tagged with an @Experimental flag. We can then use 3.0 to
announce all these features more broadly, and remove @Experimental tags.

I would also like to see enterprise-ready BeamSQL and Java 7 deprecation on
the list for Beam 3.0


> I think that any major ideas & features (whether or not they break the API)
> are valuable for Beam 3.x (and it's a good sign for our community again ;)).
>
> Thanks !
> Regards
> JB
>
> On 11/28/2017 06:09 PM, Reuven Lax wrote:
>
>>
>>
>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:
>>
>> Hi guys,
>>
>> Even if there's no rush, I think it would be great for the community
>> to have a better view on our roadmap and where we are going in terms of
>> schedule.
>>
>> I would like to discuss the following:
>> - a best effort to maintain a good release pace or at least provide a
>> rough schedule. For instance, in Apache Karaf, I have a release schedule
>> (http://karaf.apache.org/download.html#container-schedule). I think a
>> release ~ every quarter would be great.
>>
>>
>> Originally we had stated that we wanted monthly releases of Beam. So far
>> the releases have been painful enough that monthly hasn't happened. I think
>> we should address these issues and go to monthly releases as originally
>> stated.
>>
>> - if I see new Beam 2.x releases for sure (according to the previous
>> point), it would be great to have a discussion about Beam 3.x. I think
>> that one of the interesting new features that Beam 3.x can provide is
>> around PCollection with Schemas. It's something that we started to
>> discuss with Reuven and Eugene.
>> In terms of schedule,
>>
>>
>> I don't think schemas require Beam 3.0 - I think we can introduce them
>> without making breaking changes. However there are many other features that
>> would be very interesting for Beam 3.x, and we should start putting
>> together a list of them. I
>>
>>
>> I would love to see your thoughts & ideas about releases schedule and
>> Beam 3.x.
>>
>> Regards
>> JB
>> -- Jean-Baptiste Onofré
>> jbono...@apache.org 
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Jean-Baptiste Onofré

Hi,

In other Apache projects using gitbox, I have been using the following workflow:

1. use the review button to assign someone
2. once the changes are approved, I use the merge button (supporting squash and merge)

It's very convenient and works fine.

So, +1 to (b)

Regards
JB

On 11/28/2017 06:45 PM, Kenneth Knowles wrote:

Hi all,

James brought up a great question in Slack, which was how should we use the 
merge button, illustrated [1]


I want to broaden the discussion to talk about all the new capabilities:

1. Whether & how to use the "reviewer" field
2. Whether & how to use the "assignee" field
3. Whether & how to use the merge button

My preferences are:

1. Use the reviewer field instead of "R:" comments.
2. Use the assignee field to keep track of who the review is blocked on (either 
the reviewer for more comments or the author for fixes)
3. Use merge commits, but edit the commit subject line

To expand on part 3, GitHub's merge button has three options [1]. They are not 
described accurately in the UI, as they all say "merge" when only one of them 
performs a merge. They do the following:


(a) Merge the branch with a merge commit
(b) Squash all the commits, rebase and push
(c) Rebase and push without squash

Unlike our current guide, all of these result in a "merged" status for the PR, 
so we can correctly distinguish those PRs that were actually merged.


My votes on these options are:

(a) +1 this preserves the most information
(b) -1 this erases the most information
(c) -0 this is just sort of a middle ground; it breaks commit hashes, does not 
have a clear merge commit, but preserves other info


Kenn

[1] https://apachebeam.slack.com/messages/C1AAFJYMP/





Kenn


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


[DISCUSS] Updating contribution guide for gitbox

2017-11-28 Thread Kenneth Knowles
Hi all,

James brought up a great question in Slack, which was how should we use the
merge button, illustrated [1]

I want to broaden the discussion to talk about all the new capabilities:

1. Whether & how to use the "reviewer" field
2. Whether & how to use the "assignee" field
3. Whether & how to use the merge button

My preferences are:

1. Use the reviewer field instead of "R:" comments.
2. Use the assignee field to keep track of who the review is blocked on
(either the reviewer for more comments or the author for fixes)
3. Use merge commits, but edit the commit subject line

To expand on part 3, GitHub's merge button has three options [1]. They are
not described accurately in the UI, as they all say "merge" when only one
of them performs a merge. They do the following:

(a) Merge the branch with a merge commit
(b) Squash all the commits, rebase and push
(c) Rebase and push without squash

Unlike our current guide, all of these result in a "merged" status for the
PR, so we can correctly distinguish those PRs that were actually merged.

My votes on these options are:

(a) +1 this preserves the most information
(b) -1 this erases the most information
(c) -0 this is just sort of a middle ground; it breaks commit hashes, does
not have a clear merge commit, but preserves other info
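
The difference between the three options can be illustrated with a toy model. This is only a sketch: `commit_id` is a hypothetical stand-in that hashes just the parents and the message, whereas a real git commit hash also covers the tree, author, committer and timestamps. The commit names (`m0`, `f1`, ...) and subjects are made up for illustration.

```python
import hashlib


def commit_id(parents, message):
    """Toy stand-in for a git commit hash: it depends on the parents and the message."""
    payload = ",".join(parents) + "\n" + message
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]


# master: m0 -> m1; the PR branch forked from m0: f1 -> f2
m0 = commit_id([], "initial")
m1 = commit_id([m0], "master work")
f1 = commit_id([m0], "feature part 1")
f2 = commit_id([f1], "feature part 2")
pr = [("feature part 1", f1), ("feature part 2", f2)]


def merge_commit(master_tip, pr_commits, subject):
    """(a) One new two-parent commit; the PR commits keep their ids."""
    pr_tip = pr_commits[-1][1]
    merge_id = commit_id([master_tip, pr_tip], subject)
    return [old_id for _msg, old_id in pr_commits] + [merge_id]


def squash(master_tip, pr_commits, subject):
    """(b) One new single-parent commit; the individual PR commits are dropped."""
    return [commit_id([master_tip], subject)]


def rebase(master_tip, pr_commits):
    """(c) Each PR commit is replayed on top of master, so every id changes."""
    new_ids, parent = [], master_tip
    for message, _old_id in pr_commits:
        parent = commit_id([parent], message)
        new_ids.append(parent)
    return new_ids


merged = merge_commit(m1, pr, "Merge pull request: add feature")
squashed = squash(m1, pr, "Add feature")
rebased = rebase(m1, pr)

print("PR commits:", [f1, f2])
print("(a) merge :", merged)
print("(b) squash:", squashed)
print("(c) rebase:", rebased)
```

In this model only option (a) leaves the original PR commit ids in master's history, which is the sense in which it "preserves the most information".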

Kenn

[1] https://apachebeam.slack.com/messages/C1AAFJYMP/


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Scott Wegner
To add one more data point measuring general adoption of gradle vs. maven,
we can look at Stack Overflow trends comparing the two tags [1]. This shows
the percentage of new SO questions in a given month by tag. 'gradle'
represents ~0.25% of questions, while 'maven' is ~0.45%. So, maven is more
dominant in the Stack Overflow community, but the two are at least within
the same order of magnitude. Also, the data is a bit too noisy to draw a
trendline, but it seems that maven's growth has flattened while gradle is
still increasing.

[1] https://insights.stackoverflow.com/trends?tags=maven%2Cgradle

On Tue, Nov 28, 2017 at 9:14 AM Kenneth Knowles  wrote:

> Yea, I think voting is the next step. Luke - I think you are obviously the
> right person to set up the email of what exactly we are voting on, since
> you've driven this improvement.
>
> On Tue, Nov 28, 2017 at 12:08 AM, Robert Bradshaw 
> wrote:
>
>> It's great to see all the discussion going on here.
>>
>> I think it's important to point out that merging a parallel set of
>> gradle build scripts is a separate (and much less disruptive) step
>> than, say, switching over the default (or even recommended)
>> build/release process to use them, let alone removing the maven build
>> files entirely. The latter two should definitely be gated by a formal
>> vote (each, probably), with the current state the gradle files can
>> mostly be ignored by most people. In particular, this is the kind of
>> change that needs to be in master to be evaluated--if it's on a branch
>> we can't very well see how it impacts presubmits, and most importantly
>> people can't try it out for real development.
>>
>> I agree that the choice of build tool may attract some contributors
>> and discourage others. Having builds that are fast, correct, and
>> reproducible will probably matter more to potential contributors than
>> the particular command to run. While maven can surely be improved, I
>> doubt a 2x improvement (and many more times that for incremental
>> builds) is low-hanging fruit, and many of the issues seem quite
>> fundamental (e.g. all the special treatment we need for NeedsRunner
>> tests, and having to do a (global-by-default) mvn install to skip
>> tests of dependencies when testing a leaf module).
>>
>> Getting data on what other apache projects use could be interesting,
>> but unless we gather why such choices were made I don't know that it'd
>> be that influential once we've established that both tools are widely
>> supported generally.
>>
>> To re-emphasize, we'll continue to produce and publish maven
>> artifacts, so our choice of build system won't matter for those only
>> using Beam as a dependency.
>>
>>
>>
>> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré 
>> wrote:
>> > Yeah, especially, I think it would have been great to have a vote before
>> > merging on master.
>> >
>> > Not a big deal; however, I'm really community focused ;)
>> >
>> > Regards
>> > JB
>> >
>> > On 11/28/2017 07:36 AM, Reuven Lax wrote:
>> >>
>> >> Agreed. I think having a formal vote before Luke had numbers and
>> >> results would have been too early. However, now that we have such
>> >> numbers, we should think about having a vote.
>> >>
>> >> Also, while I disagree with Romain that Gradle is not "enterprise
>> >> ready" (it's heavily used by Netflix, LinkedIn, Siemens, and is the
>> >> default build framework for Android apps), it would be interesting to
>> >> see if any other ASF projects are using it. I don't think that should
>> >> make or break the decision - we should do what's best for the Beam
>> >> project, and "everyone else is doing something" is rarely a good
>> >> argument - but it will provide good data points for us to evaluate.
>> >>
>> >> Reuven
>> >>
>> >> On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré <
>> j...@nanthrax.net
>> >> > wrote:
>> >>
>> >> Hi Luke,
>> >>
>> >> just curious (and maybe I missed it): did we do a formal vote to
>> >> merge the gradle build?
>> >> Gradle is now on master, and we have some Jiras to update the release
>> >> guide with gradle. It's fine, but I remember only a discussion, not a
>> >> vote.
>> >>
>> >> In order to embrace the community and avoid having some contributors
>> >> "frustrated" (meaning that "this project doesn't care about
>> >> contributors, they just do whatever they want"), I would have loved
>> >> to see a formal vote about Gradle rather than just a discussion.
>> >>
>> >> My $0.01
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 11/27/2017 07:46 PM, Lukasz Cwik wrote:
>> >>
>> >> I have collected data by running several builds against master
>> >> using Gradle
>> >> and Maven without using Gradle's support for incremental
>> builds.
>> >>
>> >> Gradle (mins)
>> >> min: 25.04
>> >> max: 160.14
>> >> median: 

Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Romain Manni-Bucau
Did you try a release (you can create a temporary staging repo on ASF
nexus if it helps) before starting a vote? Because if you migrate and the
project is no longer able to release, it can be a hard blocker - which
never happens when needed ;). The release uses a few more plugins I didn't
find in gradle (I may have missed them), like the signing of artifacts
etc.

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn


2017-11-28 18:14 GMT+01:00 Kenneth Knowles :
> Yea, I think voting is the next step. Luke - I think you are obviously the
> right person to set up the email of what exactly we are voting on, since
> you've driven this improvement.
>
> On Tue, Nov 28, 2017 at 12:08 AM, Robert Bradshaw 
> wrote:
>>
>> It's great to see all the discussion going on here.
>>
>> I think it's important to point out that merging a parallel set of
>> gradle build scripts is a separate (and much less disruptive) step
>> than, say, switching over the default (or even recommended)
>> build/release process to use them, let alone removing the maven build
>> files entirely. The latter two should definitely be gated by a formal
>> vote (each, probably), with the current state the gradle files can
>> mostly be ignored by most people. In particular, this is the kind of
>> change that needs to be in master to be evaluated--if it's on a branch
>> we can't very well see how it impacts presubmits, and most importantly
>> people can't try it out for real development.
>>
>> I agree that the choice of build tool may attract some contributors
>> and discourage others. Having builds that are fast, correct, and
>> reproducible will probably matter more to potential contributors than
>> the particular command to run. While maven can surely be improved, I
>> doubt a 2x improvement (and many more times that for incremental
>> builds) is low-hanging fruit, and many of the issues seem quite
>> fundamental (e.g. all the special treatment we need for NeedsRunner
>> tests, and having to do a (global-by-default) mvn install to skip
>> tests of dependencies when testing a leaf module).
>>
>> Getting data on what other apache projects use could be interesting,
>> but unless we gather why such choices were made I don't know that it'd
>> be that influential once we've established that both tools are widely
>> supported generally.
>>
>> To re-emphasize, we'll continue to produce and publish maven
>> artifacts, so our choice of build system won't matter for those only
>> using Beam as a dependency.
>>
>>
>>
>> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré 
>> wrote:
>> > Yeah, especially, I think it would have been great to have a vote before
>> > merging on master.
>> >
>> > Not a big deal; however, I'm really community focused ;)
>> >
>> > Regards
>> > JB
>> >
>> > On 11/28/2017 07:36 AM, Reuven Lax wrote:
>> >>
>> >> Agreed. I think having a formal vote before Luke had numbers and
>> >> results would have been too early. However, now that we have such
>> >> numbers, we should think about having a vote.
>> >>
>> >> Also, while I disagree with Romain that Gradle is not "enterprise
>> >> ready" (it's heavily used by Netflix, LinkedIn, Siemens, and is the
>> >> default build framework for Android apps), it would be interesting to
>> >> see if any other ASF projects are using it. I don't think that should
>> >> make or break the decision - we should do what's best for the Beam
>> >> project, and "everyone else is doing something" is rarely a good
>> >> argument - but it will provide good data points for us to evaluate.
>> >>
>> >> Reuven
>> >>
>> >> On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré wrote:
>> >>
>> >> Hi Luke,
>> >>
>> >> just curious (and maybe I missed it): did we do a formal vote to
>> >> merge the gradle build?
>> >> Gradle is now on master, and we have some Jiras to update the release
>> >> guide with gradle. It's fine, but I remember only a discussion, not a
>> >> vote.
>> >>
>> >> In order to embrace the community and avoid having some contributors
>> >> "frustrated" (meaning that "this project doesn't care about
>> >> contributors, they just do whatever they want"), I would have loved
>> >> to see a formal vote about Gradle rather than just a discussion.
>> >>
>> >> My $0.01
>> >>
>> >> Regards
>> >> JB
>> >>
>> >> On 11/27/2017 07:46 PM, Lukasz Cwik wrote:
>> >>
>> >> I have collected data by running several builds against master
>> >> using Gradle
>> >> and Maven without using Gradle's support for incremental
>> >> builds.
>> >>
>> >> Gradle (mins)
>> >> min: 25.04
>> >> max: 160.14
>> >> median: 45.78
>> >> average: 52.19
>> >> stdev: 30.80
>> >>
>> >> Maven (mins)
>> >> min: 56.86
>> >> max: 

Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Jean-Baptiste Onofré

+1 for monthly release if we can sustain this pace ;)

Fully agree on improving the testing, automation, and documentation of the
release process.

On 11/28/2017 06:25 PM, Kenneth Knowles wrote:
Yea, let's work hard on improving the ease and pace of releases. I am not really 
happy to have only quarterly releases.


Automation of release process where possible, better test coverage, a higher 
resistance to cherry-picks.


Kenn

On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré wrote:


Hi Reuven,

Yes, I remember that we agreed on a release per month. However, we didn't do
it before. I think the most important is not the period, it's more a stable
pace. I think it's more interesting for our community to have "always" a
release every two months, more than a tentative of a release every month
that end later than that. Of course, if we can do both, it's perfect ;)

For Beam 3.x, I wasn't talking about breaking change, but more about
"marketing" announcement. I think that, even if we don't break API, some
features are "strong enough" to be "qualified" in a major version.

I think that any major idea & feature (breaking or not the API) are
valuables for Beam 3.x (and it's a good sign for our community again ;)).

Thanks !
Regards
JB

On 11/28/2017 06:09 PM, Reuven Lax wrote:



On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:

     Hi guys,

     Even if there's no rush, I think it would be great for the
community to have
     a better view on our roadmap and where we are going in term of
schedule.

     I would like to discuss the following:
     - a best effort to maintain a good release pace or at least provide
a rough
     schedule. For instance, in Apache Karaf, I have a release schedule
     (http://karaf.apache.org/download.html#container-schedule). I think a
     release ~ every quarter would be great.


Originally we had stated that we wanted monthly releases of Beam. So far
the releases have been painful enough that monthly hasn't happened. I
think we should address these issues and go to monthly releases as
originally stated.

     - if I see new Beam 2.x releases for sure (according to the
previous point),
     it would be great to have discussion about Beam 3.x. I think that
one of
     interesting new feature that Beam 3.x can provide is around
PCollection with
     Schemas. It's something that we started to discuss with Reuven and
Eugene.
     In term of schedule,


I don't think schemas require Beam 3.0 - I think we can introduce them
without making breaking changes. However there are many other features
that would be very interesting for Beam 3.x, and we should start putting
together a list of them.


     I would love to see your thoughts & ideas about releases schedule
and Beam 3.x.

     Regards
     JB
     --
     Jean-Baptiste Onofré
     jbono...@apache.org
     http://blog.nanthrax.net
     Talend - http://www.talend.com



-- 
Jean-Baptiste Onofré

jbono...@apache.org 
http://blog.nanthrax.net
Talend - http://www.talend.com




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Ben Chambers
Strong +1 to both increasing the frequency of minor releases and also
putting together a road map for the next major release or two.

I think it would be great to communicate to the community the direction
Beam is taking in the future -- what things will users be able to do with
3.0 or 4.0 that they can't do with 2.x?

On Tue, Nov 28, 2017 at 9:25 AM Kenneth Knowles  wrote:

> Yea, let's work hard on improving the ease and pace of releases. I am not
> really happy to have only quarterly releases.
>
> Automation of release process where possible, better test coverage, a
> higher resistance to cherry-picks.
>
> Kenn
>
> On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré 
> wrote:
>
>> Hi Reuven,
>>
>> Yes, I remember that we agreed on a release per month. However, we didn't
>> do it before. I think the most important is not the period, it's more a
>> stable pace. I think it's more interesting for our community to have
>> "always" a release every two months, more than a tentative of a release
>> every month that end later than that. Of course, if we can do both, it's
>> perfect ;)
>>
>> For Beam 3.x, I wasn't talking about breaking change, but more about
>> "marketing" announcement. I think that, even if we don't break API, some
>> features are "strong enough" to be "qualified" in a major version.
>>
>> I think that any major idea & feature (breaking or not the API) are
>> valuables for Beam 3.x (and it's a good sign for our community again ;)).
>>
>> Thanks !
>> Regards
>> JB
>>
>> On 11/28/2017 06:09 PM, Reuven Lax wrote:
>>
>>>
>>>
>>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:
>>>
>>> Hi guys,
>>>
>>> Even if there's no rush, I think it would be great for the community
>>> to have
>>> a better view on our roadmap and where we are going in term of
>>> schedule.
>>>
>>> I would like to discuss the following:
>>> - a best effort to maintain a good release pace or at least provide
>>> a rough
>>> schedule. For instance, in Apache Karaf, I have a release schedule
>>> (http://karaf.apache.org/download.html#container-schedule). I think a
>>> release ~ every quarter would be great.
>>>
>>>
>>> Originally we had stated that we wanted monthly releases of Beam. So far
>>> the releases have been painful enough that monthly hasn't happened. I think
>>> we should address these issues and go to monthly releases as originally
>>> stated.
>>>
>>> - if I see new Beam 2.x releases for sure (according to the previous
>>> point),
>>> it would be great to have discussion about Beam 3.x. I think that
>>> one of
>>> interesting new feature that Beam 3.x can provide is around
>>> PCollection with
>>> Schemas. It's something that we started to discuss with Reuven and
>>> Eugene.
>>> In term of schedule,
>>>
>>>
>>> I don't think schemas require Beam 3.0 - I think we can introduce them
>>> without making breaking changes. However there are many other features that
>>> would be very interesting for Beam 3.x, and we should start putting
>>> together a list of them.
>>>
>>>
>>> I would love to see your thoughts & ideas about releases schedule
>>> and Beam 3.x.
>>>
>>> Regards
>>> JB
>>> -- Jean-Baptiste Onofré
>>> jbono...@apache.org 
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>
>


MergeBot bug when regenerating website?

2017-11-28 Thread Etienne Chauchot

Hi guys,

I've just noticed a probable bug in MergeBot's regeneration of the website's
static content.


MergeBot seems to regenerate the website incorrectly when a page has moved.
For example, see MergeBot commit 446586c68c1d244d240fe18ee48e69aba4462949:
the page documentation/sdk/nexmark/index.html (old URL) was deleted, but the
page documentation/sdk/java/nexmark/index.html (new URL) was not added,
leading to an HTTP 404.


I manually regenerated the website and merged on apache/asf-site

Regards,

Etienne



Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Kenneth Knowles
Yea, let's work hard on improving the ease and pace of releases. I am not
really happy to have only quarterly releases.

Automation of release process where possible, better test coverage, a
higher resistance to cherry-picks.

Kenn

On Tue, Nov 28, 2017 at 9:14 AM, Jean-Baptiste Onofré 
wrote:

> Hi Reuven,
>
> Yes, I remember that we agreed on a release per month. However, we didn't
> do it before. I think the most important is not the period, it's more a
> stable pace. I think it's more interesting for our community to have
> "always" a release every two months, more than a tentative of a release
> every month that end later than that. Of course, if we can do both, it's
> perfect ;)
>
> For Beam 3.x, I wasn't talking about breaking change, but more about
> "marketing" announcement. I think that, even if we don't break API, some
> features are "strong enough" to be "qualified" in a major version.
>
> I think that any major idea & feature (breaking or not the API) are
> valuables for Beam 3.x (and it's a good sign for our community again ;)).
>
> Thanks !
> Regards
> JB
>
> On 11/28/2017 06:09 PM, Reuven Lax wrote:
>
>>
>>
>> On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:
>>
>> Hi guys,
>>
>> Even if there's no rush, I think it would be great for the community
>> to have
>> a better view on our roadmap and where we are going in term of
>> schedule.
>>
>> I would like to discuss the following:
>> - a best effort to maintain a good release pace or at least provide a
>> rough
>> schedule. For instance, in Apache Karaf, I have a release schedule
>> (http://karaf.apache.org/download.html#container-schedule). I think a
>> release ~ every quarter would be great.
>>
>>
>> Originally we had stated that we wanted monthly releases of Beam. So far
>> the releases have been painful enough that monthly hasn't happened. I think
>> we should address these issues and go to monthly releases as originally
>> stated.
>>
>> - if I see new Beam 2.x releases for sure (according to the previous
>> point),
>> it would be great to have discussion about Beam 3.x. I think that one
>> of
>> interesting new feature that Beam 3.x can provide is around
>> PCollection with
>> Schemas. It's something that we started to discuss with Reuven and
>> Eugene.
>> In term of schedule,
>>
>>
>> I don't think schemas require Beam 3.0 - I think we can introduce them
>> without making breaking changes. However there are many other features that
>> would be very interesting for Beam 3.x, and we should start putting
>> together a list of them.
>>
>>
>> I would love to see your thoughts & ideas about releases schedule and
>> Beam 3.x.
>>
>> Regards
>> JB
>> -- Jean-Baptiste Onofré
>> jbono...@apache.org 
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Kenneth Knowles
Yea, I think voting is the next step. Luke - I think you are obviously the
right person to set up the email of what exactly we are voting on, since
you've driven this improvement.

On Tue, Nov 28, 2017 at 12:08 AM, Robert Bradshaw 
wrote:

> It's great to see all the discussion going on here.
>
> I think it's important to point out that merging a parallel set of
> gradle build scripts is a separate (and much less disruptive) step
> than, say, switching over the default (or even recommended)
> build/release process to use them, let alone removing the maven build
> files entirely. The latter two should definitely be gated by a formal
> vote (each, probably), with the current state the gradle files can
> mostly be ignored by most people. In particular, this is the kind of
> change that needs to be in master to be evaluated--if it's on a branch
> we can't very well see how it impacts presubmits, and most importantly
> people can't try it out for real development.
>
> I agree that the choice of build tool may attract some contributors
> and discourage others. Having builds that are fast, correct, and
> reproducible will probably matter more to potential contributors than
> the particular command to run. While maven can surely be improved, I
> doubt a 2x improvement (and many more times that for incremental
> builds) is low-hanging fruit, and many of the issues seem quite
> fundamental (e.g. all the special treatment we need for NeedsRunner
> tests, and having to do a (global-by-default) mvn install to skip
> tests of dependencies when testing a leaf module).
>
> Getting data on what other apache projects use could be interesting,
> but unless we gather why such choices were made I don't know that it'd
> be that influential once we've established that both tools are widely
> supported generally.
>
> To re-emphasize, we'll continue to produce and publish maven
> artifacts, so our choice of build system won't matter for those only
> using Beam as a dependency.
>
>
>
> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré 
> wrote:
> > Yeah, especially, I think it would have been great to have a vote before
> > merging on master.
> >
> > > Not a big deal; however, I'm really community focused ;)
> >
> > Regards
> > JB
> >
> > On 11/28/2017 07:36 AM, Reuven Lax wrote:
> >>
> > >> Agreed. I think having a formal vote before Luke had numbers and
> > >> results would have been too early. However, now that we have such
> > >> numbers, we should think about having a vote.
> > >>
> > >> Also, while I disagree with Romain that Gradle is not "enterprise
> > >> ready" (it's heavily used by Netflix, LinkedIn, Siemens, and is the
> > >> default build framework for Android apps), it would be interesting to
> > >> see if any other ASF projects are using it. I don't think that should
> > >> make or break the decision - we should do what's best for the Beam
> > >> project, and "everyone else is doing something" is rarely a good
> > >> argument - but it will provide good data points for us to evaluate.
> >>
> >> Reuven
> >>
> > >> On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré wrote:
> >>
> >> Hi Luke,
> >>
> > >> just curious (and maybe I missed it): did we do a formal vote to
> > >> merge the gradle build?
> > >> Gradle is now on master, and we have some Jiras to update the release
> > >> guide with gradle. It's fine, but I remember only a discussion, not a
> > >> vote.
> > >>
> > >> In order to embrace the community and avoid having some contributors
> > >> "frustrated" (meaning that "this project doesn't care about
> > >> contributors, they just do whatever they want"), I would have loved
> > >> to see a formal vote about Gradle rather than just a discussion.
> >>
> >> My $0.01
> >>
> >> Regards
> >> JB
> >>
> >> On 11/27/2017 07:46 PM, Lukasz Cwik wrote:
> >>
> >> I have collected data by running several builds against master
> >> using Gradle
> >> and Maven without using Gradle's support for incremental builds.
> >>
> >> Gradle (mins)
> >> min: 25.04
> >> max: 160.14
> >> median: 45.78
> >> average: 52.19
> >> stdev: 30.80
> >>
> >> Maven (mins)
> >> min: 56.86
> >> max: 216.55 (actually > 240 mins because this data does not
> >> include
> >> timeouts)
> >> median: 87.93
> >> average: 109.10
> >> stdev: 48.01
> >>
> >> I excluded a few timeouts (240 mins) that happened during the
> >> Maven build
> >> from its numbers but we can see conclusively that Gradle is
> twice
> >> as fast
> >> for the build when compared to Maven when run using Jenkins.
> >> On my desktop, I have enabled incremental builds and have seen a
> >> major
> >> improvement on the above numbers but it doesn't yet work
> correctly
> >> because
> >> of incorrectly 
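
The "twice as fast" claim can be checked against the Jenkins figures quoted above. The sketch below only divides the quoted Maven numbers by the quoted Gradle numbers; the dictionary names are made up for illustration, and the Maven figures exclude the 240-minute timeouts, so the ratios probably understate the real gap.

```python
# Build times in minutes, copied from the figures quoted above.
gradle = {"min": 25.04, "median": 45.78, "average": 52.19}
maven = {"min": 56.86, "median": 87.93, "average": 109.10}

# A ratio above 1 means the Maven build took that many times longer than Gradle.
speedup = {name: maven[name] / gradle[name] for name in gradle}

for name, ratio in speedup.items():
    print(f"{name}: Maven took {ratio:.2f}x as long as Gradle")
```

On these numbers the ratio is about 1.9 for the median and about 2.1 for the average, which is consistent with "roughly twice as fast".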

Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Jean-Baptiste Onofré

Hi Reuven,

Yes, I remember that we agreed on a release per month. However, we have not
managed to do it so far. I think the most important thing is not the period but
a stable pace. I think it's better for our community to "always" have a release
every two months than to attempt a release every month and end up later than
that. Of course, if we can do both, it's perfect ;)


For Beam 3.x, I wasn't talking about breaking changes, but more about a
"marketing" announcement. I think that, even if we don't break the API, some
features are "strong enough" to "qualify" for a major version.


I think that any major ideas & features (whether or not they break the API) are
valuable for Beam 3.x (and it's a good sign for our community again ;)).


Thanks !
Regards
JB

On 11/28/2017 06:09 PM, Reuven Lax wrote:



On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré wrote:


Hi guys,

Even if there's no rush, I think it would be great for the community to have
a better view on our roadmap and where we are going in term of schedule.

I would like to discuss the following:
- a best effort to maintain a good release pace or at least provide a rough
schedule. For instance, in Apache Karaf, I have a release schedule
(http://karaf.apache.org/download.html#container-schedule). I think a
release ~ every quarter would be great.


Originally we had stated that we wanted monthly releases of Beam. So far the 
releases have been painful enough that monthly hasn't happened. I think we 
should address these issues and go to monthly releases as originally stated.


- if I see new Beam 2.x releases for sure (according to the previous point),
it would be great to have discussion about Beam 3.x. I think that one of
interesting new feature that Beam 3.x can provide is around PCollection with
Schemas. It's something that we started to discuss with Reuven and Eugene.
In term of schedule,


I don't think schemas require Beam 3.0 - I think we can introduce them without 
making breaking changes. However there are many other features that would be 
very interesting for Beam 3.x, and we should start putting together a list of them.



I would love to see your thoughts & ideas about releases schedule and Beam 
3.x.

Regards
JB
-- 
Jean-Baptiste Onofré

jbono...@apache.org 
http://blog.nanthrax.net
Talend - http://www.talend.com




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Reuven Lax
On Tue, Nov 28, 2017 at 8:55 AM, Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> Even if there's no rush, I think it would be great for the community to
> have a better view on our roadmap and where we are going in term of
> schedule.
>
> I would like to discuss the following:
> - a best effort to maintain a good release pace or at least provide a
> rough schedule. For instance, in Apache Karaf, I have a release schedule (
> http://karaf.apache.org/download.html#container-schedule). I think a
> release ~ every quarter would be great.
>

Originally we had stated that we wanted monthly releases of Beam. So far
the releases have been painful enough that monthly hasn't happened. I think
we should address these issues and go to monthly releases as originally
stated.

> - if I see new Beam 2.x releases for sure (according to the previous
> point), it would be great to have discussion about Beam 3.x. I think that
> one of interesting new feature that Beam 3.x can provide is around
> PCollection with Schemas. It's something that we started to discuss with
> Reuven and Eugene. In term of schedule,
>

I don't think schemas require Beam 3.0 - I think we can introduce them
without making breaking changes. However there are many other features that
would be very interesting for Beam 3.x, and we should start putting
together a list of them.


>
> I would love to see your thoughts & ideas about releases schedule and Beam
> 3.x.
>
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


[DISCUSS] Thinking about Beam 3.x roadmap and release schedule

2017-11-28 Thread Jean-Baptiste Onofré

Hi guys,

Even if there's no rush, I think it would be great for the community to have a
better view of our roadmap and where we are going in terms of schedule.


I would like to discuss the following:
- a best effort to maintain a good release pace or at least provide a rough
schedule. For instance, in Apache Karaf, I have a release schedule
(http://karaf.apache.org/download.html#container-schedule). I think a release ~
every quarter would be great.
- while I certainly see new Beam 2.x releases coming (according to the previous
point), it would be great to have a discussion about Beam 3.x. I think that one
of the interesting new features that Beam 3.x can provide is around PCollection
with Schemas. It's something that we started to discuss with Reuven and Eugene.
In terms of schedule,


I would love to see your thoughts & ideas about releases schedule and Beam 3.x.

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [RESULT][VOTE] Migrate to gitbox

2017-11-28 Thread Jean-Baptiste Onofré
FYI, while we wait for the discussion to move forward, I disabled the
notifications on the dev@ mailing list (to avoid the spam ;)).


Regards
JB

On 11/24/2017 04:58 PM, Kenneth Knowles wrote:

+1 for new mailing list (reviews@)

On Fri, Nov 24, 2017 at 5:20 AM, James  wrote:


+1 for new mailing list (reviews@)

On Thu, Nov 23, 2017 at 7:38 PM Ismaël Mejía  wrote:


If github already does the notifications, I think that having an extra
notifications/reviews mailing list could be overkill (or spammy).
However I can see the value of this for archival reasons, e.g. to
store the history of the project comments out of github for the
future.

+1 for new mailing list (reviews@) or disabled

I don't think that putting this in commits is a good idea. The commits
mailing list already has a good amount of stuff going on. I think
that adding more granular information will make it harder to follow.


On Thu, Nov 23, 2017 at 12:17 PM, Jean-Baptiste Onofré 
wrote:

Hi,

following the migration to gitbox, we now have a notification e-mail (on the
dev mailing list) for each action on a PR (comments, closing, etc).

It could be very verbose and I think we have to change that. For now, I will
ask to disable this notification.

However, I think it's worth asking on the mailing list. Basically we have the
following options:

- send the notification to commits@ mailing list
- send the notification to a new mailing list (like review@ mailing list)
- leave the notification disabled

Please, let me know what you prefer.

Thanks
Regards
JB


On 11/23/2017 11:19 AM, Jean-Baptiste Onofré wrote:


The migration is done, you have to update your local copy with git remote
set-url to use gitbox.apache.org instead of git-wip-us.apache.org.
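
A minimal sketch of that remote update, written in Python for illustration. It
assumes the remote is named origin and that only the host part of the URL
changes; the helper names are hypothetical and not taken from the thread.

```python
import subprocess

OLD_HOST = "git-wip-us.apache.org"
NEW_HOST = "gitbox.apache.org"


def gitbox_url(url):
    """Return url with the old ASF git host replaced by the gitbox host."""
    return url.replace(OLD_HOST, NEW_HOST, 1)


def switch_remote(repo_dir, remote="origin"):
    """Point `remote` of the clone in repo_dir at gitbox; return the new URL.

    Assumption: the remote is called "origin" unless the caller says otherwise.
    """
    current = subprocess.run(
        ["git", "remote", "get-url", remote],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.strip()
    updated = gitbox_url(current)
    if updated != current:
        # Same effect as running "git remote set-url <remote> <url>" by hand.
        subprocess.run(
            ["git", "remote", "set-url", remote, updated],
            cwd=repo_dir, check=True,
        )
    return updated
```

Clones that already point elsewhere (for example at a GitHub fork) are left
untouched, because the host substitution does not match.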

I'm checking the GitHub PRs (if we now have the merge button).

Regards
JB

On 11/23/2017 10:55 AM, Jean-Baptiste Onofré wrote:


Hi guys,

I just got an update from INFRA: the migration to gitbox starts now.

Regards
JB

On 11/07/2017 05:51 PM, Jean-Baptiste Onofré wrote:


Hi guys,

quick update on the gitbox migration.

I created a Jira for INFRA: https://issues.apache.org/jira/browse/INFRA-15456

It should be done pretty soon.

Regards
JB

On 10/23/2017 07:24 AM, Jean-Baptiste Onofré wrote:


Hi all,

this vote passed with only +1.

I will request INFRA to move the repositories to gitbox.

Thanks all for your votes!

Regards
JB

On 10/10/2017 09:42 AM, Jean-Baptiste Onofré wrote:


Hi all,

following the discussion, here's the formal vote to migrate to gitbox:

[ ] +1, Approve to migrate to gitbox
[ ] -1, Do not migrate (please provide specific comments)

The vote will be open for at least 36 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Regards
JB











--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Romain Manni-Bucau
Guys,

just realized a lot of modules have threadCount=4 or so in the
surefire/failsafe config. A hard-coded value makes it impossible to
adapt the parallelism to the machine, so the parallelism ends up
ill-suited and useless. Can it be a variable at least? -T1C (or -T2C)
should give a smoother build that is aligned with the machine's config.

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn


2017-11-28 9:53 GMT+01:00 Romain Manni-Bucau :
> Just to answer a previous question:
>
> An ASF github search gives me these stats:
>
> - mvn (org:apache apache filename:pom.xml path:/): 731
> - gradle (org:apache apache filename:build.gradle path:/): 31
>
> Which is consistent with what I saw in enterprises and private repos.
> So it is quite different from the overall github stats, which are not
> computed on a representative sample unless pre-filtered.
>
> Romain Manni-Bucau
> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>
>
> 2017-11-28 9:08 GMT+01:00 Robert Bradshaw :
>> It's great to see all the discussion going on here.
>>
>> I think it's important to point out that merging a parallel set of
>> gradle build scripts is a separate (and much less disruptive) step
>> than, say, switching over the default (or even recommended)
>> build/release process to use them, let alone removing the maven build
>> files entirely. The latter two should definitely be gated by a formal
>> vote (each, probably); in the current state the gradle files can
>> mostly be ignored by most people. In particular, this is the kind of
>> change that needs to be in master to be evaluated--if it's on a branch
>> we can't very well see how it impacts presubmits, and most importantly
>> people can't try it out for real development.
>>
>> I agree that the choice of build tool may attract some contributors
>> and discourage others. Having builds that are fast, correct, and
>> reproducible will probably matter more to potential contributors than
>> the particular command to run. While maven can surely be improved, I
>> doubt a 2x improvement (and many more times that for incremental
>> builds) is low-hanging fruit, and many of the issues seem quite
>> fundamental (e.g. all the special treatment we need for NeedsRunner
>> tests, and having to do a (global-by-default) mvn install to skip
>> tests of dependencies when testing a leaf module).
>>
>> Getting data on what other apache projects use could be interesting,
>> but unless we gather why such choices were made I don't know that it'd
>> be that influential once we've established that both tools are widely
>> supported generally.
>>
>> To re-emphasize, we'll continue to produce and publish maven
>> artifacts, so our choice of build system won't matter for those only
>> using Beam as a dependency.
>>
>>
>>
>> On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré  
>> wrote:
>>> Yeah, especially, I think it would have been great to have a vote before
>>> merging on master.
>>>
>>> Not a big deal, however, I'm really community focused ;)
>>>
>>> Regards
>>> JB
>>>
>>> On 11/28/2017 07:36 AM, Reuven Lax wrote:

 Agreed. I think having a formal vote before Luke had numbers and
 results would have been too early. However now that we have such numbers,
 we should think about having a vote.

 Also, while I disagree with Romain that Gradle is not "enterprise ready"
 (it's heavily used by Netflix, LinkedIn, Siemens, and is the default build
 framework for Android apps), it would be interesting to see if any other
 ASF projects are using it. I don't think that should make or break the
 decision - we should do what's best for the Beam project, and "everyone
 else is doing something" is rarely a good argument - but it will provide
 good data points for us to evaluate.

 Reuven

 On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré wrote:

 Hi Luke,

 just curious (and maybe I missed it): did we do a formal vote to merge
 the gradle build?
 Gradle is now on master, and we have some Jiras to update the release
 guide with gradle. It's fine, but I remember only a discussion, not a vote.

 In order to embrace the community and avoid having some contributors
 "frustrated" (meaning that "this project doesn't care about
 contributors, they just do whatever they want"), I would have loved to
 see a formal vote about Gradle rather than just a discussion.

 My $0.01

 Regards
 JB

 On 11/27/2017 07:46 PM, Lukasz Cwik wrote:

 I have collected data by running several builds against master using
 Gradle and Maven without using Gradle's support for incremental builds.

 Gradle (mins)
 min: 25.04
 max: 160.14
  

[GitHub] holdenk commented on issue #4183: [BEAM-3143] Type Inference Python 3 Compatibility

2017-11-28 Thread GitBox
holdenk commented on issue #4183: [BEAM-3143] Type Inference Python 3 
Compatibility
URL: https://github.com/apache/beam/pull/4183#issuecomment-347465959
 
 
   Is this based on https://github.com/apache/beam/pull/4079 ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


Re: gradle dirty files blocking maven build

2017-11-28 Thread Romain Manni-Bucau
Hi guys,

it happened again this morning with another folder in the Python SDK:

$ find . -name etcd

./sdks/python/container/vendor/github.com/xordataexchange/crypt/backend/etcd
./sdks/python/container/vendor/github.com/coreos/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
./sdks/python/container/vendor/github.com/coreos/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd/cmd/etcd
[etc...]

There is really something fishy in the build which is "breaking" the
filesystem and leaving any indexing tool (like an IDE) broken.
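
A rough sketch of how such runaway nesting could be spotted before an IDE or
the maven build trips over it. The thread does not say what creates these
directories, so this only flags the symptom seen in the listing above: a path
whose trailing segments (here cmd/etcd) repeat several times. The function
names and the threshold of three repeats are assumptions made for
illustration.

```python
import os


def repeated_tail(path, min_repeats=3):
    """True if path ends with one run of segments repeated min_repeats times."""
    parts = [p for p in path.replace(os.sep, "/").split("/") if p not in ("", ".")]
    n = len(parts)
    for size in range(1, n // min_repeats + 1):
        unit = parts[n - size:]
        # Compare the last min_repeats windows of `size` segments with `unit`.
        if all(parts[n - (k + 1) * size:n - k * size] == unit
               for k in range(min_repeats)):
            return True
    return False


def find_runaway_dirs(root, min_repeats=3):
    """Yield directories under root whose relative path has a repeated tail."""
    for dirpath, dirnames, _ in os.walk(root):
        if repeated_tail(os.path.relpath(dirpath, root), min_repeats):
            dirnames[:] = []  # report the first hit only, do not descend further
            yield dirpath
```

This is a heuristic: a legitimate path such as a/a/a would also be flagged, so
the results are candidates to inspect rather than something to delete blindly.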

Romain Manni-Bucau
@rmannibucau |  Blog | Old Blog | Github | LinkedIn


2017-11-25 16:54 GMT+01:00 Romain Manni-Bucau :
> Don't think so, I never use wrappers (gradle or mvn) since they lose part of
> the local settings and setup like takari plugins ;).
>
> On 25 Nov 2017 09:28, "Jean-Baptiste Onofré" wrote:
>>
>> Not the wrapper provided by Beam ?
>>
>> Regards
>> JB
>>
>> On 11/25/2017 09:21 AM, Romain Manni-Bucau wrote:
>>>
>>> Only used gradle build with my local gradle 4.2.
>>>
>>> On 25 Nov 2017 07:27, "Manu Zhang" wrote:
>>>
 Hi Romain,

 What gradle command are you running? I don't find any ".gogradle" files.

 Thanks,
 Manu

 On Fri, Nov 24, 2017 at 5:09 PM Jean-Baptiste Onofré 
 wrote:

> Let me try on my local copy.
>
> Thanks for the report.
>
> Regards
> JB
>
> On 11/24/2017 10:04 AM, Romain Manni-Bucau wrote:
>>
>> Not sure JB to be honest, my global gitignore may have hidden them
>> because the name starts with a dot. It was more to share it in case it
>> is encountered than to ask for a fix, since I'm not sure ATM it comes
>> from beam itself - I also wonder if it can happen on the CI if both
>> builds are executed.
>>
>> Romain Manni-Bucau
>> @rmannibucau |  Blog | Old Blog | Github | LinkedIn
>>
>>
>> 2017-11-24 10:02 GMT+01:00 Jean-Baptiste Onofré :
>>>
>>> Hi Romain,
>>>
>>> I guess they are not part of the repo (git clean -x -f -d removes
>>> them), correct?
>>>
>>> Let me try.
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>>
>>> On 11/24/2017 10:00 AM, Romain Manni-Bucau wrote:


 Hi guys,

 I don't really know if it comes from my gradle tests or the gradle
 build itself, but I realized this morning that I had ".gogradle" files
 in beam in a few places. When building with maven, the resource plugin
 directory scanner goes through these files and seems to loop, which
 makes the build very slow in the best case and just locked in the
 worst one.

 Just in case you observe it, "find . -name '.gogradle' | xargs rm -Rf"
 solves it.
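
 For anyone without find/xargs at hand, the same cleanup can be sketched in
 Python. This is only an illustration of the one-liner quoted above, not a
 tool from the Beam repository; the function name is hypothetical.

```python
import os
import shutil


def remove_gogradle(root="."):
    """Delete every entry named .gogradle under root; return the removed paths."""
    removed = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".gogradle" not in dirnames and ".gogradle" not in filenames:
            continue
        target = os.path.join(dirpath, ".gogradle")
        if os.path.isdir(target) and not os.path.islink(target):
            shutil.rmtree(target)  # real directory: remove it recursively
        else:
            os.remove(target)      # plain file or symlink
        if ".gogradle" in dirnames:
            dirnames.remove(".gogradle")  # already gone, so do not walk into it
        removed.append(target)
    return removed
```

 Like the shell version, it deletes without asking, so it is worth checking
 the returned list on a scratch copy first.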

 Romain Manni-Bucau
 @rmannibucau |  Blog | Old Blog | Github | LinkedIn

>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>

>>>
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com


[GitHub] xumingming commented on issue #4168: [BEAM-3238][SQL] Add BeamRecordSqlTypeBuilder

2017-11-28 Thread GitBox
xumingming commented on issue #4168: [BEAM-3238][SQL] Add 
BeamRecordSqlTypeBuilder
URL: https://github.com/apache/beam/pull/4168#issuecomment-347448647
 
 
   retest this please.




Re: [DISCUSS] Move away from Apache Maven as build tool

2017-11-28 Thread Robert Bradshaw
It's great to see all the discussion going on here.

I think it's important to point out that merging a parallel set of
gradle build scripts is a separate (and much less disruptive) step
than, say, switching over the default (or even recommended)
build/release process to use them, let alone removing the maven build
files entirely. The latter two should definitely be gated by a formal
vote (each, probably); in the current state the gradle files can
mostly be ignored by most people. In particular, this is the kind of
change that needs to be in master to be evaluated--if it's on a branch
we can't very well see how it impacts presubmits, and most importantly
people can't try it out for real development.

I agree that the choice of build tool may attract some contributors
and discourage others. Having builds that are fast, correct, and
reproducible will probably matter more to potential contributors than
the particular command to run. While maven can surely be improved, I
doubt a 2x improvement (and many more times that for incremental
builds) is low-hanging fruit, and many of the issues seem quite
fundamental (e.g. all the special treatment we need for NeedsRunner
tests, and having to do a (global-by-default) mvn install to skip
tests of dependencies when testing a leaf module).
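
The 2x figure can be checked against the Jenkins build times that Lukasz
reported, which are quoted further down in this message. The sketch below
simply divides the Maven numbers by the Gradle ones; the dictionary layout and
function name are illustrative choices, and the Maven maximum excludes the
240 minute timeouts, so that ratio is understated.

```python
# Build times in minutes, copied from the figures quoted later in this thread.
GRADLE = {"min": 25.04, "max": 160.14, "median": 45.78, "average": 52.19}
MAVEN = {"min": 56.86, "max": 216.55, "median": 87.93, "average": 109.10}


def speedup(maven, gradle):
    """Return Maven time divided by Gradle time for each shared statistic."""
    return {key: round(maven[key] / gradle[key], 2) for key in gradle if key in maven}


if __name__ == "__main__":
    for key, ratio in speedup(MAVEN, GRADLE).items():
        print(f"{key}: Maven takes {ratio}x as long as Gradle")
```

On these figures the median ratio comes out at about 1.92 and the average at
about 2.09, which is where "twice as fast" comes from.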

Getting data on what other apache projects use could be interesting,
but unless we gather why such choices were made I don't know that it'd
be that influential once we've established that both tools are widely
supported generally.

To re-emphasize, we'll continue to produce and publish maven
artifacts, so our choice of build system won't matter for those only
using Beam as a dependency.



On Mon, Nov 27, 2017 at 10:48 PM, Jean-Baptiste Onofré  
wrote:
> Yeah, especially, I think it would have been great to have a vote before
> merging on master.
>
> Not a big deal, however, I'm really community focused ;)
>
> Regards
> JB
>
> On 11/28/2017 07:36 AM, Reuven Lax wrote:
>>
>> Agreed. I think having a formal vote before Luke had numbers and
>> results would have been too early. However now that we have such numbers, we
>> should think about having a vote.
>>
>> Also, while I disagree with Romain that Gradle is not "enterprise ready"
>> (it's heavily used by Netflix, LinkedIn, Siemens, and is the default build
>> framework for Android apps), it would be interesting to see if any other ASF
>> projects are using it. I don't think that should make or break the
>> decision - we should do what's best for the Beam project, and "everyone else
>> is doing something" is rarely a good argument - but it will provide good data
>> points for us to evaluate.
>>
>> Reuven
>>
>> On Mon, Nov 27, 2017 at 10:23 PM, Jean-Baptiste Onofré wrote:
>>
>> Hi Luke,
>>
>> just curious (and maybe I missed it): did we do a formal vote to merge
>> the gradle build?
>> Gradle is now on master, and we have some Jiras to update the release
>> guide with gradle. It's fine, but I remember only a discussion, not a vote.
>>
>> In order to embrace the community and avoid having some contributors
>> "frustrated" (meaning that "this project doesn't care about
>> contributors, they just do whatever they want"), I would have loved to
>> see a formal vote about Gradle rather than just a discussion.
>>
>> My $0.01
>>
>> Regards
>> JB
>>
>> On 11/27/2017 07:46 PM, Lukasz Cwik wrote:
>>
>> I have collected data by running several builds against master using
>> Gradle and Maven without using Gradle's support for incremental builds.
>>
>> Gradle (mins)
>> min: 25.04
>> max: 160.14
>> median: 45.78
>> average: 52.19
>> stdev: 30.80
>>
>> Maven (mins)
>> min: 56.86
>> max: 216.55 (actually > 240 mins because this data does not include timeouts)
>> median: 87.93
>> average: 109.10
>> stdev: 48.01
>>
>> I excluded a few timeouts (240 mins) that happened during the Maven
>> build from its numbers but we can see conclusively that Gradle is twice
>> as fast for the build when compared to Maven when run using Jenkins.
>> On my desktop, I have enabled incremental builds and have seen a major
>> improvement on the above numbers but it doesn't yet work correctly
>> because of incorrectly specified inputs/outputs for certain tasks.
>>
>> The data is available here:
>>
>> https://docs.google.com/spreadsheets/d/1MHVjF-xoI49_NJqEQakUgnNIQ7Qbjzu8Y1q_h3dbF1M/edit?usp=sharing
>>
>> With this data, I feel confident that we should swap and have opened
>> the following issue: https://issues.apache.org/jira/browse/BEAM-3249
>>