Re: [DISCUSS] Releasing Beam in the presence of emergencies

2018-06-14 Thread Jean-Baptiste Onofré
Hi Rafael,

It's a good point, but I don't see anything more to do on our side: if an
emergency issue is detected, then we have to address it and publish a
fix release (x.y.z, where z identifies the specific release fixing the issue).
The commitment is best effort, as in any community: if an emergency
issue is detected, qualified and accepted, then we do our best to
provide a fix and do the fix release.

So, for me, it's already handled.

By the way, just a quick reminder in terms of releases:

- now that the Gradle release seems OK, we resume our release cycle of
roughly every 6 weeks
- we can cut a release anytime if required, especially to address
emergency issues.

Regards
JB

On 14/06/2018 22:33, Rafael Fernandez wrote:
> Hi Beam devs,
> 
> Emergencies can and will happen. As Apache Beam adoption continues to
> grow, the user community will naturally expect the Beam developer
> community to react to critical issues, such as security vulnerabilities
> in our dependencies. I want to make sure the dev community is in
> agreement that we follow the ASF Vulnerability Handling processes [1]
> for such occurrences.
> 
> 
> In addition, I'd like to discuss cases in which data correctness/loss
> may warrant an expedited release (i.e., we did not wait 72 hours), as we
> did in 2.1.1 [2].  Concretely:
> 
> 
>  1. Do we need to add anything to our project website so the user
>     community knows how we react to such issues?
>
>  2. Should we have an entry in the contributor guide to address critical
>     point releases, so we eliminate any guesswork in the event of an
>     emergency? (Example text [3])
> 
> 
> Thanks,
> 
> r
> 
> 
> [1] https://apache.org/security/committers.html#vulnerability-handling
>
> [2] https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1
> 
> [3] Example text for the contributor guideline:
> 
> 
> What requires a critical point release?
>
>   * A data loss bug
>   * A data corruption bug
>   * A processing correctness bug
>   * For security vulnerabilities, please follow
>     https://apache.org/security/committers.html#vulnerability-handling .
> 
> 
> What do we do a critical point release on?
> 
> Our first priority is to stop the bleeding. We ought to prioritize a
> point release for the latest Beam version, based on the release branch,
> that cherrypicks only the intended fix.
> 
>   * We've done it before! Remember 2.1.1?
>       o Since this is a critical release, we can relax our usual 72 hour
>         voting policy. It worked well for 2.1.1, we should make it
>         repeatable: Propose, have PMC folks do due diligence on the
>         request, and sign off. Since this is critical, we may want to
>         have more than one person working on the release.
>   * Once we get it out, the community can discuss which previous
>     releases would benefit from a potential point release.
> 
> 
> Who proposes a critical point release?
> 
> Any member of the community. 3 PMC +1 votes are sufficient to get the
> process rolling.
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Jean-Baptiste Onofré
OK, I started the RC2, but I'm stopping the process to cut a new one.

Is it ok from your side ?

Regards
JB

On 15/06/2018 01:54, Charles Chen wrote:
> Looks like there is something wrong with PR 5636, which we cherry-picked
> above.  It breaks leaderboard examples which previously passed.  I've
> reopened the issue and will update this thread shortly.
> 
> On Thu, Jun 14, 2018 at 12:55 PM Jean-Baptiste Onofré wrote:
> 
> Sure, just in time ;)
> 
> Regards
> JB
> 
> On 14/06/2018 20:58, Charles Chen wrote:
> > Can you also merge the CP https://github.com/apache/beam/pull/5636 for
> > https://issues.apache.org/jira/browse/BEAM-4549?
> >
> > On Thu, Jun 14, 2018 at 6:52 AM Jean-Baptiste Onofré wrote:
> >
> >     FYI, I'm starting RC2 right now.
> >
> >     Stay tuned !
> >
> >     Regards
> >     JB
> >
> >     On 06/06/2018 10:44, Jean-Baptiste Onofré wrote:
> >     > Hi everyone,
> >     >
> >     > Please review and vote on the release candidate #1 for the version
> >     > 2.5.0, as follows:
> >     >
> >     > [ ] +1, Approve the release
> >     > [ ] -1, Do not approve the release (please provide specific comments)
> >     >
> >     > NB: this is the first release using Gradle, so don't be too harsh ;) A
> >     > PR about the release guide will follow thanks to this release.
> >     >
> >     > The complete staging area is available for your review, which includes:
> >     > * JIRA release notes [1],
> >     > * the official Apache source release to be deployed to dist.apache.org
> >     > [2], which is signed with the key with fingerprint C8282E76 [3],
> >     > * all artifacts to be deployed to the Maven Central Repository [4],
> >     > * source code tag "v2.5.0-RC1" [5],
> >     > * website pull request listing the release and publishing the API
> >     > reference manual [6].
> >     > * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
> >     > JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
> >     > * Python artifacts are deployed along with the source release to the
> >     > dist.apache.org [2].
> >     >
> >     > The vote will be open for at least 72 hours. It is adopted by majority
> >     > approval, with at least 3 PMC affirmative votes.
> >     >
> >     > Thanks,
> >     > JB
> >     >
> >     > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12342847
> >     > [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
> >     > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >     > [4] https://repository.apache.org/content/repositories/orgapachebeam-1041/
> >     > [5] https://github.com/apache/beam/tree/v2.5.0-RC1
> >     > [6] https://github.com/apache/beam-site/pull/463
> >     >
> >
> >     --
> >     Jean-Baptiste Onofré
> >     jbono...@apache.org 
> >
> >     http://blog.nanthrax.net
> >     Talend - http://www.talend.com
> >
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [DISCUSS] Releasing Beam in the presence of emergencies

2018-06-14 Thread Ahmet Altay
Thank you Rafael.

I think it is a good idea to include our commitment, including concrete
steps, on our website. This would make it easier for enterprise users to
choose Beam. Even though this is already partially Apache policy and there
is precedent in our project with the 2.1.1 release, increasing the visibility
for users would be a positive addition.

I would suggest changing the link in [2] to (
https://lists.apache.org/list.html?dev@beam.apache.org:gte=1d:2.1.1) so
that the link does not expire after some time.

Ahmet

On Thu, Jun 14, 2018 at 1:33 PM, Rafael Fernandez 
wrote:

> Hi Beam devs,
>
> Emergencies can and will happen. As Apache Beam adoption continues to
> grow, the user community will naturally expect the Beam developer community
> to react to critical issues, such as security vulnerabilities in our
> dependencies. I want to make sure the dev community is in agreement that we
> follow the ASF Vulnerability Handling processes [1] for such occurrences.
>
> In addition, I'd like to discuss cases in which data correctness/loss may
> warrant an expedited release (i.e., we did not wait 72 hours), as we did in
> 2.1.1 [2].  Concretely:
>
>
>    1. Do we need to add anything to our project website so the user
>       community knows how we react to such issues?
>    2. Should we have an entry in the contributor guide to address critical
>       point releases, so we eliminate any guesswork in the event of an
>       emergency? (Example text [3])
>
>
> Thanks,
>
> r
>
>
> [1] https://apache.org/security/committers.html#vulnerability-handling
>
> [2] https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1
>
> [3] Example text for the contributor guideline:
>
> What requires a critical point release?
>
>  - A data loss bug
>  - A data corruption bug
>  - A processing correctness bug
>  - For security vulnerabilities, please follow
>    https://apache.org/security/committers.html#vulnerability-handling .
>
> What do we do a critical point release on?
>
> Our first priority is to stop the bleeding. We ought to prioritize a point
> release for the latest Beam version, based on the release branch, that
> cherrypicks only the intended fix.
>
>  - We've done it before! Remember 2.1.1?
>     - Since this is a critical release, we can relax our usual 72 hour
>       voting policy. It worked well for 2.1.1, we should make it
>       repeatable: Propose, have PMC folks do due diligence on the request,
>       and sign off. Since this is critical, we may want to have more than
>       one person working on the release.
>  - Once we get it out, the community can discuss which previous releases
>    would benefit from a potential point release.
>
> Who proposes a critical point release?
>
> Any member of the community. 3 PMC +1 votes are sufficient to get the
> process rolling.
>
>
>
>


Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Charles Chen
Looks like there is something wrong with PR 5636, which we cherry-picked above.
It breaks leaderboard examples which previously passed.  I've reopened the
issue and will update this thread shortly.

On Thu, Jun 14, 2018 at 12:55 PM Jean-Baptiste Onofré 
wrote:

> Sure, just in time ;)
>
> Regards
> JB
>
> On 14/06/2018 20:58, Charles Chen wrote:
> > Can you also merge the CP https://github.com/apache/beam/pull/5636 for
> > https://issues.apache.org/jira/browse/BEAM-4549?
> >
> > On Thu, Jun 14, 2018 at 6:52 AM Jean-Baptiste Onofré wrote:
> >
> > FYI, I'm starting RC2 right now.
> >
> > Stay tuned !
> >
> > Regards
> > JB
> >
> > On 06/06/2018 10:44, Jean-Baptiste Onofré wrote:
> > > Hi everyone,
> > >
> > > Please review and vote on the release candidate #1 for the version
> > > 2.5.0, as follows:
> > >
> > > [ ] +1, Approve the release
> > > [ ] -1, Do not approve the release (please provide specific comments)
> > >
> > > NB: this is the first release using Gradle, so don't be too harsh ;) A
> > > PR about the release guide will follow thanks to this release.
> > >
> > > The complete staging area is available for your review, which includes:
> > > * JIRA release notes [1],
> > > * the official Apache source release to be deployed to dist.apache.org
> > > [2], which is signed with the key with fingerprint C8282E76 [3],
> > > * all artifacts to be deployed to the Maven Central Repository [4],
> > > * source code tag "v2.5.0-RC1" [5],
> > > * website pull request listing the release and publishing the API
> > > reference manual [6].
> > > * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
> > > JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
> > > * Python artifacts are deployed along with the source release to the
> > > dist.apache.org [2].
> > >
> > > The vote will be open for at least 72 hours. It is adopted by majority
> > > approval, with at least 3 PMC affirmative votes.
> > >
> > > Thanks,
> > > JB
> > >
> > > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12342847
> > > [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
> > > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > > [4] https://repository.apache.org/content/repositories/orgapachebeam-1041/
> > > [5] https://github.com/apache/beam/tree/v2.5.0-RC1
> > > [6] https://github.com/apache/beam-site/pull/463
> > >
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Precommits broken?

2018-06-14 Thread Scott Wegner
Brilliant. That seems like a very clean solution that we can implement
today. I'll get started; thanks for the idea Andrew!

On Thu, Jun 14, 2018 at 2:15 PM Udi Meiri  wrote:

> +1 for separate jobs if it gets us faster to pre-commit filtering
>
> On Thu, Jun 14, 2018 at 11:22 AM Kenneth Knowles  wrote:
>
>> I like Andrew's solution. Just totally separate jobs for automatic and
>> manual.
>>
>> Kenn
>>
>> On Thu, Jun 14, 2018 at 9:56 AM Lukasz Cwik  wrote:
>>
>>> That seems like a pretty good interim solution.
>>>
>>> On Thu, Jun 14, 2018 at 9:53 AM Andrew Pilloud 
>>> wrote:
>>>
>>>> If you always run one job for automated and another job for manual you
>>>> wouldn't need to remember two trigger phrases. The automated jobs don't
>>>> even need trigger phrases. As long as the status contexts are the same
>>>> github users never have to know they are two separate jobs.
>>>>
>>>> Andrew
>>>>
>>>> On Thu, Jun 14, 2018 at 9:49 AM Lukasz Cwik wrote:
>>>>
>>>>> I thought of that as well but would find it annoying that I would need
>>>>> to remember two sets of triggers, the ones for the automated jobs and the
>>>>> ones for the manual runs. If we re-use the same precommit trigger phrase,
>>>>> we would get two runs (automated and manual) of effectively the same thing
>>>>> for the jobs where the automated one wouldn't get filtered out.
>>>>>
>>>>> On Thu, Jun 14, 2018 at 9:46 AM Andrew Pilloud wrote:
>>>>>
>>>>>> Might there be a third option of creating a different jenkins job for
>>>>>> PR change and manual triggers? It would clutter up the jenkins interface
>>>>>> a bit, but they could both post status to the same commitStatusContext on
>>>>>> Github, so no one would notice there.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Wed, Jun 13, 2018 at 11:14 PM Jason Kuster wrote:
>>>>>>
>>>>>>> Having submitted a patch to the ghprb-plugin repo before, I think
>>>>>>> that regretfully option (b) is probably the right decision here given
>>>>>>> that it's unlikely to get accepted, merged, released, and to have Infra
>>>>>>> update the plugin in under a week.
>>>>>>>
>>>>>>> On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner wrote:
>>>>>>>
>>>>>>>> Indeed, I was going to send out an email about pre-commit
>>>>>>>> filtering, but we've already found some kinks and may need to revert
>>>>>>>> it.
>>>>>>>>
>>>>>>>> The change was submitted in PR#5611 [1] and enables Jenkins
>>>>>>>> triggering to only run pre-commits based on modified files. However,
>>>>>>>> Udi noticed that this also prevents manually running pre-commits on a
>>>>>>>> PR with trigger phrases when your PR changes don't match the pre-commit
>>>>>>>> include path [2]. This was blocking 2.5.0 release validation, so I have
>>>>>>>> a PR out to revert the change [3].
>>>>>>>>
>>>>>>>> I did some investigation and this is a deficiency in the Jenkins
>>>>>>>> plugin used to trigger jobs on pull requests. I've filed a bug [4] and
>>>>>>>> submitted a PR [5], but there's no guarantee that it'll get accepted or
>>>>>>>> when it will be available.
>>>>>>>>
>>>>>>>> Question for others: we were hoping to enable pre-commit triggering
>>>>>>>> as an optimization to decrease testing wait time and limit the impact
>>>>>>>> of test flakiness [6]. But this bug in the plugin means we'd lose the
>>>>>>>> ability to manually trigger pre-commits which aren't automatically run.
>>>>>>>> One workaround would be to run the tests locally instead of on Jenkins,
>>>>>>>> though that's clearly less desirable. Is this a blocker?
>>>>>>>>
>>>>>>>> Should we:
>>>>>>>> (a) Keep pre-commit triggering enabled for now and hope the
>>>>>>>> upstream patch gets accepted, or
>>>>>>>> (b) Revert the pre-commit change and wait for the patch
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/beam/pull/5611
>>>>>>>> [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
>>>>>>>> [3] https://github.com/apache/beam/pull/5638
>>>>>>>> [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
>>>>>>>> [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
>>>>>>>> [6] https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr
>>>>>>>>
>>>>>>>> On Wed, Jun 13, 2018 at 10:03 PM Rui Wang wrote:
>>>>>>>>
>>>>>>>>> Precommit filter is a really coool optimization!
>>>>>>>>>
>>>>>>>>> -Rui
>>>>>>>>>
>>>>>>>>> On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud <apill...@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Ah, so this is intended and I didn't break anything? Cool! Sorry
>>>>>>>>>> for the false alarm, looks like a great build optimization!
>>>>>>>>>>
>>>>>>>>>> Andrew
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou wrote:
>>>>>>>>>>
>>>>>>>>>>> Probably due to the precommit filter applied in #5611
>>>>>>>>>>>
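
For readers following this thread, here is a rough sketch of the two ideas discussed above: path-based pre-commit filtering for automatic runs, and Andrew's suggestion of a separate manually triggered job that reports under the same status context. The job names, include paths, trigger phrases and function names are all hypothetical and are not taken from Beam's Jenkins configuration or from the ghprb plugin.

```python
from fnmatch import fnmatch

# Hypothetical job table: names, include paths and trigger phrases are
# illustrative only, not Beam's real Jenkins configuration.
PRECOMMIT_JOBS = {
    "Java": {"include": ["sdks/java/*", "runners/*"], "phrase": "Run Java PreCommit"},
    "Python": {"include": ["sdks/python/*"], "phrase": "Run Python PreCommit"},
    "Go": {"include": ["sdks/go/*"], "phrase": "Run Go PreCommit"},
}


def automatic_jobs(changed_files):
    """Jobs started on a push: only those whose include paths match a changed file."""
    return sorted(
        name
        for name, job in PRECOMMIT_JOBS.items()
        if any(fnmatch(path, pattern) for path in changed_files for pattern in job["include"])
    )


def manual_jobs(comment):
    """Jobs started by a PR comment: the trigger phrase decides, paths are ignored."""
    text = comment.lower()
    return sorted(name for name, job in PRECOMMIT_JOBS.items() if job["phrase"].lower() in text)


def status_context(name):
    """The automatic and the manual variant of a job report under one context."""
    return f"{name} PreCommit"


if __name__ == "__main__":
    print(automatic_jobs(["sdks/python/setup.py"]))
    print(manual_jobs("Run Java PreCommit"))
    print(status_context("Java"))
```

Keeping the path filter only on the automatic path is what lets a trigger phrase still start a job whose include paths do not match the PR, which is the case that was blocking release validation.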

Re: Beam Dependency Check Report (2018-06-13)

2018-06-14 Thread Yifan Zou
Thank you Paul for letting us know about this issue. We will take care of it when
upgrading dependencies.

On Thu, Jun 14, 2018 at 7:23 AM Paul Gerver  wrote:

> I do have one request to be added to the Java SDK version updates:
> Beam-3831 [1]. The Google Core depends on the old org.json package which
> ASF discourages using because of the "Use only for good, not evil" clause.
>
> [1] https://issues.apache.org/jira/browse/BEAM-3831
>
> On Thu, Jun 14, 2018 at 3:03 AM Etienne Chauchot 
> wrote:
>
>> Thanks Yifan,
>>
>> This is great! It would help us maintain Beam more easily and probably
>> help us fix CVEs as well.
>>
>> Etienne
>>
>> On Wednesday, 13 June 2018 at 07:45 -0700, Yifan Zou wrote:
>>
>> Hi,
>>
>>
>> I want to follow up and explain this email.
>>
>>
>> This is a sample email that reports the results of the Beam SDK dependency
>> check, which was proposed in the dependency check automation proposal
>> listed at the end of this message.
>> The goal is to find updates for all Beam Python & Java SDKs' dependencies
>> and prioritize them. The job will be triggered automatically in Jenkins
>> once a week and generate a report. The report lists the high priority
>> updates based on the following criteria:
>>
>>
>> The dependency update is high priority if:
>>
>> 1. It has a major version update available;
>>    e.g. org.assertj:assertj-core 2.5.0 -> 3.10.0
>> 2. or, it is over 3 minor versions behind the latest version;
>>    e.g. org.tukaani:xz 1.5 -> 1.8
>> 3. or, the current version is behind the later version by over 180 days.
>>    e.g. com.google.auto.service:auto-service 2014-10-24 -> 2017-12-11
>>
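
The three criteria above can be sketched roughly as follows. This is only an illustration, not the code of the Jenkins job: the `Dependency` class and the function names are hypothetical, version parsing is deliberately naive, and the minor-version rule is read as "3 or more" because the quoted example (1.5 -> 1.8) differs by exactly 3.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Dependency:
    """Hypothetical record for one row of the report."""
    name: str
    current_version: str
    latest_version: str
    current_release_date: Optional[date] = None
    latest_release_date: Optional[date] = None


def numeric_parts(version: str) -> List[int]:
    """Leading numeric components of a version, e.g. '1.0-rc2' -> [1, 0]."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return parts


def high_priority_reason(dep: Dependency, min_minor_gap: int = 3,
                         max_age_days: int = 180) -> Optional[str]:
    """Return why an update is high priority, or None if it is not."""
    cur = numeric_parts(dep.current_version)
    new = numeric_parts(dep.latest_version)
    # Rule 1: a newer major version exists.
    if cur and new and new[0] > cur[0]:
        return "major version update available"
    # Rule 2 (assumed to mean a gap of 3 or more minor versions).
    if len(cur) > 1 and len(new) > 1 and new[0] == cur[0] and new[1] - cur[1] >= min_minor_gap:
        return "3 or more minor versions behind"
    # Rule 3: the two releases are more than max_age_days apart.
    if dep.current_release_date and dep.latest_release_date:
        if (dep.latest_release_date - dep.current_release_date).days > max_age_days:
            return "more than 180 days behind"
    return None
```

For instance, `high_priority_reason(Dependency("org.tukaani:xz", "1.5", "1.8"))` is flagged by the second rule, while a patch-level bump released a month later is not flagged at all.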
>>
>> This job helps Beam contributors determine which dependencies are far
>> behind the latest released version. The next step would be automating the
>> filing of JIRA bugs for dependency updates, grouping dependencies, and
>> identifying owners to take care of the upgrades, following Chamikara's
>> proposal.
>>
>>
>> Further reading:
>>
>> [Proposal] Beam dependency check automation, by Yifan Zou
>>
>> [Proposal] Beam dependency update policy, by Chamikara Jayalath
>>
>> Thank you.
>>
>> Yifan Zou
>>
>> On Wed, Jun 13, 2018 at 7:41 AM Apache Jenkins Server <
>> jenk...@builds.apache.org> wrote:
>>
>> High Priority Dependency Updates Of Beam Python SDK:
>>
>> Dependency Name        Current Version  Later Version  Current Version Release Date  Later Version Release Date
>> google-cloud-bigquery  0.25.0           1.3.0          2017-06-26                    2018-06-08
>> httplib2               0.9.2            0.11.3         2015-09-28                    2018-03-30
>>
>> High Priority Dependency Updates Of Beam Java SDK:
>>
>> Dependency Name                                     Current Version    Later Version          Current Version Release Date  Later Version Release Date
>> org.assertj:assertj-core                            2.5.0              3.10.0                 2016-07-03                    2018-05-11
>> com.google.auto.service:auto-service                1.0-rc2            1.0-rc4                2014-10-24                    2017-12-11
>> biz.aQute:bndlib                                    1.43.0             2.0.0.20130123-133441  2011-04-01                    2013-02-27
>> org.apache.cassandra:cassandra-all                  3.9                3.11.2                 2016-09-26                    2018-02-14
>> commons-cli:commons-cli                             1.2                1.4                    2009-03-19                    2017-03-09
>> commons-codec:commons-codec                         1.9                1.11                   2013-12-20                    2017-10-17
>> org.apache.commons:commons-dbcp2                    2.1.1              2.3.0                  2015-08-02                    2018-05-08
>> com.typesafe:config                                 1.3.0              1.3.3                  2015-05-08                    2018-02-21
>> de.flapdoodle.embed:de.flapdoodle.embed.mongo       1.50.1             2.0.3                  2015-12-11                    2018-02-14
>> de.flapdoodle.embed:de.flapdoodle.embed.process     1.50.1             2.0.3                  2015-12-11                    2018-02-14
>> org.apache.derby:derby                              10.12.1.1          10.14.2.0              2015-10-10                    2018-05-03
>> org.apache.derby:derbyclient                        10.12.1.1          10.14.2.0              2015-10-10                    2018-05-03
>> org.apache.derby:derbynet                           10.12.1.1          10.14.2.0              2015-10-10                    2018-05-03
>> org.elasticsearch:elasticsearch                     5.6.3              6.2.4                  2017-10-06                    2018-04-12
>> org.elasticsearch:elasticsearch-hadoop              5.0.0              6.2.4                  2016-10-26                    2018-04-12
>> org.elasticsearch.client:elasticsearch-rest-client  5.6.3              6.2.4                  2017-10-06                    2018-04-12
>> com.alibaba:fastjson                                1.2.12             1.2.47                 2016-05-21                    2018-03-15
>> org.elasticsearch.test:framework                    5.6.3              6.2.4                  2017-10-06                    2018-04-12
>> org.freemarker:freemarker                           2.3.25-incubating  2.3.28                 2016-06-14                    2018-03-30
>> org.codehaus.groovy:groovy-all                      2.4.13             3.0.0-alpha-2          2017-11-22                    2018-04-16
>> org.apache.hbase:hbase-common                       1.2.6              2.0.0.3.0.0.3-2        2017-05-29                    2018-05-31
>> org.apache.hbase:hbase-hadoop-compat                1.2.6              2.0.0.3.0.0.3-2        2017-05-29                    2018-05-31
>> org.apache.hbase:hbase-hadoop2-compat               1.2.6              2.0.0.3.0.0.3-2        2017-05-29                    2018-05-31
>> org.apache.hbase:hbase-server                       1.2.6              2.0.0.3.0.0.3-2        2017-05-29                    2018-05-31
>> org.apache.hbase:hbase-shaded-client                1.2.6              2.0.0.3.0.0.3-2        2017-05-29                    2018-05-31
>> org.apache.hbase:hbase-shaded-server                1.2.6              2.0.0-alpha2           2017-05-29                    2018-05-31
>> 

Re: [FYI] New Apache Beam Swag Store!

2018-06-14 Thread Robert Burke
Would it be possible to get the brand of the shirts & sweaters, and the
sizing information? The vendor should have that available. The pages have a
lot of fabric and cut description, but not the measurements.
Fit varies dramatically by brand, and if I get a hoodie, anything
from a Medium to an XL can fit my arms and torso properly.

On Wed, 13 Jun 2018 at 11:10 Griselda Cuevas  wrote:

> Thanks All!
>
> To close the loop on the suggestions, I'll order more t-shirts in black so
> we have some options.
>
> G
>
> On Wed, 13 Jun 2018 at 08:39, Ismaël Mejía  wrote:
>
>> Great ! Thanks Gris and Matthias for putting this in place.
>> Hope to get that hoodie soon. As a suggestion, more colors too, and
>> eventually a t-shirt just with the big B logo.
>> On Mon, Jun 11, 2018 at 6:50 PM Mikhail Gryzykhin 
>> wrote:
>> >
>> > That's nice!
>> >
>> > More colors are appreciated :)
>> >
>> > --Mikhail
>> >
>> >
>> > On Sun, Jun 10, 2018 at 8:20 PM Kenneth Knowles 
>> wrote:
>> >>
>> >> Sweet! Agree with Raghu :-)
>> >>
>> >> Kenn
>> >>
>> >> On Sun, Jun 10, 2018 at 6:06 AM Matthias Baetens <
>> baetensmatth...@gmail.com> wrote:
>> >>>
>> >>> Great news, big thanks for all the work, Gris! Looking forward to
>> people wearing this around the globe ;)
>> >>>
>> >>> On Sat, 9 Jun 2018 at 01:28 Ankur Goenka  wrote:
>> >>>>
>> >>>> Awesome!
>> >>>>
>> >>>>
>> >>>> On Fri, Jun 8, 2018 at 4:24 PM Pablo Estrada wrote:
>> >>>>>
>> >>>>> Nice : D
>> >>>>>
>> >>>>> On Fri, Jun 8, 2018, 3:43 PM Raghu Angadi wrote:
>> >>>>>>
>> >>>>>> Woo-hoo! This is terrific.
>> >>>>>>
>> >>>>>> If we are increasing color choices I would like black or
>> >>>>>> charcoal... Beam logo would really pop on a dark background.
>> >>>>>>
>> >>>>>> On Fri, Jun 8, 2018 at 3:32 PM Griselda Cuevas wrote:
>> >>>>>>>
>> >>>>>>> Hi Beam Community,
>> >>>>>>>
>> >>>>>>> I just want to share with you the exciting news about our brand
>> >>>>>>> new Apache Beam Swag Store!
>> >>>>>>>
>> >>>>>>> You can find it here: https://store-beam.myshopify.com/
>> >>>>>>>
>> >>>>>>> How does it work?
>> >>>>>>>
>> >>>>>>> You can just select the items you want and check out. Our vendor
>> >>>>>>> ships anywhere in the world and can normally deliver swag
>> >>>>>>> within 1 week. Each company or user will need to pay for their
>> >>>>>>> own swag.
>> >>>>>>> If you are hosting an event or representing Beam at one, reach
>> >>>>>>> out to me or the beam-events-meetups slack channel, I'll be happy
>> >>>>>>> to review your event and see if we can sponsor the swag. We'll
>> >>>>>>> have codes for these occasions thanks to Google, who has
>> >>>>>>> sponsored an initial inventory.
>> >>>>>>>
>> >>>>>>> If you have feedback, ideas on new swag, questions or
>> >>>>>>> suggestions, reach out to me and/or Matthias Baetens.
>> >>>>>>>
>> >>>>>>> Happy Friday!
>> >>>>>>> G
>> >>>>>>>
>> >>>>>>>
>> >>>>> --
>> >>>>> Got feedback? go/pabloem-feedback
>> >>>>
>> >>>
>> >>> --
>> >>>
>>
>


Re: Precommits broken?

2018-06-14 Thread Udi Meiri
+1 for separate jobs if it gets us faster to pre-commit filtering

On Thu, Jun 14, 2018 at 11:22 AM Kenneth Knowles  wrote:

> I like Andrew's solution. Just totally separate jobs for automatic and
> manual.
>
> Kenn
>
> On Thu, Jun 14, 2018 at 9:56 AM Lukasz Cwik  wrote:
>
>> That seems like a pretty good interim solution.
>>
>> On Thu, Jun 14, 2018 at 9:53 AM Andrew Pilloud 
>> wrote:
>>
>>> If you always run one job for automated and another job for manual you
>>> wouldn't need to remember two trigger phrases. The automated jobs don't
>>> even need trigger phrases. As long as the status contexts are the same
>>> github users never have to know they are two separate jobs.
>>>
>>> Andrew
>>>
>>> On Thu, Jun 14, 2018 at 9:49 AM Lukasz Cwik  wrote:
>>>
>>>> I thought of that as well but would find it annoying that I would need
>>>> to remember two sets of triggers, the ones for the automated jobs and the
>>>> ones for the manual runs. If we re-use the same precommit trigger phrase,
>>>> we would get two runs (automated and manual) of effectively the same thing
>>>> for the jobs where the automated one wouldn't get filtered out.
>>>>
>>>> On Thu, Jun 14, 2018 at 9:46 AM Andrew Pilloud wrote:
>>>>
>>>>> Might there be a third option of creating a different jenkins job for
>>>>> PR change and manual triggers? It would clutter up the jenkins interface a
>>>>> bit, but they could both post status to the same commitStatusContext on
>>>>> Github, so no one would notice there.
>>>>>
>>>>> Andrew
>>>>>
>>>>> On Wed, Jun 13, 2018 at 11:14 PM Jason Kuster wrote:
>>>>>
>>>>>> Having submitted a patch to the ghprb-plugin repo before, I think
>>>>>> that regretfully option (b) is probably the right decision here given
>>>>>> that it's unlikely to get accepted, merged, released, and to have Infra
>>>>>> update the plugin in under a week.
>>>>>>
>>>>>> On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner wrote:
>>>>>>
>>>>>>> Indeed, I was going to send out an email about pre-commit filtering,
>>>>>>> but we've already found some kinks and may need to revert it.
>>>>>>>
>>>>>>> The change was submitted in PR#5611 [1] and enables Jenkins
>>>>>>> triggering to only run pre-commits based on modified files. However,
>>>>>>> Udi noticed that this also prevents manually running pre-commits on a
>>>>>>> PR with trigger phrases when your PR changes don't match the pre-commit
>>>>>>> include path [2]. This was blocking 2.5.0 release validation, so I have
>>>>>>> a PR out to revert the change [3].
>>>>>>>
>>>>>>> I did some investigation and this is a deficiency in the Jenkins
>>>>>>> plugin used to trigger jobs on pull requests. I've filed a bug [4] and
>>>>>>> submitted a PR [5], but there's no guarantee that it'll get accepted or
>>>>>>> when it will be available.
>>>>>>>
>>>>>>> Question for others: we were hoping to enable pre-commit triggering
>>>>>>> as an optimization to decrease testing wait time and limit the impact of
>>>>>>> test flakiness [6]. But this bug in the plugin means we'd lose the
>>>>>>> ability to manually trigger pre-commits which aren't automatically run.
>>>>>>> One workaround would be to run the tests locally instead of on Jenkins,
>>>>>>> though that's clearly less desirable. Is this a blocker?
>>>>>>>
>>>>>>> Should we:
>>>>>>> (a) Keep pre-commit triggering enabled for now and hope the upstream
>>>>>>> patch gets accepted, or
>>>>>>> (b) Revert the pre-commit change and wait for the patch
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> [1] https://github.com/apache/beam/pull/5611
>>>>>>> [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
>>>>>>> [3] https://github.com/apache/beam/pull/5638
>>>>>>> [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
>>>>>>> [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
>>>>>>> [6] https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 13, 2018 at 10:03 PM Rui Wang wrote:
>>>>>>>
>>>>>>>> Precommit filter is a really coool optimization!
>>>>>>>>
>>>>>>>> -Rui
>>>>>>>>
>>>>>>>> On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud wrote:
>>>>>>>>
>>>>>>>>> Ah, so this is intended and I didn't break anything? Cool! Sorry
>>>>>>>>> for the false alarm, looks like a great build optimization!
>>>>>>>>>
>>>>>>>>> Andrew
>>>>>>>>>
>>>>>>>>> On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou wrote:
>>>>>>>>>
>>>>>>>>>> Probably due to the precommit filter applied in #5611?
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 13, 2018 at 5:02 PM Andrew Pilloud <apill...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Looks like statuses got posted between me writing this email and
>>>>>>>>>>> sending it. Still wondering why the python and go jobs appear to be
>>>>>>>>>>> missing?
>>>>>>>>>>>
>>>>>>>>>>> Andrew
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun

Re: Proposing interactive beam runner

2018-06-14 Thread Sindy Li
Thanks Ahmet,

We know quite a few teams in Google are interested in running interactive Beam
pipelines, especially in Python for Machine Learning -- some are already
using it interactively in their own way. So instead of having each of those
teams develop its own interactive solution, we want one
repository that people can contribute to. We could also provide better
features like fast re-execution, as shown in the demo.

Thanks,
Sindy

On Wed, Jun 13, 2018 at 5:48 PM, Ahmet Altay  wrote:

> Thank you Sindy.
>
> I like the demo; it looks great. This would be interesting to a lot of
> users. What are your plans for moving this forward? What kind of input
> are you looking for?
>
> Ahmet
>
> On Wed, Jun 13, 2018 at 2:32 PM, Eugene Kirpichov 
> wrote:
>
>> This is awesome, thanks Sindy! I hope that the questions related to
>> portability will get resolved in a way that will allow to reuse some of the
>> work for other interactive Beam experiences, including SQL as Andrew says,
>> and providing a REPL e.g. for users of Scala or other JVM-based languages.
>>
>> +Neville Li  Do I remember correctly that you guys
>> had some sort of interactivity going in Scio but were looking forward to
>> Beam developing a native solution?
>>
>> On Wed, Jun 13, 2018 at 2:22 PM Sindy Li  wrote:
>>
>>> Thanks, Andrew!
>>>
>>> Here is a link to the demo on Youtube for people interested:
>>> https://www.youtube.com/watch?v=c5CjA1e3Cqw=youtu.be
>>>
>>> On Wed, Jun 13, 2018 at 1:23 PM, Andrew Pilloud 
>>> wrote:
>>>
>>>> This sounds really interesting, thanks for sharing! We've just begun to
>>>> explore making Beam SQL interactive. The Interactive Runner you've proposed
>>>> sounds like it would solve a bunch of the problems SQL faces as well. SQL
>>>> is written in Java right now, so we can't immediately reuse any code.
>>>>
>>>> Andrew
>>>>
>>>> On Wed, Jun 13, 2018 at 11:48 AM Sindy Li wrote:
>>>>
>>>>> Resending after subscribing to dev list.
>>>>>
>>>>> -- Forwarded message --
>>>>> From: Sindy Li
>>>>> Date: Fri, Jun 8, 2018 at 5:57 PM
>>>>> Subject: Proposing interactive beam runner
>>>>> To: dev@beam.apache.org
>>>>> Cc: Harsh Vardhan, Chamikara Jayalath <chamik...@google.com>,
>>>>> Anand Iyer, Robert Bradshaw
>>>>>
>>>>>
>>>>> Hello,
>>>>>
>>>>> We were exploring ways to provide an interactive notebook experience
>>>>> for writing Beam Python pipelines. The design doc provides
>>>>> an overview/vision of what we would like to achieve. The pull request
>>>>> provides a prototype for the same. The document also provides demo
>>>>> screenshots and instructions for running a demo in Jupyter. Please take
>>>>> a look. We believe this would be a useful addition to Beam.
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>


Invite to comment on the @RequiresStableInput design doc

2018-06-14 Thread Robin Qiu
Hello everyone,

I am Robin Qiu. I joined Google and started working on Beam Java SDK 2
months ago.

As my starting project, I am working on supporting the @RequiresStableInput
annotation in runners. Here is a short design doc. Please take a look and
feel free to comment.
https://docs.google.com/document/d/117yRKbbcEdm3eIKB_26BHOJGmHSZl1YNoF0RqWGtqAM/edit?usp=sharing

You can also find the context of the problem in this email thread:
https://lists.apache.org/thread.html/ae3c838df060e47148439d1dad818d5e927b2a25ff00cc4153221dff@%3Cdev.beam.apache.org%3E


Best,
Robin


[DISCUSS] Releasing Beam in the presence of emergencies

2018-06-14 Thread Rafael Fernandez
Hi Beam devs,

Emergencies can and will happen. As Apache Beam adoption continues to grow,
the user community will naturally expect the Beam developer community to
react to critical issues, such as security vulnerabilities in our
dependencies. I want to make sure the dev community is in agreement that we
follow the ASF Vulnerability Handling processes [1] for such occurrences.

In addition, I'd like to discuss cases in which data correctness/loss may
warrant an expedited release (i.e., we did not wait 72 hours), as we did in
2.1.1 [2].  Concretely:


   1.

   Do we need to add anything to our project website so the user community
   knows how we react to such issues?
   2.

   Should we have an entry in the contributor guide to address critical
   point releases, so we eliminate any guesswork in the event of an emergency?
   (Example text [3])


Thanks,

r


[1] https://apache.org/security/committers.html#vulnerability-handling

[2] https://lists.apache.org/list.html?dev@beam.apache.org:lte=40M:2.1.1








[3] Example text for the contributor guideline:

What requires a critical point release?

 - A data loss bug
 - A data corruption bug
 - A processing correctness bug
 - For security vulnerabilities, please follow
   https://apache.org/security/committers.html#vulnerability-handling.

What do we do a critical point release on?

Our first priority is to stop the bleeding. We ought to prioritize a point
release for the latest Beam version, based on the release branch, that
cherrypicks only the intended fix.

 - We've done it before! Remember 2.1.1?
 - Since this is a critical release, we can relax our usual 72 hour voting
   policy. It worked well for 2.1.1, we should make it repeatable: Propose,
   have PMC folks do due diligence on the request, and sign off. Since this is
   critical, we may want to have more than one person working on the release.
 - Once we get it out, the community can discuss which previous releases would
   benefit from a potential point release.

Who proposes a critical point release?

Any member of the community. 3 PMC +1 votes are sufficient to get the process
rolling.




Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Jean-Baptiste Onofré
Sure, just in time ;)

Regards
JB

On 14/06/2018 20:58, Charles Chen wrote:
> Can you also merge the CP https://github.com/apache/beam/pull/5636 for
> https://issues.apache.org/jira/browse/BEAM-4549?
> 
> On Thu, Jun 14, 2018 at 6:52 AM Jean-Baptiste Onofré wrote:
> 
> FYI, I'm starting RC2 right now.
> 
> Stay tuned !
> 
> Regards
> JB
> 
> On 06/06/2018 10:44, Jean-Baptiste Onofré wrote:
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version
> > 2.5.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > NB: this is the first release using Gradle, so don't be too harsh ;) A
> > PR about the release guide will follow thanks to this release.
> >
> > The complete staging area is available for your review, which
> includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to
> dist.apache.org 
> > [2], which is signed with the key with fingerprint C8282E76 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.5.0-RC1" [5],
> > * website pull request listing the release and publishing the API
> > reference manual [6].
> > * Java artifacts were built with Gradle 4.7 (wrapper) and
> OpenJDK/Oracle
> > JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org  [2].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > JB
> >
> > [1]
> >
> 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12342847
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1041/
> > [5] https://github.com/apache/beam/tree/v2.5.0-RC1
> > [6] https://github.com/apache/beam-site/pull/463
> >
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Charles Chen
Can you also merge the CP https://github.com/apache/beam/pull/5636 for
https://issues.apache.org/jira/browse/BEAM-4549?

On Thu, Jun 14, 2018 at 6:52 AM Jean-Baptiste Onofré 
wrote:

> FYI, I'm starting RC2 right now.
>
> Stay tuned !
>
> Regards
> JB
>
> On 06/06/2018 10:44, Jean-Baptiste Onofré wrote:
> > Hi everyone,
> >
> > Please review and vote on the release candidate #1 for the version
> > 2.5.0, as follows:
> >
> > [ ] +1, Approve the release
> > [ ] -1, Do not approve the release (please provide specific comments)
> >
> > NB: this is the first release using Gradle, so don't be too harsh ;) A
> > PR about the release guide will follow thanks to this release.
> >
> > The complete staging area is available for your review, which includes:
> > * JIRA release notes [1],
> > * the official Apache source release to be deployed to dist.apache.org
> > [2], which is signed with the key with fingerprint C8282E76 [3],
> > * all artifacts to be deployed to the Maven Central Repository [4],
> > * source code tag "v2.5.0-RC1" [5],
> > * website pull request listing the release and publishing the API
> > reference manual [6].
> > * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
> > JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
> > * Python artifacts are deployed along with the source release to the
> > dist.apache.org [2].
> >
> > The vote will be open for at least 72 hours. It is adopted by majority
> > approval, with at least 3 PMC affirmative votes.
> >
> > Thanks,
> > JB
> >
> > [1]
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12342847
> > [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
> > [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> > [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1041/
> > [5] https://github.com/apache/beam/tree/v2.5.0-RC1
> > [6] https://github.com/apache/beam-site/pull/463
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Precommits broken?

2018-06-14 Thread Kenneth Knowles
I like Andrew's solution. Just totally separate jobs for automatic and
manual.

Kenn

On Thu, Jun 14, 2018 at 9:56 AM Lukasz Cwik  wrote:

> That seems like a pretty good interim solution.
>
> On Thu, Jun 14, 2018 at 9:53 AM Andrew Pilloud 
> wrote:
>
>> If you always run one job for automated and another job for manual you
>> wouldn't need to remember two trigger phrases. The automated jobs don't
>> even need trigger phrases. As long as the status contexts are the same
>> github users never have to know they are two separate jobs.
>>
>> Andrew
>>
>> On Thu, Jun 14, 2018 at 9:49 AM Lukasz Cwik  wrote:
>>
>>> I thought of that as well but would find it annoying that I would need
>>> to remember two sets of triggers, the ones for the automated jobs and the
>>> ones for the manual runs. If we re-use the same precommit trigger phrase,
>>> we would get two runs (automated and manual) of effectively the same thing
>>> for the jobs where the automated one wouldn't get filtered out.
>>>
>>> On Thu, Jun 14, 2018 at 9:46 AM Andrew Pilloud 
>>> wrote:
>>>
 Might there be a third option of creating a different jenkins job for
 PR change and manual triggers? It would clutter up the jenkins interface a
 bit, but they could both post status to the same commitStatusContext on
 Github, so no one would notice there.

 Andrew

 On Wed, Jun 13, 2018 at 11:14 PM Jason Kuster 
 wrote:

> Having submitted a patch to the ghprb-plugin repo before, I think that
> regretfully option (b) is probably the right decision here given that it's
> unlikely to get accepted, merged, released, and to have Infra update the
> plugin in under a week.
>
> On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner 
> wrote:
>
>> Indeed, I was going to send out an email about pre-commit filtering,
>> but we've already found some kinks and may need to revert it.
>>
>> The change was submitted in PR#5611 [1] and enables Jenkins
>> triggering to only run pre-commits based on modified files. However, Udi
>> noticed that this also prevents manually running pre-commits on a PR with
>> trigger phrases when your PR changes don't match the pre-commit include
>> path [2]. This was blocking 2.5.0 release validation, so I have a PR out
>> to revert the change [3].
>>
>> I did some investigation and this is a deficiency in the Jenkins
>> plugin used to trigger jobs on pull requests. I've filed a bug [4] and
>> submitted a PR [5], but there's no guarantee that it'll get accepted or
>> when it will be available.
>>
>> Question for others: we were hoping to enable pre-commit triggering
>> as an optimization to decrease testing wait time and limit the impact of
>> test flakiness [6]. But this bug in the plugin means we'd lose the
>> ability to manually trigger pre-commits which aren't automatically run.
>> One workaround would be to run the tests locally instead of on Jenkins,
>> though that's clearly less desirable. Is this a blocker?
>>
>> Should we:
>> (a) Keep pre-commit triggering enabled for now and hope the upstream
>> patch gets accepted, or
>> (b) Revert the pre-commit change and wait for the patch
>>
>> Thoughts?
>>
>> [1] https://github.com/apache/beam/pull/5611
>> [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
>> [3] https://github.com/apache/beam/pull/5638
>> [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
>> [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
>> [6]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr
>>
>>
>> On Wed, Jun 13, 2018 at 10:03 PM Rui Wang  wrote:
>>
>>> Precommit filter is a really coool optimization!
>>>
>>> -Rui
>>>
>>> On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud 
>>> wrote:
>>>
 Ah, so this is intended and I didn't break anything? Cool! Sorry
 for the false alarm, looks like a great build optimization!

 Andrew

 On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou 
 wrote:

> Probably due to the precommit filter applied in #5611?
>
> On Wed, Jun 13, 2018 at 5:02 PM Andrew Pilloud <
> apill...@google.com> wrote:
>
>> Looks like statuses got posted between me writing this email and
>> sending it. Still wondering why the python and go jobs appear to be 
>> missing?
>>
>> Andrew
>>
>> On Wed, Jun 13, 2018 at 5:00 PM Andrew Pilloud <
>> apill...@google.com> wrote:
>>
>>> Recent PRs don't appear to be running all the precommits, and
>>> success status isn't being pushed to PRs. Anyone know what is going 
>>> on?

Re: Building and visualizing the Beam SQL graph

2018-06-14 Thread Mingmin Xu
Is there a guideline about how the name provided in `PCollection.apply(
String name, PTransform, PCollection> t)` is
adopted by different runners? I think the goal should be a readable graph
on all runners, rather than adjusting the names so that they work only for
the Dataflow runner.
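
One way to picture consistent step naming is a single shared helper that
applies every node's transform under that node's stage name, in the spirit of
the default `toPCollection` method proposed in the quoted message below. The
following is a minimal sketch under stated assumptions, not Beam code: it uses
no Beam classes, and `ToyPipeline`, `ToyRelNode`, `buildInput`,
`buildTransform` and `toCollection` are hypothetical stand-ins, with
`List<String>` in place of a PCollection of rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class RelNodeSketch {
    /** Toy pipeline that records the name of every applied step, in order. */
    static final class ToyPipeline {
        final List<String> stepNames = new ArrayList<>();

        <I, O> O apply(String name, I input, Function<I, O> transform) {
            stepNames.add(name);
            return transform.apply(input);
        }
    }

    /** Hypothetical analogue of the proposed interface (not the Beam one). */
    interface ToyRelNode {
        String getStageName();

        List<String> buildInput(ToyPipeline pipeline);

        Function<List<String>, List<String>> buildTransform();

        // The one shared place where a node is applied, so naming is uniform.
        default List<String> toCollection(ToyPipeline pipeline) {
            return pipeline.apply(getStageName(), buildInput(pipeline), buildTransform());
        }
    }

    /** Leaf node: its input is a fixed list of rows. */
    static ToyRelNode source(String name, List<String> rows) {
        return new ToyRelNode() {
            public String getStageName() { return name; }
            public List<String> buildInput(ToyPipeline pipeline) { return rows; }
            public Function<List<String>, List<String>> buildTransform() { return in -> in; }
        };
    }

    /** Node whose input is the output of another node. */
    static ToyRelNode filter(String name, ToyRelNode input, String prefix) {
        return new ToyRelNode() {
            public String getStageName() { return name; }
            public List<String> buildInput(ToyPipeline pipeline) {
                return input.toCollection(pipeline); // the upstream step is applied first
            }
            public Function<List<String>, List<String>> buildTransform() {
                return in -> {
                    List<String> out = new ArrayList<>();
                    for (String row : in) {
                        if (row.startsWith(prefix)) {
                            out.add(row);
                        }
                    }
                    return out;
                };
            }
        };
    }
}
```

In this toy model each node contributes exactly one named step, and the steps
appear side by side in application order rather than nested inside one
another. How a real runner displays those names is a separate question.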

On Thu, Jun 14, 2018 at 8:53 AM, Reuven Lax  wrote:

> There was a previous discussion about having generic attributes on
> PCollection. Maybe this is a good driving use case?
>
> On Wed, Jun 13, 2018 at 4:36 PM Kenneth Knowles  wrote:
>
>> Another thing to consider is that we might return something like a
>> "SqlPCollection" that is the PCollection plus additional metadata that
>> is useful to the shell / enumerable converter (such as if the PCollection
>> has a known finite size due to LIMIT, even if it is "unbounded", and the
>> shell can return control to the user once it receives enough rows). After
>> your proposed change this will be much more natural to do, so that's
>> another point in favor of the refactor.
>>
>> Kenn
>>
>> On Wed, Jun 13, 2018 at 10:22 AM Andrew Pilloud 
>> wrote:
>>
>>> One of my goals is to make the graph easier to read and map back to the
>>> SQL EXPLAIN output. The way the graph is currently built (`toPTransform` vs
>>> `toPCollection`) does make a big difference in that graph. I think it is
>>> also important to have a common function to do the apply with consistent
>>> naming. I think that will greatly help with ease of understanding. It
>>> sounds like what we really want is this in the BeamRelNode interface:
>>>
>>> PInput buildPInput(Pipeline pipeline);
>>> PTransform<PInput, PCollection<Row>> buildPTransform();
>>>
>>> default PCollection<Row> toPCollection(Pipeline pipeline) {
>>>   return buildPInput(pipeline).apply(getStageName(), buildPTransform());
>>> }
>>>
>>> Andrew
>>>
>>> On Mon, Jun 11, 2018 at 2:27 PM Mingmin Xu  wrote:
>>>
 EXPLAIN shows the execution plan from the SQL perspective only. After
 converting to a Beam composite PTransform, there are more steps underneath,
 and each runner reorganizes the Beam PTransforms again, which makes the
 final pipeline hard to read. In the SQL module itself, I don't see any
 difference between `toPTransform` and `toPCollection`. We could have an
 easy-to-understand step name when converting RelNodes, but it is the
 runners that show the graph to developers.

 Mingmin

 On Mon, Jun 11, 2018 at 2:06 PM, Andrew Pilloud 
 wrote:

> That sounds correct. And because each rel node might have a different
> input there isn't a standard interface (like
> PTransform<PCollection<Row>, PCollection<Row>> toPTransform());
>
> Andrew
>
> On Mon, Jun 11, 2018 at 1:31 PM Kenneth Knowles 
> wrote:
>
>> Agree with that. It will be kind of tricky to generalize. I think
>> there are some criteria in this case that might apply in other cases:
>>
>> 1. Each rel node (or construct of a DSL) should have a PTransform for
>> how it computes its result from its inputs.
>> 2. The inputs to that PTransform should actually be the inputs to the
>> rel node!
>>
>> So I tried to improve #1 but I probably made #2 worse.
>>
>> Kenn
>>
>> On Mon, Jun 11, 2018 at 12:53 PM Anton Kedin 
>> wrote:
>>
>>> Not answering the original question, but doesn't "explain" satisfy
>>> the SQL use case?
>>>
>>> Going forward we probably want to solve this in a more general way.
>>> We have at least 3 ways to represent the pipeline:
>>>  - how runner executes it;
>>>  - what it looks like when constructed;
>>>  - what the user was describing in DSL;
>>> And there will probably be more, if extra layers are built on top of
>>> DSLs.
>>>
>>> If possible, we probably should be able to map any level of
>>> abstraction to any other to better understand and debug the pipelines.
>>>
>>>
>>> On Mon, Jun 11, 2018 at 12:17 PM Kenneth Knowles 
>>> wrote:
>>>
 In other words, revert https://github.com/
 apache/beam/pull/4705/files, at least in spirit? I agree :-)

 Kenn

 On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud <
 apill...@google.com> wrote:

> We are currently converting the Calcite Rel tree to Beam by
> recursively building a tree of nested PTransforms. This results in a
> weird nested graph in the dataflow UI where each node contains its inputs
> nested inside of it. I'm going to change the internal data structure for
> converting the tree from a PTransform to a PCollection, which will result
> in a more accurate representation of the tree structure being built and
> should simplify the code as well. This will not change the public
> interface to SQL, which will remain a PTransform. Any thoughts or
> objections?
>
> I was also wondering if there are tools for 

Re: [BigQuery] TableRowJsonCoder question

2018-06-14 Thread Etienne Chauchot
Thanks Reuven. Using SchemaCoder is indeed better, since it avoids losing the
type information.
Etienne
On Thursday, June 14, 2018 at 10:04 -0700, Reuven Lax wrote:
> I think Thomas Groh hit this issue and might know a workaround.
> In general, TableRowJsonCoder has been a huge pain, partially because Json 
> itself cannot always represent all types
> (numeric types are a constant source of trouble in Json). In addition, I've 
> found that encoding all data into Json
> (which is space inefficient) is quite expensive when shuffling that data (and 
> bigQueryIO does do a GroupByKey on
> TableRows). I'm working on a PR that will extract schema information and 
> allow BigQueryIO to use SchemaCoder instead
> of TableRowJsonCoder, however this is not quite ready to be merged yet.
> 
> Reuven
> On Wed, Jun 13, 2018 at 1:54 AM Etienne Chauchot  wrote:
> > Hi all,
> > 
> > While playing with BigQueryIO I noticed something. 
> > 
> > When we create a TableRow (e.g. in a row function in bigQueryIO) using new 
> > TableRow().set(), for ex a long gets
> > boxed into a Long. But when it is encoded using TableRowJsonCoder and then 
> > re-read it might be decoded as an Integer
> > if the value fits into Integer. It causes failure in asserts in tests like 
> > write then read. 
> > What I did for now is to downcast long to int to force it to be boxed into 
> > an Integer (because test value fits into
> > Integer) at TableRow creation.
> > 
> > Is there a way to fix it in TableRowJsonCoder or a better workaround?
> > 
> > Etienne

Re: [BigQuery] TableRowJsonCoder question

2018-06-14 Thread Reuven Lax
I think Thomas Groh hit this issue and might know a workaround.

In general, TableRowJsonCoder has been a huge pain, partially because Json
itself cannot always represent all types (numeric types are a constant
source of trouble in Json). In addition, I've found that encoding all data
into Json (which is space inefficient) is quite expensive when shuffling
that data (and bigQueryIO does do a GroupByKey on TableRows). I'm working
on a PR that will extract schema information and allow BigQueryIO to use
SchemaCoder instead of TableRowJsonCoder, however this is not quite ready
to be merged yet.

Reuven

On Wed, Jun 13, 2018 at 1:54 AM Etienne Chauchot 
wrote:

> Hi all,
>
> While playing with BigQueryIO I noticed something.
>
> When we create a TableRow (e.g. in a row function in bigQueryIO) using new
> TableRow().set(), for ex a long gets boxed into a Long. But when it is
> encoded using TableRowJsonCoder and then re-read it might be decoded as an
> Integer if the value fits into Integer. It causes failure in asserts in
> tests like write then read.
> What I did for now is to downcast long to int to force it to be boxed into
> an Integer (because test value fits into Integer) at TableRow creation.
>
> Is there a way to fix it in TableRowJsonCoder or a better workaround?
>
> Etienne
>
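
The boxing mismatch described in the quoted message can be reproduced without
any Beam or BigQuery classes. The sketch below is only an illustration under
stated assumptions: `decodeIntegral` is a hypothetical stand-in for a JSON
decoder that returns the narrowest integral box type, and it is not
TableRowJsonCoder itself. `sameIntegralValue` shows one possible test-side
workaround, comparing by numeric value instead of by box type.

```java
final class NumericRoundTrip {
    // Hypothetical stand-in for a JSON decoder that picks the narrowest
    // integral box type. This is NOT TableRowJsonCoder.
    static Object decodeIntegral(String jsonNumber) {
        long value = Long.parseLong(jsonNumber);
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
            return (int) value; // autoboxes to Integer
        }
        return value; // autoboxes to Long
    }

    // Compares two boxed integral values by numeric value, ignoring box type.
    static boolean sameIntegralValue(Object a, Object b) {
        return a instanceof Number
            && b instanceof Number
            && ((Number) a).longValue() == ((Number) b).longValue();
    }

    public static void main(String[] args) {
        Object written = 42L;                 // boxed as Long when the row is built
        Object reread = decodeIntegral("42"); // comes back as Integer
        System.out.println(written.equals(reread));             // false
        System.out.println(sameIntegralValue(written, reread)); // true
    }
}
```

Comparing through `Number.longValue()` keeps the test data as `long` and
avoids the downcast to `int`, at the cost of a custom assertion. It only
covers integral values; floating-point fields would need their own comparison.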


Re: Precommits broken?

2018-06-14 Thread Andrew Pilloud
If you always run one job for automated and another job for manual you
wouldn't need to remember two trigger phrases. The automated jobs don't
even need trigger phrases. As long as the status contexts are the same
github users never have to know they are two separate jobs.

Andrew
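
A rough model of the two-job idea, for illustration only. Nothing here is the
Jenkins or ghprb-plugin API: the `Job` class, its fields, the path prefix and
the trigger phrase are all hypothetical. The sketch assumes an automated job
that has a path filter and no trigger phrase, a manual job that has a trigger
phrase and no path filter, and a shared status context for both.

```java
import java.util.Arrays;
import java.util.List;

final class PrecommitJobs {
    /** Hypothetical job model; not the Jenkins or ghprb API. */
    static final class Job {
        final String statusContext; // name of the check shown on the PR
        final String includePrefix; // null means the job never starts on a PR change
        final String triggerPhrase; // null means the job cannot be started by comment

        private Job(String statusContext, String includePrefix, String triggerPhrase) {
            this.statusContext = statusContext;
            this.includePrefix = includePrefix;
            this.triggerPhrase = triggerPhrase;
        }

        /** Automated job: filtered by path, no trigger phrase. */
        static Job automated(String statusContext, String includePrefix) {
            return new Job(statusContext, includePrefix, null);
        }

        /** Manual job: started by a comment, no path filter. */
        static Job manual(String statusContext, String triggerPhrase) {
            return new Job(statusContext, null, triggerPhrase);
        }

        boolean runsOnChange(List<String> changedFiles) {
            if (includePrefix == null) {
                return false;
            }
            for (String file : changedFiles) {
                if (file.startsWith(includePrefix)) {
                    return true;
                }
            }
            return false;
        }

        boolean runsOnComment(String comment) {
            return triggerPhrase != null && triggerPhrase.equalsIgnoreCase(comment.trim());
        }
    }

    public static void main(String[] args) {
        Job auto = Job.automated("Python PreCommit", "sdks/python/");
        Job manual = Job.manual("Python PreCommit", "Run Python PreCommit");
        List<String> javaOnly = Arrays.asList("sdks/java/core/Foo.java");
        System.out.println(auto.runsOnChange(javaOnly));                 // false
        System.out.println(manual.runsOnComment("Run Python PreCommit")); // true
    }
}
```

Because each job reacts to only one kind of event, a PR change never starts
the manual job and a comment never starts the automated one, so there is a
single phrase to remember and no duplicate run. With identical status
contexts, the PR page shows one check regardless of which job produced it.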

On Thu, Jun 14, 2018 at 9:49 AM Lukasz Cwik  wrote:

> I thought of that as well but would find it annoying that I would need to
> remember two sets of triggers, the ones for the automated jobs and the ones
> for the manual runs. If we re-use the same precommit trigger phrase, we
> would get two runs (automated and manual) of effectively the same thing for
> the jobs where the automated one wouldn't get filtered out.
>
> On Thu, Jun 14, 2018 at 9:46 AM Andrew Pilloud 
> wrote:
>
>> Might there be a third option of creating a different jenkins job for PR
>> change and manual triggers? It would clutter up the jenkins interface a
>> bit, but they could both post status to the same commitStatusContext on
>> Github, so no one would notice there.
>>
>> Andrew
>>
>> On Wed, Jun 13, 2018 at 11:14 PM Jason Kuster 
>> wrote:
>>
>>> Having submitted a patch to the ghprb-plugin repo before, I think that
>>> regretfully option (b) is probably the right decision here given that it's
>>> unlikely to get accepted, merged, released, and to have Infra update the
>>> plugin in under a week.
>>>
>>> On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner 
>>> wrote:
>>>
 Indeed, I was going to send out an email about pre-commit filtering,
 but we've already found some kinks and may need to revert it.

 The change was submitted in PR#5611 [1] and enables Jenkins triggering
 to only run pre-commits based on modified files. However, Udi noticed that
 this also prevents manually running pre-commits on a PR with trigger
 phrases when your PR changes don't match the pre-commit include path [2].
 This was blocking 2.5.0 release validation, so I have a PR out to revert
 the change [3].

 I did some investigation and this is a deficiency in the Jenkins plugin
 used to trigger jobs on pull requests. I've filed a bug [4] and submitted a
 PR [5], but there's no guarantee that it'll get accepted or when it will be
 available.

 Question for others: we were hoping to enable pre-commit triggering as
 an optimization to decrease testing wait time and limit the impact of test
 flakiness [6]. But this bug in the plugin means we'd lose the ability to
 manually trigger pre-commits which aren't automatically run. One workaround
 would be to run the tests locally instead of on Jenkins, though that's
 clearly less desirable. Is this a blocker?

 Should we:
 (a) Keep pre-commit triggering enabled for now and hope the upstream
 patch gets accepted, or
 (b) Revert the pre-commit change and wait for the patch

 Thoughts?

 [1] https://github.com/apache/beam/pull/5611
 [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
 [3] https://github.com/apache/beam/pull/5638
 [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
 [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
 [6]
 https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr


 On Wed, Jun 13, 2018 at 10:03 PM Rui Wang  wrote:

> Precommit filter is a really coool optimization!
>
> -Rui
>
> On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud 
> wrote:
>
>> Ah, so this is intended and I didn't break anything? Cool! Sorry for
>> the false alarm, looks like a great build optimization!
>>
>> Andrew
>>
>> On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou 
>> wrote:
>>
>>> Probably due to the precommit filter applied in #5611?
>>>
>>> On Wed, Jun 13, 2018 at 5:02 PM Andrew Pilloud 
>>> wrote:
>>>
 Looks like statuses got posted between me writing this email and
 sending it. Still wondering why the python and go jobs appear to be 
 missing?

 Andrew

 On Wed, Jun 13, 2018 at 5:00 PM Andrew Pilloud 
 wrote:

> Recent PRs don't appear to be running all the precommits, and
> success status isn't being pushed to PRs. Anyone know what is going 
> on?
>
> See:
> https://github.com/apache/beam/pull/5592
> https://github.com/apache/beam/pull/5622
>
> Andrew
>
>
>>>
>>> --
>>> ---
>>> Jason Kuster
>>> Apache Beam / Google Cloud Dataflow
>>>
>>> See something? Say something. go/jasonkuster-feedback
>>> 
>>>
>>


Re: Precommits broken?

2018-06-14 Thread Lukasz Cwik
I thought of that as well but would find it annoying that I would need to
remember two sets of triggers, the ones for the automated jobs and the ones
for the manual runs. If we re-use the same precommit trigger phrase, we
would get two runs (automated and manual) of effectively the same thing for
the jobs where the automated one wouldn't get filtered out.

On Thu, Jun 14, 2018 at 9:46 AM Andrew Pilloud  wrote:

> Might there be a third option of creating a different jenkins job for PR
> change and manual triggers? It would clutter up the jenkins interface a
> bit, but they could both post status to the same commitStatusContext on
> Github, so no one would notice there.
>
> Andrew
>
> On Wed, Jun 13, 2018 at 11:14 PM Jason Kuster 
> wrote:
>
>> Having submitted a patch to the ghprb-plugin repo before, I think that
>> regretfully option (b) is probably the right decision here given that it's
>> unlikely to get accepted, merged, released, and to have Infra update the
>> plugin in under a week.
>>
>> On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner  wrote:
>>
>>> Indeed, I was going to send out an email about pre-commit filtering, but
>>> we've already found some kinks and may need to revert it.
>>>
>>> The change was submitted in PR#5611 [1] and enables Jenkins triggering
>>> to only run pre-commits based on modified files. However, Udi noticed that
>>> this also prevents manually running pre-commits on a PR with trigger
>>> phrases when your PR changes don't match the pre-commit include path [2].
>>> This was blocking 2.5.0 release validation, so I have a PR out to revert
>>> the change [3].
>>>
>>> I did some investigation and this is a deficiency in the Jenkins plugin
>>> used to trigger jobs on pull requests. I've filed a bug [4] and submitted a
>>> PR [5], but there's no guarantee that it'll get accepted or when it will be
>>> available.
>>>
>>> Question for others: we were hoping to enable pre-commit triggering as
>>> an optimization to decrease testing wait time and limit the impact of test
>>> flakiness [6]. But this bug in the plugin means we'd lose the ability to
>>> manually trigger pre-commits which aren't automatically run. One workaround
>>> would be to run the tests locally instead of on Jenkins, though that's
>>> clearly less desirable. Is this a blocker?
>>>
>>> Should we:
>>> (a) Keep pre-commit triggering enabled for now and hope the upstream
>>> patch gets accepted, or
>>> (b) Revert the pre-commit change and wait for the patch
>>>
>>> Thoughts?
>>>
>>> [1] https://github.com/apache/beam/pull/5611
>>> [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
>>> [3] https://github.com/apache/beam/pull/5638
>>> [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
>>> [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
>>> [6]
>>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr
>>>
>>>
>>> On Wed, Jun 13, 2018 at 10:03 PM Rui Wang  wrote:
>>>
 Precommit filter is a really coool optimization!

 -Rui

 On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud 
 wrote:

> Ah, so this is intended and I didn't break anything? Cool! Sorry for
> the false alarm, looks like a great build optimization!
>
> Andrew
>
> On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou  wrote:
>
>> Probably due to the precommit filter applied in #5611?
>>
>> On Wed, Jun 13, 2018 at 5:02 PM Andrew Pilloud 
>> wrote:
>>
>>> Looks like statuses got posted between me writing this email and
>>> sending it. Still wondering why the python and go jobs appear to be 
>>> missing?
>>>
>>> Andrew
>>>
>>> On Wed, Jun 13, 2018 at 5:00 PM Andrew Pilloud 
>>> wrote:
>>>
 Recent PRs don't appear to be running all the precommits, and
 success status isn't being pushed to PRs. Anyone know what is going on?

 See:
 https://github.com/apache/beam/pull/5592
 https://github.com/apache/beam/pull/5622

 Andrew


>>
>> --
>> ---
>> Jason Kuster
>> Apache Beam / Google Cloud Dataflow
>>
>> See something? Say something. go/jasonkuster-feedback
>> 
>>
>


Re: Precommits broken?

2018-06-14 Thread Andrew Pilloud
Might there be a third option of creating a different jenkins job for PR
change and manual triggers? It would clutter up the jenkins interface a
bit, but they could both post status to the same commitStatusContext on
Github, so no one would notice there.

Andrew

On Wed, Jun 13, 2018 at 11:14 PM Jason Kuster 
wrote:

> Having submitted a patch to the ghprb-plugin repo before, I think that
> regretfully option (b) is probably the right decision here given that it's
> unlikely to get accepted, merged, released, and to have Infra update the
> plugin in under a week.
>
> On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner  wrote:
>
>> Indeed, I was going to send out an email about pre-commit filtering, but
>> we've already found some kinks and may need to revert it.
>>
>> The change was submitted in PR#5611 [1] and enables Jenkins triggering to
>> only run pre-commits based on modified files. However, Udi noticed that
>> this also prevents manually running pre-commits on a PR with trigger
>> phrases when your PR changes don't match the pre-commit include path [2].
>> This was blocking 2.5.0 release validation, so I have a PR out to revert
>> the change [3].
>>
>> I did some investigation and this is a deficiency in the Jenkins plugin
>> used to trigger jobs on pull requests. I've filed a bug [4] and submitted a
>> PR [5], but there's no guarantee that it'll get accepted or when it will be
>> available.
>>
>> Question for others: we were hoping to enable pre-commit triggering as an
>> optimization to decrease testing wait time and limit the impact of test
>> flakiness [6]. But this bug in the plugin means we'd lose the ability to
>> manually trigger pre-commits which aren't automatically run. One workaround
>> would be to run the tests locally instead of on Jenkins, though that's
>> clearly less desirable. Is this a blocker?
>>
>> Should we:
>> (a) Keep pre-commit triggering enabled for now and hope the upstream
>> patch gets accepted, or
>> (b) Revert the pre-commit change and wait for the patch
>>
>> Thoughts?
>>
>> [1] https://github.com/apache/beam/pull/5611
>> [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
>> [3] https://github.com/apache/beam/pull/5638
>> [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
>> [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
>> [6]
>> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr
>>
>>
>> On Wed, Jun 13, 2018 at 10:03 PM Rui Wang  wrote:
>>
>>> Precommit filter is a really coool optimization!
>>>
>>> -Rui
>>>
>>> On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud 
>>> wrote:
>>>
 Ah, so this is intended and I didn't break anything? Cool! Sorry for
 the false alarm, looks like a great build optimization!

 Andrew

 On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou  wrote:

> Probably due to the precommit filter applied in #5611?
>
> On Wed, Jun 13, 2018 at 5:02 PM Andrew Pilloud 
> wrote:
>
>> Looks like statuses got posted between me writing this email and
>> sending it. Still wondering why the python and go jobs appear to be 
>> missing?
>>
>> Andrew
>>
>> On Wed, Jun 13, 2018 at 5:00 PM Andrew Pilloud 
>> wrote:
>>
>>> Recent PRs don't appear to be running all the precommits, and
>>> success status isn't being pushed to PRs. Anyone know what is going on?
>>>
>>> See:
>>> https://github.com/apache/beam/pull/5592
>>> https://github.com/apache/beam/pull/5622
>>>
>>> Andrew
>>>
>>>
>
> --
> ---
> Jason Kuster
> Apache Beam / Google Cloud Dataflow
>
> See something? Say something. go/jasonkuster-feedback
> 
>


Re: Building and visualizing the Beam SQL graph

2018-06-14 Thread Reuven Lax
There was a previous discussion about having generic attributes on
PCollection. Maybe this is a good driving use case?

On Wed, Jun 13, 2018 at 4:36 PM Kenneth Knowles  wrote:

> Another thing to consider is that we might return something like a
> "SqlPCollection" that is the PCollection plus additional metadata that
> is useful to the shell / enumerable converter (such as if the PCollection
> has a known finite size due to LIMIT, even if it is "unbounded", and the
> shell can return control to the user once it receives enough rows). After
> your proposed change this will be much more natural to do, so that's
> another point in favor of the refactor.
>
> Kenn
>
> On Wed, Jun 13, 2018 at 10:22 AM Andrew Pilloud 
> wrote:
>
>> One of my goals is to make the graph easier to read and map back to the
>> SQL EXPLAIN output. The way the graph is currently built (`toPTransform` vs
>> `toPCollection`) does make a big difference in that graph. I think it is
>> also important to have a common function to do the apply with consistent
>> naming. I think that will greatly help with ease of understanding. It
>> sounds like what we really want is this in the BeamRelNode interface:
>>
>> PInput buildPInput(Pipeline pipeline);
>> PTransform<PInput, PCollection<Row>> buildPTransform();
>>
>> default PCollection<Row> toPCollection(Pipeline pipeline) {
>>   return buildPInput(pipeline).apply(getStageName(), buildPTransform());
>> }
>>
>> Andrew
>>
>> On Mon, Jun 11, 2018 at 2:27 PM Mingmin Xu  wrote:
>>
>>> EXPLAIN shows the execution plan from the SQL perspective only. After
>>> converting to a Beam composite PTransform, there are more steps underneath,
>>> and each Runner reorganizes the Beam PTransforms again, which makes the final
>>> pipeline hard to read. In the SQL module itself, I don't see any difference
>>> between `toPTransform` and `toPCollection`. We could have an
>>> easy-to-understand step name when converting RelNodes, but it is the Runners
>>> that show the graph to developers.
>>>
>>> Mingmin
>>>
>>> On Mon, Jun 11, 2018 at 2:06 PM, Andrew Pilloud 
>>> wrote:
>>>
 That sounds correct. And because each rel node might have a different
 input there isn't a standard interface (like PTransform<
 PCollection, PCollection> toPTransform());

 Andrew

 On Mon, Jun 11, 2018 at 1:31 PM Kenneth Knowles  wrote:

> Agree with that. It will be kind of tricky to generalize. I think
> there are some criteria in this case that might apply in other cases:
>
> 1. Each rel node (or construct of a DSL) should have a PTransform for
> how it computes its result from its inputs.
> 2. The inputs to that PTransform should actually be the inputs to the
> rel node!
>
> So I tried to improve #1 but I probably made #2 worse.
>
> Kenn
>
> On Mon, Jun 11, 2018 at 12:53 PM Anton Kedin  wrote:
>
>> Not answering the original question, but doesn't "explain" satisfy
>> the SQL use case?
>>
>> Going forward we probably want to solve this in a more general way.
>> We have at least 3 ways to represent the pipeline:
>>  - how the runner executes it;
>>  - what it looks like when constructed;
>>  - what the user was describing in the DSL;
>> And there will probably be more, if extra layers are built on top of
>> DSLs.
>>
>> If possible, we probably should be able to map any level of
>> abstraction to any other to better understand and debug the pipelines.
>>
>>
>> On Mon, Jun 11, 2018 at 12:17 PM Kenneth Knowles 
>> wrote:
>>
>>> In other words, revert
>>> https://github.com/apache/beam/pull/4705/files, at least in spirit?
>>> I agree :-)
>>>
>>> Kenn
>>>
>>> On Mon, Jun 11, 2018 at 11:39 AM Andrew Pilloud 
>>> wrote:
>>>
 We are currently converting the Calcite Rel tree to Beam by
 recursively building a tree of nested PTransforms. This results in a weird
 nested graph in the Dataflow UI where each node contains its inputs nested
 inside of it. I'm going to change the internal data structure for
 converting the tree from a PTransform to a PCollection, which will result
 in a more accurate representation of the tree structure being built and
 should simplify the code as well. This will not change the public interface
 to SQL, which will remain a PTransform. Any thoughts or objections?

 I was also wondering if there are tools for visualizing the Beam
 graph aside from the dataflow runner UI. What other tools exist?

 Andrew

>>>
>>>
>>>
>>> --
>>> 
>>> Mingmin
>>>
>>
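
To make the difference discussed in this thread concrete, here is a minimal Python sketch. It is a toy model, not Beam code: `RelNode`, `nested_step_names`, and `flat_steps` are hypothetical names invented for illustration, and the node names `Project`, `Filter`, and `Scan` are placeholders. The first function mimics the nested conversion, where each node's inputs end up inside it; the second mimics the PCollection-based conversion, where inputs are built first and each node becomes one named step applied to them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RelNode:
    """Hypothetical stand-in for a rel node; not a Beam or Calcite class."""
    name: str
    inputs: List["RelNode"] = field(default_factory=list)


def nested_step_names(node: RelNode, prefix: str = "") -> List[str]:
    """Nested style: every input is expanded inside the node that consumes it."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    names = [path]
    for child in node.inputs:
        names.extend(nested_step_names(child, path))
    return names


def flat_steps(node: RelNode) -> List[Tuple[str, List[str]]]:
    """Flat style: build the inputs first, then add one step that consumes them."""
    steps: List[Tuple[str, List[str]]] = []
    for child in node.inputs:
        steps.extend(flat_steps(child))
    steps.append((node.name, [child.name for child in node.inputs]))
    return steps


plan = RelNode("Project", [RelNode("Filter", [RelNode("Scan")])])
print(nested_step_names(plan))
print(flat_steps(plan))
```

In the nested form the scan shows up as `Project/Filter/Scan`, buried two levels deep. In the flat form each step lists only its direct inputs, which is the shape the proposed refactor aims for.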


Build failed in Jenkins: beam_Release_Gradle_NightlySnapshot #69

2018-06-14 Thread Apache Jenkins Server
See 


Changes:

[lcwik] [BEAM-4540] Migrate junit/hamcrest to provided scope.

[swegner] Add support for pre-commit trigger paths

[swegner] Set path triggers for existing pre-commit test jobs

[robinyq] Defer calling formatTimestamp() to achieve better performance

[ccy] [BEAM-4549] Use per-pipeline unique ids for side inputs in

[robbe.sneyders] Futurize utils subpackage

[iemejia] [BEAM-4551] Update spark runner to Spark version 2.3.1

[iemejia] Fix maven build error on sdks/java/io/google-cloud-platform module

[robinyq] Throws exception directly instead of calling checkState()

[github] Remove GPL findbugs dependency (#5609)

[aaltay] Futurize portability subpackage (#5385)

[altay] Futurize unpackaged files

[altay] resolved six.string_types equivalency

[altay] Futurize testing subpackage

[altay] Futurize tools subpackage

[altay] Remove old_div

[swegner] Revert "Merge pull request #5611: [BEAM-4445] Filter pre-commit

[robertwb] [BEAM-4546] Implement hot key fanout for combiners.

[robertwb] Use discaring mode for first level of combine.

--
[...truncated 1.95 MB...]
:107:
 warning: no @return
  public TopicPath createOrReuseTopic(String shortTopic) throws IOException {
   ^
:107:
 warning: no @throws for java.io.IOException
  public TopicPath createOrReuseTopic(String shortTopic) throws IOException {
   ^
:135:
 warning: no @param for shortTopic
  public TopicPath reuseTopic(String shortTopic) throws IOException {
   ^
:135:
 warning: no @return
  public TopicPath reuseTopic(String shortTopic) throws IOException {
   ^
:135:
 warning: no @throws for java.io.IOException
  public TopicPath reuseTopic(String shortTopic) throws IOException {
   ^
:145:
 warning: no @param for shortTopic
  public boolean topicExists(String shortTopic) throws IOException {
 ^
:145:
 warning: no @return
  public boolean topicExists(String shortTopic) throws IOException {
 ^
:145:
 warning: no @throws for java.io.IOException
  public boolean topicExists(String shortTopic) throws IOException {
 ^
:157:
 warning: no @param for shortTopic
  public SubscriptionPath createSubscription(String shortTopic, String 
shortSubscription)
  ^
:157:
 warning: no @param for shortSubscription
  public SubscriptionPath createSubscription(String shortTopic, String 
shortSubscription)
  ^
:157:
 warning: no @return
  public SubscriptionPath createSubscription(String shortTopic, String 
shortSubscription)
  ^
:157:
 warning: no @throws for java.io.IOException
  public SubscriptionPath createSubscription(String shortTopic, String 
shortSubscription)
  ^
:187:
 warning: no @param for shortTopic
  public SubscriptionPath reuseSubscription(String 

Re: Beam Dependency Check Report (2018-06-13)

2018-06-14 Thread Paul Gerver
I do have one request to be added to the Java SDK version updates:
BEAM-3831 [1]. The Google Core depends on the old org.json package, which
the ASF discourages using because of the "Use only for good, not evil" clause.

[1] https://issues.apache.org/jira/browse/BEAM-3831

On Thu, Jun 14, 2018 at 3:03 AM Etienne Chauchot 
wrote:

> Thanks Yifan,
>
> This is great! It would help us maintain Beam more easily and probably
> help us fix CVEs as well.
>
> Etienne
>
> On Wednesday, June 13, 2018 at 07:45 -0700, Yifan Zou wrote:
>
> Hi,
>
>
> I want to follow up and explain this email.
>
>
> This is a sample email that reports the results of the Beam SDK dependency
> check, which was proposed here.
> The goal is to find updates for all dependencies of the Beam Python and Java
> SDKs and prioritize them. The job will be triggered automatically in Jenkins
> once a week and generate a report. The report lists the high-priority updates
> based on the following criteria:
>
>
> A dependency update is high priority if:
>
> 1. It has a major version update available;
>
>   e.g. org.assertj:assertj-core 2.5.0 -> 3.10.0
>
> 2. or, it is over 3 minor versions behind the latest version;
>
>   e.g. org.tukaani:xz 1.5 -> 1.8
>
> 3. or, the current version is behind the later version by over 180 days.
>
>   e.g. com.google.auto.service:auto-service 2014-10-24 -> 2017-12-11
>
>
> This job helps Beam contributors identify dependencies that are far behind
> the latest released version. The next steps would be to automate filing JIRA
> bugs for dependency updates, group dependencies, and identify owners to take
> care of the upgrades, following Chamikara's proposal.
>
>
> Further reading:
>
> [Proposal] Beam dependency check automation by Yifan Zou
>
> [Proposal] Beam dependency update policy by Chamikara Jayalath
>
> Thank you.
>
> Yifan Zou
>
> On Wed, Jun 13, 2018 at 7:41 AM Apache Jenkins Server <
> jenk...@builds.apache.org> wrote:
>
> High Priority Dependency Updates Of Beam Python SDK:
> *Dependency Name* *Current Version* *Later Version* *Current Version Release Date* *Later Version Release Date*
> google-cloud-bigquery 0.25.0 1.3.0 2017-06-26 2018-06-08
> httplib2 0.9.2 0.11.3 2015-09-28 2018-03-30
>
> High Priority Dependency Updates Of Beam Java SDK:
> *Dependency Name* *Current Version* *Later Version* *Current Version Release Date* *Later Version Release Date*
> org.assertj:assertj-core 2.5.0 3.10.0 2016-07-03 2018-05-11
> com.google.auto.service:auto-service 1.0-rc2 1.0-rc4 2014-10-24 2017-12-11
> biz.aQute:bndlib 1.43.0 2.0.0.20130123-133441 2011-04-01 2013-02-27
> org.apache.cassandra:cassandra-all 3.9 3.11.2 2016-09-26 2018-02-14
> commons-cli:commons-cli 1.2 1.4 2009-03-19 2017-03-09
> commons-codec:commons-codec 1.9 1.11 2013-12-20 2017-10-17
> org.apache.commons:commons-dbcp2 2.1.1 2.3.0 2015-08-02 2018-05-08
> com.typesafe:config 1.3.0 1.3.3 2015-05-08 2018-02-21
> de.flapdoodle.embed:de.flapdoodle.embed.mongo 1.50.1 2.0.3 2015-12-11 2018-02-14
> de.flapdoodle.embed:de.flapdoodle.embed.process 1.50.1 2.0.3 2015-12-11 2018-02-14
> org.apache.derby:derby 10.12.1.1 10.14.2.0 2015-10-10 2018-05-03
> org.apache.derby:derbyclient 10.12.1.1 10.14.2.0 2015-10-10 2018-05-03
> org.apache.derby:derbynet 10.12.1.1 10.14.2.0 2015-10-10 2018-05-03
> org.elasticsearch:elasticsearch 5.6.3 6.2.4 2017-10-06 2018-04-12
> org.elasticsearch:elasticsearch-hadoop 5.0.0 6.2.4 2016-10-26 2018-04-12
> org.elasticsearch.client:elasticsearch-rest-client 5.6.3 6.2.4 2017-10-06 2018-04-12
> com.alibaba:fastjson 1.2.12 1.2.47 2016-05-21 2018-03-15
> org.elasticsearch.test:framework 5.6.3 6.2.4 2017-10-06 2018-04-12
> org.freemarker:freemarker 2.3.25-incubating 2.3.28 2016-06-14 2018-03-30
> org.codehaus.groovy:groovy-all 2.4.13 3.0.0-alpha-2 2017-11-22 2018-04-16
> org.apache.hbase:hbase-common 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> org.apache.hbase:hbase-hadoop-compat 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> org.apache.hbase:hbase-hadoop2-compat 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> org.apache.hbase:hbase-server 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> org.apache.hbase:hbase-shaded-client 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> org.apache.hbase:hbase-shaded-server 1.2.6 2.0.0-alpha2 2017-05-29 2018-05-31
> org.apache.hive:hive-cli 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> org.apache.hive:hive-common 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> org.apache.hive:hive-exec 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> org.apache.hive.hcatalog:hive-hcatalog-core 2.1.0 3.0.0.3.0.0.3-2 2016-06-16

Re: [VOTE] Apache Beam, version 2.5.0, release candidate #1

2018-06-14 Thread Jean-Baptiste Onofré
FYI, I'm starting RC2 right now.

Stay tuned !

Regards
JB

On 06/06/2018 10:44, Jean-Baptiste Onofré wrote:
> Hi everyone,
> 
> Please review and vote on the release candidate #1 for the version
> 2.5.0, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> NB: this is the first release using Gradle, so don't be too harsh ;) A
> PR about the release guide will follow thanks to this release.
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint C8282E76 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.5.0-RC1" [5],
> * website pull request listing the release and publishing the API
> reference manual [6].
> * Java artifacts were built with Gradle 4.7 (wrapper) and OpenJDK/Oracle
> JDK 1.8.0_172 (Oracle Corporation 25.172-b11).
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> JB
> 
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12342847
> [2] https://dist.apache.org/repos/dist/dev/beam/2.5.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1041/
> [5] https://github.com/apache/beam/tree/v2.5.0-RC1
> [6] https://github.com/apache/beam-site/pull/463
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Apache Beam June Newsletter

2018-06-14 Thread Etienne Chauchot
Thanks Gris, this is very cool!
Also, we did not include the scheduled talks for ApacheCon (end of September)
because they are still a long way off; maybe they'll be announced in the next
newsletter?
Etienne
On Wednesday, June 13, 2018 at 16:41 -0700, Pablo Estrada wrote:
> Thanks Gris! Lots of interesting things.
> Best
> -P.
> 
> On Wed, Jun 13, 2018 at 4:40 PM Griselda Cuevas  wrote:
> > Hi Beam Community!
> > Here [1] is the June Edition of our Apache Beam Newsletter. This edition
> > was curated by our community of contributors, committers and PMCs.
> > Generally, it contains the work done in the previous month (May in this
> > case) and what's planned for the future.
> >
> > We hope to provide visibility to what's going on in the community, so if
> > you have questions, feel free to ask in this thread.
> > 
> > Cheers, 
> > Gris
> > 
> > [1] 
> > https://docs.google.com/document/d/1BwRhOu-uDd3SLB_Om_Beke5RoGKos4hj7Ljh7zM2YIo/edit?ts=5b17fb92#

Re: Beam Dependency Check Report (2018-06-13)

2018-06-14 Thread Etienne Chauchot
Thanks Yifan,
This is great! It would help us maintain Beam more easily and probably help us
fix CVEs as well.
Etienne
On Wednesday, June 13, 2018 at 07:45 -0700, Yifan Zou wrote:
> Hi,
> 
> 
> I want to follow up and explain this email.
> 
> 
> This is a sample email that reports the results of the Beam SDK dependency
> check, which was proposed here. The goal is to find updates for all
> dependencies of the Beam Python and Java SDKs and prioritize them. The job
> will be triggered automatically in Jenkins once a week and generate a report.
> The report lists the high-priority updates based on the following criteria:
>
>
> A dependency update is high priority if:
> 1. It has a major version update available;
>   e.g. org.assertj:assertj-core 2.5.0 -> 3.10.0
> 2. or, it is over 3 minor versions behind the latest version;
>   e.g. org.tukaani:xz 1.5 -> 1.8
> 3. or, the current version is behind the later version by over 180 days.
>   e.g. com.google.auto.service:auto-service 2014-10-24 -> 2017-12-11
>
>
> This job helps Beam contributors identify dependencies that are far behind
> the latest released version. The next steps would be to automate filing JIRA
> bugs for dependency updates, group dependencies, and identify owners to take
> care of the upgrades, following Chamikara's proposal.
>
>
> Further reading:
> [Proposal] Beam dependency check automation by Yifan Zou
> [Proposal] Beam dependency update policy by Chamikara Jayalath
> 
> Thank you.
> 
> Yifan Zou
> On Wed, Jun 13, 2018 at 7:41 AM Apache Jenkins Server 
>  wrote:
> > High Priority Dependency Updates Of Beam Python SDK:
> >
> > *Dependency Name* *Current Version* *Later Version* *Current Version Release Date* *Later Version Release Date*
> > google-cloud-bigquery 0.25.0 1.3.0 2017-06-26 2018-06-08
> > httplib2 0.9.2 0.11.3 2015-09-28 2018-03-30
> >
> > High Priority Dependency Updates Of Beam Java SDK:
> >
> > *Dependency Name* *Current Version* *Later Version* *Current Version Release Date* *Later Version Release Date*
> > org.assertj:assertj-core 2.5.0 3.10.0 2016-07-03 2018-05-11
> > com.google.auto.service:auto-service 1.0-rc2 1.0-rc4 2014-10-24 2017-12-11
> > biz.aQute:bndlib 1.43.0 2.0.0.20130123-133441 2011-04-01 2013-02-27
> > org.apache.cassandra:cassandra-all 3.9 3.11.2 2016-09-26 2018-02-14
> > commons-cli:commons-cli 1.2 1.4 2009-03-19 2017-03-09
> > commons-codec:commons-codec 1.9 1.11 2013-12-20 2017-10-17
> > org.apache.commons:commons-dbcp2 2.1.1 2.3.0 2015-08-02 2018-05-08
> > com.typesafe:config 1.3.0 1.3.3 2015-05-08 2018-02-21
> > de.flapdoodle.embed:de.flapdoodle.embed.mongo 1.50.1 2.0.3 2015-12-11 2018-02-14
> > de.flapdoodle.embed:de.flapdoodle.embed.process 1.50.1 2.0.3 2015-12-11 2018-02-14
> > org.apache.derby:derby 10.12.1.1 10.14.2.0 2015-10-10 2018-05-03
> > org.apache.derby:derbyclient 10.12.1.1 10.14.2.0 2015-10-10 2018-05-03
> > org.apache.derby:derbynet 10.12.1.1 10.14.2.0 2015-10-10 2018-05-03
> > org.elasticsearch:elasticsearch 5.6.3 6.2.4 2017-10-06 2018-04-12
> > org.elasticsearch:elasticsearch-hadoop 5.0.0 6.2.4 2016-10-26 2018-04-12
> > org.elasticsearch.client:elasticsearch-rest-client 5.6.3 6.2.4 2017-10-06 2018-04-12
> > com.alibaba:fastjson 1.2.12 1.2.47 2016-05-21 2018-03-15
> > org.elasticsearch.test:framework 5.6.3 6.2.4 2017-10-06 2018-04-12
> > org.freemarker:freemarker 2.3.25-incubating 2.3.28 2016-06-14 2018-03-30
> > org.codehaus.groovy:groovy-all 2.4.13 3.0.0-alpha-2 2017-11-22 2018-04-16
> > org.apache.hbase:hbase-common 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> > org.apache.hbase:hbase-hadoop-compat 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> > org.apache.hbase:hbase-hadoop2-compat 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> > org.apache.hbase:hbase-server 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> > org.apache.hbase:hbase-shaded-client 1.2.6 2.0.0.3.0.0.3-2 2017-05-29 2018-05-31
> > org.apache.hbase:hbase-shaded-server 1.2.6 2.0.0-alpha2 2017-05-29 2018-05-31
> > org.apache.hive:hive-cli 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> > org.apache.hive:hive-common 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> > org.apache.hive:hive-exec 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> > org.apache.hive.hcatalog:hive-hcatalog-core 2.1.0 3.0.0.3.0.0.3-2 2016-06-16 2018-05-21
> > org.apache.httpcomponents:httpasyncclient 4.1.2 4.1.3 2016-06-18 2017-02-05
> > org.apache.httpcomponents:httpclient 4.5.2 4.5.5 2016-02-21 2018-01-18
> > org.apache.httpcomponents:httpcore 4.4.5 4.4.9 2016-06-08 2018-01-11
> > net.java.dev.javacc:javacc 4.0 7.0.3 2018-06-08 2017-11-06
> > jline:jline 2.14.6 3.0.0.M1 2018-03-26 2018-06-08
> > net.java.dev.jna:jna 4.1.0 4.5.1 2014-03-06 2017-12-27
> > com.esotericsoftware.kryo:kryo 2.21 2.24.0 2013-02-27 2014-05-04
> > io.dropwizard.metrics:metrics-core 3.1.2 4.1.0-rc2 2015-04-25 2018-05-03
> > org.mongodb:mongo-java-driver 3.2.2 3.8.0-beta3 2016-02-15 2018-05-29
> > io.netty:netty-all 4.1.17.Final 5.0.0.Alpha2 2017-11-08 2018-06-06
> > io.grpc:protoc-gen-grpc-java 1.2.0 1.12.0 2017-03-15 2018-05-07
> > 
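
The three high-priority criteria quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the actual Jenkins job: `leading_number`, `major_minor`, and `is_high_priority` are hypothetical names, version parsing is deliberately naive (leading digits of the first two dot-separated parts), and "over 3 minor versions behind" is read as a gap of at least 3 because the quoted example is 1.5 -> 1.8.

```python
from datetime import date
from typing import Tuple


def leading_number(part: str) -> int:
    """Return the leading digits of a version part as an int, or 0 if none."""
    digits = ""
    for ch in part:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def major_minor(version: str) -> Tuple[int, int]:
    """Naive parse: '1.0-rc2' -> (1, 0), '3.10.0' -> (3, 10)."""
    parts = version.split(".")
    major = leading_number(parts[0])
    minor = leading_number(parts[1]) if len(parts) > 1 else 0
    return major, minor


def is_high_priority(current: str, latest: str,
                     current_date: date, latest_date: date) -> bool:
    cur_major, cur_minor = major_minor(current)
    new_major, new_minor = major_minor(latest)
    # Criterion 1: a major version update is available.
    if new_major > cur_major:
        return True
    # Criterion 2 (assumed reading): at least 3 minor versions behind.
    if new_major == cur_major and new_minor - cur_minor >= 3:
        return True
    # Criterion 3: the releases are more than 180 days apart.
    return (latest_date - current_date).days > 180


print(is_high_priority("2.5.0", "3.10.0", date(2016, 7, 3), date(2018, 5, 11)))
print(is_high_priority("1.0-rc2", "1.0-rc4", date(2014, 10, 24), date(2017, 12, 11)))
```

Both calls use rows from the report: assertj-core qualifies through the major version jump, and auto-service qualifies only through the release-date gap.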

Re: Precommits broken?

2018-06-14 Thread Jason Kuster
Having submitted a patch to the ghprb-plugin repo before, I think that,
regretfully, option (b) is probably the right decision here, given that the
upstream patch is unlikely to be accepted, merged, and released, and the
plugin updated by Infra, in under a week.

On Wed, Jun 13, 2018 at 10:42 PM Scott Wegner  wrote:

> Indeed, I was going to send out an email about pre-commit filtering, but
> we've already found some kinks and may need to revert it.
>
> The change was submitted in PR#5611 [1] and enables Jenkins triggering to
> only run pre-commits based on modified files. However, Udi noticed that
> this also prevents manually running pre-commits on a PR with trigger
> phrases when your PR changes don't match the pre-commit include path [2].
> This was blocking 2.5.0 release validation, so I have a PR out to revert
> the change [3].
>
> I did some investigation and this is a deficiency in the Jenkins plugin
> used to trigger jobs on pull requests. I've filed a bug [4] and submitted a
> PR [5], but there's no guarantee that it'll get accepted or when it will be
> available.
>
> Question for others: we were hoping to enable pre-commit triggering as an
> optimization to decrease testing wait time and limit the impact of test
> flakiness [6]. But this bug in the plugin means we'd lose the ability to
> manually trigger pre-commits which aren't automatically run. One workaround
> would be to run the tests locally instead of on Jenkins, though that's
> clearly less desirable. Is this a blocker?
>
> Should we:
> (a) Keep pre-commit triggering enabled for now and hope the upstream patch
> gets accepted, or
> (b) Revert the pre-commit change and wait for the patch
>
> Thoughts?
>
> [1] https://github.com/apache/beam/pull/5611
> [2] https://github.com/apache/beam/pull/5607#issuecomment-397080770
> [3] https://github.com/apache/beam/pull/5638
> [4] https://github.com/jenkinsci/ghprb-plugin/issues/678
> [5] https://github.com/jenkinsci/ghprb-plugin/pull/680
> [6]
> https://docs.google.com/document/d/1lfbMhdIyDzIaBTgc9OUByhSwR94kfOzS_ozwKWTVl5U/edit#bookmark=id.6j8bwxnbp7fr
>
>
> On Wed, Jun 13, 2018 at 10:03 PM Rui Wang  wrote:
>
>> Precommit filter is a really coool optimization!
>>
>> -Rui
>>
>> On Wed, Jun 13, 2018 at 5:21 PM Andrew Pilloud 
>> wrote:
>>
>>> Ah, so this is intended and I didn't break anything? Cool! Sorry for the
>>> false alarm, looks like a great build optimization!
>>>
>>> Andrew
>>>
>>> On Wed, Jun 13, 2018 at 5:06 PM Yifan Zou  wrote:
>>>
 Probably due to the precommit filter applied in #5611?

 On Wed, Jun 13, 2018 at 5:02 PM Andrew Pilloud 
 wrote:

> Looks like statuses got posted between me writing this email and
> sending it. Still wondering why the Python and Go jobs appear to be
> missing?
>
> Andrew
>
> On Wed, Jun 13, 2018 at 5:00 PM Andrew Pilloud 
> wrote:
>
>> Recent PRs don't appear to be running all the precommits, and success
>> status isn't being pushed to PRs. Anyone know what is going on?
>>
>> See:
>> https://github.com/apache/beam/pull/5592
>> https://github.com/apache/beam/pull/5622
>>
>> Andrew
>>
>>

-- 
---
Jason Kuster
Apache Beam / Google Cloud Dataflow

See something? Say something. go/jasonkuster-feedback
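
The trade-off discussed in this thread can be illustrated with a small Python sketch. It models only the decision logic as described in the emails, not the Jenkins plugin itself: the function names, the glob patterns, and the file paths are all hypothetical. `should_run_as_reported` reflects the behavior Scott describes, where the path filter also blocks a manual trigger phrase; `should_run_desired` reflects what the upstream patch is meant to allow.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def touches_included_path(changed_files: Iterable[str],
                          include_patterns: Iterable[str]) -> bool:
    """True if any changed file matches any include pattern (glob style)."""
    patterns = list(include_patterns)
    return any(fnmatchcase(path, pattern)
               for path in changed_files
               for pattern in patterns)


def should_run_as_reported(changed_files, include_patterns, phrase_commented):
    """Reported behavior: the path filter gates the job even when a trigger
    phrase was commented, so phrase_commented has no effect."""
    return touches_included_path(changed_files, include_patterns)


def should_run_desired(changed_files, include_patterns, phrase_commented):
    """Desired behavior: an explicit trigger phrase bypasses the path filter."""
    return phrase_commented or touches_included_path(changed_files, include_patterns)


# Hypothetical PR that only touches website files, with a manual trigger comment.
changed = ["website/index.md"]
python_precommit_paths = ["sdks/python/*"]
print(should_run_as_reported(changed, python_precommit_paths, True))
print(should_run_desired(changed, python_precommit_paths, True))
```

With the reported behavior the manual trigger is ignored for this PR, which is the situation that blocked release validation; with the desired behavior the job runs.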