Re: CODEOWNERS for apache/beam repo

2018-07-25 Thread Udi Meiri
So I configured Prow using their getting started guide (and found a bug in
it) on a test repo.

TLDR: Prow can work for us as a review assignment tool if all potential
reviewers are also added to the https://github.com/apache org.

Some findings:
1. GitHub doesn't allow non-collaborators to be listed as reviewers. :(
But anyone added to the Apache org on GitHub may be added as a reviewer
(no write access needed).
Is this something the ASF is willing to consider?

2. Prow works pretty well. I've configured it to just assign code reviewers.
Here's an example of it in action:
https://github.com/udim-org/prow-test/pull/6
Essentially, the only command we would use is:
"/cc @user" - to explicitly add a reviewer (/uncc to remove)

The other commands in the example above are not necessary.
We can still use our current PR approval and merge process.

3. Prow currently tries to assign 2 code reviewers, and hopefully that's
configurable.

Still unsure:
1. How does Prow select reviewers? Does it load balance?
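
For reference, Prow's assignment plugins read per-directory OWNERS files, which
are small YAML files listing reviewers and approvers. A minimal sketch (the path
and usernames are placeholders, not a proposal for Beam's actual owners):

  # sdks/python/OWNERS  (illustrative only)
  reviewers:
    - udim
    - some-reviewer
  approvers:
    - some-committer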

On Mon, Jul 23, 2018 at 9:51 PM Jean-Baptiste Onofré 
wrote:

> It looks interesting but I would like to see the complete video and
> explanation about prow. Especially what we concretely need.
>
> Regards
> JB
> On 24 Jul 2018, at 04:17, Udi Meiri wrote:
>>
>> I was recently told about Prow, which automates testing and merging for
>> Kubernetes and other projects. It also automates assigning reviewers and
>> suggesting approvers. Example PR, video explanation.
>>
>> I propose trying out Prow, since it's maintained and it uses OWNERS
>> files to explicitly define both who should be reviewing and who should
>> approve a PR.
>>
>> I'm not suggesting we use it to replace Jenkins or do our merges for us.
>>
>>
>> On Tue, Jul 17, 2018 at 11:04 AM Udi Meiri  wrote:
>>
>>> +1 to generating the file.
>>> I'll go ahead and file a PR to remove CODEOWNERS
>>>
>>> On Tue, Jul 17, 2018 at 9:28 AM Holden Karau 
>>> wrote:
>>>
 So it doesn’t support doing that right now, although if we find it’s a
 problem we can specify an exclude file with folks who haven’t contributed
 in the past year. Would people want me to generate that first?

 On Tue, Jul 17, 2018 at 10:22 AM Ismaël Mejía 
 wrote:

> Is there a way to exclude inactive people as reviewers for the blame
> case? I think it can be useful considering that a good amount of our
> committers are not active at the moment and auto-assigning reviews to
> them seems like a waste of energy/time.
> On Tue, Jul 17, 2018 at 1:58 AM Eugene Kirpichov 
> wrote:
> >
> > We did not, but I think we should. So far, in 100% of the PRs I've
> authored, the default functionality of CODEOWNERS did the wrong thing and I
> had to fix something up manually.
> >
> > On Mon, Jul 16, 2018 at 3:42 PM Andrew Pilloud 
> wrote:
> >>
> >> This sounds like a good plan. Did we want to rename the CODEOWNERS
> file to disable GitHub's mass adding of reviewers while we figure this out?
> >>
> >> Andrew
> >>
> >> On Mon, Jul 16, 2018 at 10:20 AM Jean-Baptiste Onofré <
> j...@nanthrax.net> wrote:
> >>>
> >>> +1
> >>>
> On 16 Jul 2018, at 19:17, Holden Karau wrote:
> 
>  Ok if no one objects I'll create the INFRA ticket after OSCON and
> we can test it for a week and decide if it helps or hinders.
> 
>  On Mon, Jul 16, 2018, 7:12 PM Jean-Baptiste Onofré <
> j...@nanthrax.net> wrote:
> >
> > Agree to test it for a week.
> >
> > Regards
> > JB
> > On 16 Jul 2018, at 18:59, Holden Karau <holden.ka...@gmail.com> wrote:
> >>
> >> Would folks be OK with me asking infra to turn on blame based
> suggestions for Beam and trying it out for a week?
> >>
> >> On Mon, Jul 16, 2018, 6:53 PM Rafael Fernandez <
> rfern...@google.com> wrote:
> >>>
> >>> +1 using blame -- nifty :)
> >>>
> >>> On Mon, Jul 16, 2018 at 2:31 AM Huygaa Batsaikhan <
> bat...@google.com> wrote:
> 
>  +1. This is great.
> 
>  On Sat, Jul 14, 2018 at 7:44 AM Udi Meiri < eh...@google.com>
> wrote:
> >
> > Mention bot looks cool, as it tries to guess the reviewer
> using blame.
> > I've written a quick and dirty script that uses only
> CODEOWNERS.
> >
> > Its output looks like:
> > $ python suggest_reviewers.py --pr 5940
> > INFO:root:Selected reviewer @lukecwik for:
> /runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/PTransformMatchers.java
> (path_pattern: /runners/core-construction-java*)
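
For context on the file the script reads: GitHub's CODEOWNERS maps path patterns
to users or teams. An entry matching the output above would look roughly like
this (illustrative; not necessarily the exact line in Beam's file):

  /runners/core-construction-java* @lukecwik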

Beam Dependency Check Report (2018-07-25)

2018-07-25 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Java SDK:


Dependency Name | Current Version | Latest Version | Release Date of Current Version | Release Date of Latest Release
--------------- | --------------- | -------------- | ------------------------------- | ------------------------------
com.amazonaws:amazon-kinesis-client | 1.8.8 | 1.9.1 | None | 2018-07-25
com.amazonaws:amazon-kinesis-producer | 0.12.8 | 0.12.9 | None | 2018-07-25
org.assertj:assertj-core | 2.5.0 | 3.10.0 | None | 2018-07-25
com.google.auto.service:auto-service | 1.0-rc2 | 1.0-rc4 | None | 2018-07-25
com.google.auto.value:auto-value | 1.5.3 | 1.6.2 | None | 2018-07-25
org.apache.calcite.avatica:avatica-core | 1.11.0 | 1.12.0 | None | 2018-07-25
com.amazonaws:aws-java-sdk-cloudwatch | 1.11.255 | 1.11.373 | None | 2018-07-25
com.amazonaws:aws-java-sdk-core | 1.11.319 | 1.11.373 | None | 2018-07-25
com.amazonaws:aws-java-sdk-kinesis | 1.11.255 | 1.11.373 | None | 2018-07-25
com.amazonaws:aws-java-sdk-s3 | 1.11.319 | 1.11.373 | None | 2018-07-25
biz.aQute:bndlib | 1.43.0 | 2.0.0.20130123-133441 | None | 2018-07-25
com.gradle:build-scan-plugin | 1.13.1 | 1.15.1 | None | 2018-07-25
org.apache.calcite:calcite-core | 1.16.0 | 1.17.0 | None | 2018-07-25
org.apache.calcite:calcite-linq4j | 1.16.0 | 1.17.0 | None | 2018-07-25
org.apache.cassandra:cassandra-all | 3.9 | 3.11.2 | None | 2018-07-25
com.datastax.cassandra:cassandra-driver-core | 3.5.0 | 3.5.1 | None | 2018-07-25
com.datastax.cassandra:cassandra-driver-mapping | 3.5.0 | 3.5.1 | None | 2018-07-25
commons-cli:commons-cli | 1.2 | 1.4 | None | 2018-07-25
commons-codec:commons-codec | 1.9 | 1.11 | None | 2018-07-25
org.apache.commons:commons-dbcp2 | 2.1.1 | 2.5.0 | None | 2018-07-25
com.typesafe:config | 1.3.0 | 1.3.3 | None | 2018-07-25
de.flapdoodle.embed:de.flapdoodle.embed.mongo | 1.50.1 | 2.1.1 | None | 2018-07-25
de.flapdoodle.embed:de.flapdoodle.embed.process | 1.50.1 | 2.0.5 | None | 2018-07-25
org.apache.derby:derby | 10.12.1.1 | 10.14.2.0 | None | 2018-07-25
org.apache.derby:derbyclient | 10.12.1.1 | 10.14.2.0 | None | 2018-07-25
org.apache.derby:derbynet | 10.12.1.1 | 10.14.2.0 | None | 2018-07-25
org.elasticsearch:elasticsearch | 5.6.3 | 6.3.2 | None | 2018-07-25
org.elasticsearch:elasticsearch-hadoop | 5.0.0 | 6.3.2 | None | 2018-07-25
org.elasticsearch.client:elasticsearch-rest-client | 5.6.3 | 6.3.2 | None | 2018-07-25
com.google.errorprone:error_prone_annotations | 2.1.2 | 2.3.1 | None | 2018-07-25
com.alibaba:fastjson | 1.2.12 | 1.2.47 | None | 2018-07-25
org.elasticsearch.test:framework | 5.6.3 | 6.3.2 | None | 2018-07-25
org.freemarker:freemarker | 2.3.25-incubating | 2.3.28 | None | 2018-07-25
net.ltgt.gradle:gradle-apt-plugin | 0.13 | 0.18 | None | 2018-07-25
com.commercehub.gradle.plugin:gradle-avro-plugin | 0.11.0 | 0.14.2 | None | 2018-07-25
gradle.plugin.com.palantir.gradle.docker:gradle-docker | 0.13.0 | 0.20.1 | None | 2018-07-25
net.ltgt.gradle:gradle-errorprone-plugin | 0.0.13 | 0.0.16 | None | 2018-07-25
gradle.plugin.io.pry.gradle.offline_dependencies:gradle-offline-dependencies-plugin | 0.3 | 0.4 | None | 2018-07-25
net.researchgate:gradle-release | 2.6.0 | 2.7.0 | None | 2018-07-25
com.github.ben-manes:gradle-versions-plugin | 0.17.0 | 0.20.0 | None | 2018-07-25
org.codehaus.groovy:groovy-all | 2.4.13 | 3.0.0-alpha-3 | None | 2018-07-25



Re: BigQueryIO.write and Wait.on

2018-07-25 Thread Carlos Alonso
Just opened this PR: https://github.com/apache/beam/pull/6055 to get
feedback ASAP. Basically, what it does is return the job status in a
PCollection of BigQueryWriteResult objects.
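
To make the intended usage concrete, here is a rough sketch of how such a result
would be consumed with Wait.on (the accessor name getSuccessfulWrites() is
hypothetical - settling the exact shape is what this PR and thread are about;
rows, schema, and downstream are assumed to exist already):

  // Sketch only; the accessor name is a placeholder, not the actual Beam API.
  WriteResult result = rows.apply(
      BigQueryIO.writeTableRows()
          .to("project:dataset.table")
          .withSchema(schema));

  // A PCollection that signals, per window, that the write completed.
  PCollection<BigQueryWriteResult> signal = result.getSuccessfulWrites();  // hypothetical

  // Downstream work runs only after the corresponding window has been written.
  downstream
      .apply(Wait.on(signal))
      .apply(ParDo.of(new ReadBackWhatWasWrittenFn()));  // placeholder DoFn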

On Fri, Jul 20, 2018 at 11:57 PM Reuven Lax  wrote:

> There is already an org.apache.beam.sdk.io.gcp.bigquery.WriteResult class.
>
> On Tue, Jul 17, 2018 at 9:44 AM Eugene Kirpichov 
> wrote:
>
>> Hmm, I think this approach has some complications:
>> - Using JobStatus makes it tied to using BigQuery batch load jobs, but
>> the return type ought to be the same regardless of which method of writing
>> is used (including potential future BigQuery APIs - they are evolving), or
>> how many BigQuery load jobs are involved in writing a given window (it can
>> be multiple).
>> - Returning a success/failure indicator makes it prone to users ignoring
>> the failure: the default behavior should be that, if the pipeline succeeds,
>> that means all data was successfully written - if users want different
>> error handling, e.g. a deadletter queue, they should have to specify it
>> explicitly.
>>
>> I would recommend returning a PCollection of a type that's invariant to
>> which load method is used (streaming writes, load jobs, multiple load jobs
>> etc.). If it's unclear what type that should be, you could introduce an
>> empty type e.g. "class BigQueryWriteResult {}" just for the sake of
>> signaling success, and later add something to it.
>>
>> On Tue, Jul 17, 2018 at 12:30 AM Carlos Alonso 
>> wrote:
>>
>>> All good so far. I've been a bit sidetracked, but more or less I have
>>> the idea of using the JobStatus as part of the collection so that not only
>>> the completion is signaled, but also the result (success/failure) can be
>>> accessed. How does that sound?
>>>
>>> Regards
>>>
>>> On Tue, Jul 17, 2018 at 3:07 AM Eugene Kirpichov 
>>> wrote:
>>>
 Hi Carlos,

 Any updates / roadblocks you hit?


 On Tue, Jul 3, 2018 at 7:13 AM Eugene Kirpichov 
 wrote:

> Awesome!! Thanks for the heads up, very exciting, this is going to
> make a lot of people happy :)
>
> On Tue, Jul 3, 2018, 3:40 AM Carlos Alonso 
> wrote:
>
>> + dev@beam.apache.org
>>
>> Just a quick email to let you know that I'm starting to develop this.
>>
>> On Fri, Apr 20, 2018 at 10:30 PM Eugene Kirpichov <
>> kirpic...@google.com> wrote:
>>
>>> Hi Carlos,
>>>
>>> Thank you for expressing interest in taking this on! Let me give you
>>> a few pointers to start, and I'll be happy to help everywhere along the 
>>> way.
>>>
>>> Basically we want BigQueryIO.write() to return something (e.g. a
>>> PCollection) that can be used as input to Wait.on().
>>> Currently it returns a WriteResult, which only contains a
>>> PCollection of failed inserts - that one cannot be used directly;
>>> instead, we should add another component to WriteResult that
>>> represents the result of successfully writing some data.
>>>
>>> Given that BQIO supports dynamic destination writes, I think it
>>> makes sense for that to be a PCollection<KV<DestinationT, ???>> so that in
>>> theory we could sequence different destinations independently (currently
>>> Wait.on() does not provide such a feature, but it could); and it will
>>> require changing WriteResult to be WriteResult<DestinationT>. As for what
>>> the "???" might be - it is something that represents the result of
>>> successfully writing a window of data. I think it can even be Void, or "?"
>>> (wildcard type) for now, until we figure out something better.
>>>
>>> Implementing this would require roughly the following work:
>>> - Add this PCollection<KV<DestinationT, ?>> to WriteResult
>>> - Modify the BatchLoads transform to provide it on both codepaths:
>>> expandTriggered() and expandUntriggered()
>>> ...- expandTriggered() itself writes via 2 codepaths: single-partition
>>> and multi-partition. Both need to be handled - we need to get a
>>> PCollection<KV<DestinationT, ?>> from each of them, and Flatten these
>>> two PCollections together to get the final result. The single-partition
>>> codepath (writeSinglePartition) under the hood already uses WriteTables
>>> that returns a KV<DestinationT, ?> so it's directly usable. The
>>> multi-partition codepath ends in WriteRenameTriggered - unfortunately, this
>>> codepath drops DestinationT along the way and will need to be refactored a
>>> bit to keep it until the end.
>>> ...- expandUntriggered() should be treated the same way.
>>> - Modify the StreamingWriteTables transform to provide it
>>> ...- Here also, the challenge is to propagate the DestinationT type
>>> all the way until the end of StreamingWriteTables - it will need to be
>>> refactored. After such a refactoring, returning a KV<DestinationT, ?>
>>> should be easy.
>>>
>>> Another challenge with all of this is backwards compatibility in
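
A rough Java sketch of the result shape proposed above (hypothetical names, not
actual Beam code; writeMultiPartitionAndRename() is invented here purely for
illustration):

  // Hypothetical sketch of a DestinationT-parameterized result, per the plan above.
  public class WriteResult<DestinationT> {
    // One element per (destination, window) successfully written; the value type
    // is intentionally a placeholder (Void) until something better is chosen.
    private final PCollection<KV<DestinationT, Void>> successfulWrites;

    WriteResult(PCollection<KV<DestinationT, Void>> successfulWrites) {
      this.successfulWrites = successfulWrites;
    }

    public PCollection<KV<DestinationT, Void>> getSuccessfulWrites() {
      return successfulWrites;
    }
  }

  // Conceptually, BatchLoads.expandTriggered() would then Flatten the signals from
  // its two codepaths before wrapping them in the WriteResult:
  //   PCollection<KV<DestinationT, Void>> fromSingle = writeSinglePartition(...);
  //   PCollection<KV<DestinationT, Void>> fromMulti = writeMultiPartitionAndRename(...);
  //   PCollectionList.of(fromSingle).and(fromMulti).apply(Flatten.pCollections());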

Jenkins build is back to normal : beam_Release_Gradle_NightlySnapshot #114

2018-07-25 Thread Apache Jenkins Server
See