Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-04 Thread Jean-Baptiste Onofré
New failure on the build:

FAILURE: Build failed with an exception.

* What went wrong:
Could not resolve all files for configuration
':beam-sdks-java-io-hadoop-file-system:testCompileClasspath'.
> Could not find zookeeper-tests.jar (org.apache.zookeeper:zookeeper:3.4.6).
  Searched in the following locations:

file:/home/jbonofre/.m2/repository/org/apache/zookeeper/zookeeper/3.4.6/zookeeper-3.4.6-tests.jar

I'm fixing the HDFS extension.
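
For reference, a minimal sketch of the likely shape of such a fix, assuming the
missing tests artifact just needs to be pulled in via its Maven classifier in
the module's Gradle build (illustrative only, not the actual patch):

// build.gradle of beam-sdks-java-io-hadoop-file-system (hypothetical edit)
dependencies {
  testCompile "org.apache.zookeeper:zookeeper:3.4.6:tests"
}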

Regards
JB

On 05/06/2018 07:18, Jean-Baptiste Onofré wrote:
> Hi,
> 
> yes, it's a release blocker: the build is not fully stable. I've been trying
> to build the release for a week and it fails with different errors.
> 
> I have a new build in progress. I hope it will be good. I'll keep you posted.
> 
> Regards
> JB
> 
> On 05/06/2018 01:38, Scott Wegner wrote:
>> Hey JB, you mentioned some build issues on Slack [1]. Is this blocking
>> the release? Let me know if there's anything I can help with.
>>
>> [1] https://the-asf.slack.com/archives/C9H0YNP3P/p1528133545000136 
>>
>> On Sun, Jun 3, 2018 at 10:58 PM Jean-Baptiste Onofré wrote:
>>
>> Hi guys,
>>
>> just to let you know that the build is now OK. I'm completing the Jira
>> triage this morning (my time) and will cut the release branch (starting the
>> release process). I will validate the release guide in the meantime.
>>
>> Thanks,
>> Regards
>> JB
>>
>> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
>> > Hi guys,
>> >
>> > Apache Beam 2.4.0 was released on March 20th.
>> >
>> > According to our release cycle (roughly 6 weeks), we should
>> think about 2.5.0.
>> >
>> > I volunteer to tackle this release.
>> >
>> > I'm proposing the following items:
>> >
>> > 1. We start the Jira triage now, up to Tuesday
>> > 2. I would like to cut the release on Tuesday night (Europe time)
>> > 2bis. I think it's wiser to still use Maven for this release. Do
>> you think we
>> > will be ready to try a release with Gradle?
>> >
>> > After this release, I would like a discussion about:
>> > 1. Gradle release (if we release 2.5.0 with Maven)
>> > 2. Isolate the release cycle per Beam part. I think it would be
>> interesting to have
>> > different release cycles: SDKs, DSLs, Runners, IOs. That's another
>> discussion, I
>> > will start a thread about that.
>> >
>> > Thoughts?
>> >
>> > Regards
>> > JB
>> >
>>
>> -- 
>> Jean-Baptiste Onofré
>> jbono...@apache.org 
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-04 Thread Jean-Baptiste Onofré
Hi,

yes, it's a release blocker: the build is not fully stable. I've been trying
to build the release for a week and it fails with different errors.

I have a new build in progress. I hope it will be good. I'll keep you posted.

Regards
JB

On 05/06/2018 01:38, Scott Wegner wrote:
> Hey JB, you mentioned some build issues on Slack [1]. Is this blocking
> the release? Let me know if there's anything I can help with.
> 
> [1] https://the-asf.slack.com/archives/C9H0YNP3P/p1528133545000136 
> 
> On Sun, Jun 3, 2018 at 10:58 PM Jean-Baptiste Onofré wrote:
> 
> Hi guys,
> 
> just to let you know that the build is now OK. I'm completing the Jira
> triage this morning (my time) and will cut the release branch (starting the
> release process). I will validate the release guide in the meantime.
> 
> Thanks,
> Regards
> JB
> 
> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
> > Hi guys,
> >
> > Apache Beam 2.4.0 was released on March 20th.
> >
> > According to our release cycle (roughly 6 weeks), we should
> think about 2.5.0.
> >
> > I volunteer to tackle this release.
> >
> > I'm proposing the following items:
> >
> > 1. We start the Jira triage now, up to Tuesday
> > 2. I would like to cut the release on Tuesday night (Europe time)
> > 2bis. I think it's wiser to still use Maven for this release. Do
> you think we
> > will be ready to try a release with Gradle?
> >
> > After this release, I would like a discussion about:
> > 1. Gradle release (if we release 2.5.0 with Maven)
> > 2. Isolate the release cycle per Beam part. I think it would be
> interesting to have
> > different release cycles: SDKs, DSLs, Runners, IOs. That's another
> discussion, I
> > will start a thread about that.
> >
> > Thoughts?
> >
> > Regards
> > JB
> >
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Jenkins build is back to normal : beam_SeedJob #1882

2018-06-04 Thread Apache Jenkins Server
See 



Build failed in Jenkins: beam_SeedJob #1881

2018-06-04 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit fe004783708d765c14953797643582cda8818cec, 
no merge conflicts.
Setting status of fe004783708d765c14953797643582cda8818cec to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1881/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 5605f3eede6bebea7332f0121616c6d1f5318d95 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5605f3eede6bebea7332f0121616c6d1f5318d95
Commit message: "Merge fe004783708d765c14953797643582cda8818cec into 
be6185fd5c2b471f371d7591108360974954ec12"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
ERROR: (job_Dependency_Check.groovy, line 51) No signature of method: 
javaposse.jobdsl.dsl.helpers.step.StepContext.$() is applicable for argument 
types: (job_Dependency_Check$_run_closure1$_closure2$_closure4) values: 
[job_Dependency_Check$_run_closure1$_closure2$_closure4@4cacb7f0]
Possible solutions: is(java.lang.Object), ant(), any(), 
ant(groovy.lang.Closure), ant(java.lang.String), dsl(groovy.lang.Closure)
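
For context, a hedged guess at how this kind of DSL error arises (the actual
job_Dependency_Check.groovy source is not shown here): in Groovy, a bare
$ { ... } inside a Job DSL closure parses as a call to a method named '$'
with a closure argument, which StepContext does not define. A hypothetical
minimal reproduction:

job('example') {
  steps {
    shell('echo ok')  // fine: shell(String) is a real Job DSL step
    $ { 'oops' }      // fails at runtime: No signature of method ... StepContext.$()
  }
}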



Build failed in Jenkins: beam_SeedJob #1880

2018-06-04 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit 4a23930d1d1d8e5461adb86528817f78fe8be377, 
no merge conflicts.
Setting status of 4a23930d1d1d8e5461adb86528817f78fe8be377 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1880/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 7180966e5490f2b84625ed189e9852c7a8488c6f 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7180966e5490f2b84625ed189e9852c7a8488c6f
Commit message: "Merge 4a23930d1d1d8e5461adb86528817f78fe8be377 into 
be6185fd5c2b471f371d7591108360974954ec12"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
ERROR: (job_Dependency_Check.groovy, line 51) No signature of method: 
javaposse.jobdsl.dsl.helpers.step.StepContext.$() is applicable for argument 
types: (job_Dependency_Check$_run_closure1$_closure2$_closure4) values: 
[job_Dependency_Check$_run_closure1$_closure2$_closure4@2ad89fa3]
Possible solutions: is(java.lang.Object), ant(), any(), 
ant(groovy.lang.Closure), ant(java.lang.String), dsl(groovy.lang.Closure)



Re: [PROPOSAL] Preparing 2.5.0 release next week

2018-06-04 Thread Scott Wegner
Hey JB, you mentioned some build issues on Slack [1]. Is this blocking the
release? Let me know if there's anything I can help with.

[1] https://the-asf.slack.com/archives/C9H0YNP3P/p1528133545000136

On Sun, Jun 3, 2018 at 10:58 PM Jean-Baptiste Onofré 
wrote:

> Hi guys,
>
> just to let you know that the build is now OK. I'm completing the Jira
> triage this morning (my time) and will cut the release branch (starting the
> release process). I will validate the release guide in the meantime.
>
> Thanks,
> Regards
> JB
>
> On 06/04/2018 10:48, Jean-Baptiste Onofré wrote:
> > Hi guys,
> >
> > Apache Beam 2.4.0 was released on March 20th.
> >
> > According to our release cycle (roughly 6 weeks), we should think
> about 2.5.0.
> >
> > I volunteer to tackle this release.
> >
> > I'm proposing the following items:
> >
> > 1. We start the Jira triage now, up to Tuesday
> > 2. I would like to cut the release on Tuesday night (Europe time)
> > 2bis. I think it's wiser to still use Maven for this release. Do you
> think we
> > will be ready to try a release with Gradle?
> >
> > After this release, I would like a discussion about:
> > 1. Gradle release (if we release 2.5.0 with Maven)
> > 2. Isolate the release cycle per Beam part. I think it would be interesting
> to have
> > different release cycles: SDKs, DSLs, Runners, IOs. That's another
> discussion, I
> > will start a thread about that.
> >
> > Thoughts?
> >
> > Regards
> > JB
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Jenkins build is unstable: beam_SeedJob #1879

2018-06-04 Thread Apache Jenkins Server
See 



Build failed in Jenkins: beam_SeedJob #1878

2018-06-04 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit d2a7ca03b5b1ff67bc0b7eb0f5e758a9f75076c0, 
no merge conflicts.
Setting status of d2a7ca03b5b1ff67bc0b7eb0f5e758a9f75076c0 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1878/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 9975aa773604984f64d342efa9b2abd1aac7b476 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9975aa773604984f64d342efa9b2abd1aac7b476
Commit message: "Merge d2a7ca03b5b1ff67bc0b7eb0f5e758a9f75076c0 into 
be6185fd5c2b471f371d7591108360974954ec12"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
ERROR: startup failed:
General error during conversion: Error grabbing Grapes -- [unresolved 
dependency: com.fasterxml.jackson#jackson-core;2.9.5: not found, download 
failed: org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar]

java.lang.RuntimeException: Error grabbing Grapes -- [unresolved dependency: 
com.fasterxml.jackson#jackson-core;2.9.5: not found, download failed: 
org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at 
org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:77)
at 
org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at 
org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
at groovy.grape.GrapeIvy.getDependencies(GrapeIvy.groovy:424)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:169)
at 
org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:59)
at 
org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallCurrent(CallSiteArray.java:52)
at 
org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:64)
at groovy.grape.GrapeIvy.resolve(GrapeIvy.groovy:571)
at groovy.grape.GrapeIvy$resolve$1.callCurrent(Unknown Source)
at groovy.grape.GrapeIvy.resolve(GrapeIvy.groovy:538)
at groovy.grape.GrapeIvy$resolve$0.callCurrent(Unknown Source)
at groovy.grape.GrapeIvy.grab(GrapeIvy.groovy:256)
at groovy.grape.Grape.grab(Grape.java:167)
at 
groovy.grape.GrabAnnotationTransformation.visit(GrabAnnotationTransformation.java:378)
at 
org.codehaus.groovy.transform.ASTTransformationVisitor$3.call(ASTTransformationVisitor.java:321)
at 
org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:943)
at 
org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:605)
at 
org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
at 
org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
at 
groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
at 

Re: Go SDK Example

2018-06-04 Thread James Wilson
Hi Kenn and Henning,

Thank you both for helping me to get started.  I’ll definitely reach out on dev 
as I work on this.

Best,
James

> On Jun 4, 2018, at 12:03 PM, Henning Rohde  wrote:
> 
> Welcome James!
> 
> Awesome that you're interested in contributing to Apache Beam! If you're 
> specifically interested in the Go SDK, the task you identified is a good one 
> to start with. I assigned it to you. I also added a few similar tasks listed 
> below as alternatives. Feel free to pick the one you prefer and re-assign as 
> appropriate (or I can do it for you). It's best that the JIRAs are assigned 
> before any work is done, to avoid accidental duplication.
> 
>   BEAM-4466 Add Go TF-IDF example
>   BEAM-4467 Add Go Autocomplete example
> 
> The main caveat for Go streaming pipelines is that they currently only really 
> work on Dataflow, because the only streaming IO connector is PubSub and the 
> direct runner supports batch only. In the near future, however, the ULR and 
> Flink runner will support portable streaming pipelines, including Go. If it 
> is too impractical to work with the IO used by corresponding Java/Python 
> examples, feel free to deviate by using textio or similar instead. There may 
> also be incomplete feature work in Go that prevents a direct translation.
> 
> Please feel free to ask questions in the JIRAs or on the dev list. Happy to help!
> 
> Henning
> 
> 
> 
> On Sun, Jun 3, 2018 at 6:41 PM Kenneth Knowles wrote:
> Hi James,
> 
> Welcome!
> 
> Have you subscribed to dev@beam.apache.org? I am
> including that list here, since that is the most active list for discussing 
> contributions. I've also included Henning explicitly. He is the best person 
> to answer.
> 
> I found your JIRA account and set up permissions so you can be assigned 
> issues.
> 
> Kenn
> 
> On Sun, Jun 3, 2018 at 12:35 PM James Wilson wrote:
> Hi All,
> 
> This is the first time I am trying to contribute to a large open source project.
> I was going to tackle the BEAM-4292 "Add streaming word count example" for 
> the Go SDK.  Do I assign it to myself or just complete the task and create a 
> PR?  I read through the contributing page on the Apache Beam site, 
> but it didn’t go into how to tackle your first task.  Any help would be 
> appreciated.
> 
> Best,
> James



Re: Portability and Timers

2018-06-04 Thread Lukasz Cwik
Fixed the permissions, feel free to comment on the doc.

The specs on the ParDoPayload will stay, analogous to the SideInputPayload.
The PCollection will not be modified and will continue to contain the
windowing strategy and coder.

On Mon, Jun 4, 2018 at 3:41 PM Kenneth Knowles  wrote:

> I like it. Having the extra portability layer really opens up these
> possibilities that wouldn't make a usable API for a user, but are really
> helpful for modeling.
>
> I've only got View permissions to the doc, so commenting here. You mention
> that they are modeled as a PCollection, but it seems that they will re-use
> the PCollection protos and data plane but there's no sense in which a
> runner can just treat this as a PCollection. It will have to be noticed (by
> coder?) and special-cased. And will you be removing the specs from the
> ParDoPayload?
>
> Kenn
>
> On Mon, Jun 4, 2018 at 3:00 PM Lukasz Cwik  wrote:
>
>> I have been working on a proposal for adding support for timers to the
>> Apache Beam portability APIs.
>>
>> The synopsis is to model timers as PCollections. This allows us to treat
>> timers as just another type of data that is transmitted/received by a
>> Runner during execution and leverage all the work that went into those APIs
>> and implementations.
>>
>> For further details, please take a look at this doc[1].
>>
>> 1: https://s.apache.org/beam-portability-timers
>>
>


Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

2018-06-04 Thread Lukasz Cwik
I totally agree, but there are so many Java APIs (including ours) that
messed this up so everyone lives with the same hack.

On Mon, Jun 4, 2018 at 3:41 PM Andrew Pilloud  wrote:

> It seems like a terribly fragile way to pass arguments but my tests pass
> when I wrap the JDBC path into Beam pipeline execution with that pattern.
>
> Thanks!
>
> Andrew
>
> On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik  wrote:
>
>> It is a common mistake for APIs to not include a way to specify which
>> class loader to use when doing something like deserializing an instance of
>> a class via the ObjectInputStream. This common issue also affects Apache
>> Beam (SerializableCoder, PipelineOptionsFactory, ...), and the way that
>> typical Java APIs have gotten around this is to use the thread context
>> class loader (TCCL) as the way to plumb this additional attribute through.
>> So Apache Beam is meant to honor the TCCL in all places if it has been set,
>> as most Java libraries (not all) do the same hack.
>>
>> In most environments the TCCL is not set and we are working with a single
>> class loader. It turns out that in more complicated environments (like when
>> loading a JDBC driver, or JNDI, or an application server, ...) this usually
>> doesn't work without each caller knowing what class loading context they
>> should be in. A common workaround for most scenarios is to always set the
>> TCCL to the current class's class loader like so before invoking any APIs
>> that do class loading, so that you don't propagate the caller's TCCL along,
>> since they may have set it for some other reason:
>>
>> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();
>> try {
>>   Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>>   // call some API that uses reflection without taking a ClassLoader param
>> } finally {
>>   Thread.currentThread().setContextClassLoader(originalClassLoader);
>> }
>>
>>
>>
>> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud 
>> wrote:
>>
>>> I'm having class loading issues that go away when I revert the changes
>>> in our use of Class.forName added in
>>> https://github.com/apache/beam/pull/4674. The problem I'm having is
>>> that the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an
>>> isolated class loader to load our library. Things work if we call
>>> Class.forName with the default class loader [getClass().getClassLoader() or
>>> no argument] but not if we use the thread context class loader
>>> [Thread.currentThread().getContextClassLoader() or
>>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>>> not the right thing to do? How can I fix this problem?
>>>
>>> See this integration test for an example:
>>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>>>
>>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>>
>>> Andrew
>>>
>>


Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

2018-06-04 Thread Andrew Pilloud
It seems like a terribly fragile way to pass arguments but my tests pass
when I wrap the JDBC path into Beam pipeline execution with that pattern.

Thanks!

Andrew

On Mon, Jun 4, 2018 at 3:20 PM Lukasz Cwik  wrote:

> It is a common mistake for APIs to not include a way to specify which
> class loader to use when doing something like deserializing an instance of
> a class via the ObjectInputStream. This common issue also affects Apache
> Beam (SerializableCoder, PipelineOptionsFactory, ...), and the way that
> typical Java APIs have gotten around this is to use the thread context
> class loader (TCCL) as the way to plumb this additional attribute through.
> So Apache Beam is meant to honor the TCCL in all places if it has been set,
> as most Java libraries (not all) do the same hack.
>
> In most environments the TCCL is not set and we are working with a single
> class loader. It turns out that in more complicated environments (like when
> loading a JDBC driver, or JNDI, or an application server, ...) this usually
> doesn't work without each caller knowing what class loading context they
> should be in. A common workaround for most scenarios is to always set the
> TCCL to the current class's class loader like so before invoking any APIs
> that do class loading, so that you don't propagate the caller's TCCL along,
> since they may have set it for some other reason:
>
> ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();
> try {
>   Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
>   // call some API that uses reflection without taking a ClassLoader param
> } finally {
>   Thread.currentThread().setContextClassLoader(originalClassLoader);
> }
>
>
>
> On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud  wrote:
>
>> I'm having class loading issues that go away when I revert the changes in
>> our use of Class.forName added in
>> https://github.com/apache/beam/pull/4674. The problem I'm having is that
>> the typical JDBC GUI (SqlWorkbench/J, SQuirreL SQL) creates an isolated
>> class loader to load our library. Things work if we call Class.forName with
>> the default class loader [getClass().getClassLoader() or no argument] but
>> not if we use the thread context class loader
>> [Thread.currentThread().getContextClassLoader() or
>> ReflectHelpers.findClassLoader()]. Why is using the default class loader
>> not the right thing to do? How can I fix this problem?
>>
>> See this integration test for an example:
>> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
>> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>>
>> Andrew
>>
>


Re: Portability and Timers

2018-06-04 Thread Kenneth Knowles
I like it. Having the extra portability layer really opens up these
possibilities that wouldn't make a usable API for a user, but are really
helpful for modeling.

I've only got View permissions to the doc, so commenting here. You mention
that they are modeled as a PCollection, but it seems that they will re-use
the PCollection protos and data plane but there's no sense in which a
runner can just treat this as a PCollection. It will have to be noticed (by
coder?) and special-cased. And will you be removing the specs from the
ParDoPayload?

Kenn

On Mon, Jun 4, 2018 at 3:00 PM Lukasz Cwik  wrote:

> I have been working on a proposal for adding support for timers to the
> Apache Beam portability APIs.
>
> The synopsis is to model timers as PCollections. This allows us to treat
> timers as just another type of data that is transmitted/received by a
> Runner during execution and leverage all the work that went into those APIs
> and implementations.
>
> For further details, please take a look at this doc[1].
>
> 1: https://s.apache.org/beam-portability-timers
>


Re: Beam breaks when it isn't loaded via the Thread Context Class Loader

2018-06-04 Thread Lukasz Cwik
It is a common mistake for APIs to not include a way to specify which class
loader to use when doing something like deserializing an instance of a
class via the ObjectInputStream. This common issue also affects Apache Beam
(SerializableCoder, PipelineOptionsFactory, ...), and the way that typical
Java APIs have gotten around this is to use the thread context class
loader (TCCL) as the way to plumb this additional attribute through. So
Apache Beam is meant to honor the TCCL in all places if it has been set, as
most Java libraries (not all) do the same hack.

In most environments the TCCL is not set and we are working with a single
class loader. It turns out that in more complicated environments (like when
loading a JDBC driver, or JNDI, or an application server, ...) this usually
doesn't work without each caller knowing what class loading context they
should be in. A common workaround for most scenarios is to always set the
TCCL to the current class's class loader like so before invoking any APIs
that do class loading, so that you don't propagate the caller's TCCL along,
since they may have set it for some other reason:

ClassLoader originalClassLoader = Thread.currentThread().getContextClassLoader();
try {
  Thread.currentThread().setContextClassLoader(getClass().getClassLoader());
  // call some API that uses reflection without taking a ClassLoader param
} finally {
  Thread.currentThread().setContextClassLoader(originalClassLoader);
}
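
As a concrete sketch of that pattern applied at a JDBC entry point before
handing off to Beam (BeamSqlDriver and buildPipeline are illustrative names,
not actual Beam classes):

ClassLoader original = Thread.currentThread().getContextClassLoader();
try {
  // Pin the TCCL to the loader that loaded our library, not the caller's.
  Thread.currentThread().setContextClassLoader(BeamSqlDriver.class.getClassLoader());
  buildPipeline().run().waitUntilFinish();
} finally {
  Thread.currentThread().setContextClassLoader(original);
}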



On Mon, Jun 4, 2018 at 1:57 PM Andrew Pilloud  wrote:

> I'm having class loading issues that go away when I revert the changes in
> our use of Class.forName added in https://github.com/apache/beam/pull/4674.
> The problem I'm having is that the typical JDBC GUI
> (SqlWorkbench/J, SQuirreL SQL) creates an isolated class loader to load our
> library. Things work if we call Class.forName with the default class loader
> [getClass().getClassLoader() or no argument] but not if we use the thread
> context class loader [Thread.currentThread().getContextClassLoader() or
> ReflectHelpers.findClassLoader()]. Why is using the default class loader
> not the right thing to do? How can I fix this problem?
>
> See this integration test for an example:
> https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
> https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
>
> Andrew
>


Re: [VOTE] Code Review Response-time SLO

2018-06-04 Thread Huygaa Batsaikhan
Proposal 1: +1
Proposal 2: +1
Additional Comments: This is an example vote

On Mon, Jun 4, 2018 at 3:15 PM Huygaa Batsaikhan  wrote:

> A few months ago, Reuven sent out an email
> 
> about improvements to Beam's code review process. Because the email covered
> multiple issues, we did not really dig deep into each of them. One of the
> suggestions was to agree on a code review response turnaround time (SLO).
> Here is the
> direct quote:
>
> It would be great if we could agree on a response-time SLA for Beam code
> reviews. The response might be “I am unable to do the review until next
> week,” however even that is better than getting no response.
>
>
> All the comments on the original thread supported having an agreed upon
> SLO. Therefore, I would like to discuss possible response-time SLO and
> finalize it within this thread. For the purpose of this discussion, let's
> put aside related topics such as the need of tooling support like PR
> dashboard or reviewer availability for future discussions.
>
> *My proposals*
>
> *Proposal 1*
> I propose having a *Default* review response time of *3 business days*.
> This aligns with the frequency at which we expect most developers to check the
> dev list. My reasoning is, if one is checking the dev list, they could also
> check their PR review queue.
>
> *Proposal 2*
> I propose having an *Opt-in* review response time of *24 hours*.
> Contributors are happy when reviewers respond swiftly to their PRs.
> Especially when we are making multiple small changes to Beam, waiting for
> even a few days is frustrating. I understand that not all the reviewers can
> review PRs daily. However, if some of us can incorporate half an hour of
> Beam review into our schedule, it could improve contributors' experience
> drastically. Therefore, I suggest having an opt-in response time of 24
> hours. We can discuss how we can communicate this SLO to contributors and
> reviewers in a separate thread.
>
> Please vote on these 2 proposals and propose any other solutions
> within this template:
>
> Template:
> Proposal 1: <+-1> 
> Proposal 2: <+-1> 
> Additional Comments: 
>
> Example answer:
> Proposal 1: +1 Great idea
> Proposal 2: +1
> Additional Comments: I have this idea foobar 
>
> Thank you,
> Huygaa
>
>


[VOTE] Code Review Response-time SLO

2018-06-04 Thread Huygaa Batsaikhan
A few months ago, Reuven sent out an email

about improvements to Beam's code review process. Because the email covered
multiple issues, we did not really dig deep into each of them. One of the
suggestions was to agree on a code review response turnaround time (SLO).
Here is the
direct quote:

It would be great if we could agree on a response-time SLA for Beam code
reviews. The response might be “I am unable to do the review until next
week,” however even that is better than getting no response.


All the comments on the original thread supported having an agreed upon
SLO. Therefore, I would like to discuss possible response-time SLO and
finalize it within this thread. For the purpose of this discussion, let's
put aside related topics such as the need of tooling support like PR
dashboard or reviewer availability for future discussions.

*My proposals*

*Proposal 1*
I propose having a *Default* review response time of *3 business days*.
This aligns with the frequency at which we expect most developers to check the
dev list. My reasoning is, if one is checking the dev list, they could also
check their PR review queue.

*Proposal 2*
I propose having an *Opt-in* review response time of *24 hours*.
Contributors are happy when reviewers respond swiftly to their PRs.
Especially when we are making multiple small changes to Beam, waiting for
even a few days is frustrating. I understand that not all the reviewers can
review PRs daily. However, if some of us can incorporate half an hour of
Beam review into our schedule, it could improve contributors' experience
drastically. Therefore, I suggest having an opt-in response time of 24
hours. We can discuss how we can communicate this SLO to contributors and
reviewers in a separate thread.

Please vote on these 2 proposals and propose any other solutions
within this template:

Template:
Proposal 1: <+-1> 
Proposal 2: <+-1> 
Additional Comments: 

Example answer:
Proposal 1: +1 Great idea
Proposal 2: +1
Additional Comments: I have this idea foobar 

Thank you,
Huygaa


Re: [SQL] Unsupported features

2018-06-04 Thread Kai Jiang
Ismaël, I was running this naive code snippet.
Yes, an IT would be interesting. As a next step, I was thinking of
automating this and integrating it with Nexmark.
Do you have any ideas about this? Currently, I ingest data by reading a
plain CSV file. Is it possible to run a batch job with non-generated data
in Nexmark?

Best,
Kai

On Mon, Jun 4, 2018 at 4:41 AM Ismaël Mejía  wrote:

> This is super interesting, great work Kai!
>
> Just out of curiosity, how are you validating this?
> It would be really interesting to have this also as part of some kind of
> IT for the future.
>
>
> On Fri, Jun 1, 2018 at 7:43 PM Kai Jiang  wrote:
>
>> Sounds like a good idea! I will file the major problems later and use a task
>> issue to track.
>>
>> Best,
>> Kai
>>
>> On Fri, Jun 1, 2018 at 10:10 AM Anton Kedin  wrote:
>>
>>> This looks very helpful, thank you.
>>>
>>> Can you file Jiras for the major problems? Or maybe a single jira for
>>> the whole thing with sub-tasks for specific problems.
>>>
>>> Regards,
>>> Anton
>>>
>>> On Wed, May 30, 2018 at 9:12 AM Kenneth Knowles  wrote:
>>>
 This is extremely useful. Thanks for putting so much information
 together!

 Kenn

 On Wed, May 30, 2018 at 8:19 AM Kai Jiang  wrote:

> Hi all,
>
> Based on pull/5481, I
> manually did a coverage test with TPC-DS queries (65%) and TPC-H queries
> (100%) and wanted to see which features Beam SQL currently does not support.
> Test was running on DirectRunner.
>
> I want to share the result.
> TPC-DS queries on Beam
>
> TL;DR:
>
>1. aggregation function (stddev) missing, or miscalculation of
>combined aggregation functions.
>2. nested beamjoinrel(condition=[true], joinType=[inner]) / cross
>join error
>3. date type casting/calculation and other type casts.
>4. LIKE operator on String / alias for the substring function
>5. ORDER BY without a LIMIT clause.
>6. OR operator not supported in join conditions
>7. Syntax: EXISTS / NOT EXISTS (errors), RANK() OVER (PARTITION
>BY) / views (unsupported)
>
>
> Best,
> Kai
>



Portability and Timers

2018-06-04 Thread Lukasz Cwik
I have been working on a proposal for adding support for timers to the
Apache Beam portability APIs.

The synopsis is to model timers as PCollections. This allows us to treat
timers as just another type of data that is transmitted/received by a
Runner during execution and leverage all the work that went into those APIs
and implementations.

For further details, please take a look at this doc[1].

1: https://s.apache.org/beam-portability-timers
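
For readers less familiar with the feature being plumbed through, below is
roughly the existing Java SDK timer surface whose firings the proposal would
carry over the data plane as PCollection elements (a minimal sketch of the
current user-facing API, not of the proposal itself):

import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class ExpiringFn extends DoFn<KV<String, String>, String> {
  @TimerId("expiry")
  private final TimerSpec expirySpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(ProcessContext c, @TimerId("expiry") Timer timer) {
    // Ask the runner to call back once the watermark passes this timestamp.
    timer.set(c.timestamp().plus(Duration.standardMinutes(10)));
  }

  @OnTimer("expiry")
  public void onExpiry(OnTimerContext c) {
    c.output("expired at " + c.timestamp());
  }
}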


Beam breaks when it isn't loaded via the Thread Context Class Loader

2018-06-04 Thread Andrew Pilloud
I'm having class loading issues that go away when I revert the changes in
our use of Class.forName added in https://github.com/apache/beam/pull/4674.
The problem I'm having is that the typical JDBC GUI
(SqlWorkbench/J, SQuirreL SQL) creates an isolated class loader to load our
library. Things work if we call Class.forName with the default class loader
[getClass().getClassLoader() or no argument] but not if we use the thread
context class loader [Thread.currentThread().getContextClassLoader() or
ReflectHelpers.findClassLoader()]. Why is using the default class loader
not the right thing to do? How can I fix this problem?

See this integration test for an example:
https://github.com/apilloud/beam/blob/directrunner/sdks/java/extensions/sql/jdbc/src/test/java/org/apache/beam/sdk/extensions/sql/jdbc/JdbcIT.java#L44
https://scans.gradle.com/s/iquqinhns2ymi/tests/slmg6ytuuqlus-akh5xpgshj32k
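
For concreteness, a small sketch of the two lookup strategies being compared
(org.example.Foo is a placeholder class name):

// Resolves against the loader that loaded our own code; works even when the
// JDBC GUI has put the library in an isolated class loader.
Class<?> viaDefault = Class.forName("org.example.Foo", true, getClass().getClassLoader());

// Resolves against whatever loader the calling thread happens to carry, which
// in these GUIs cannot see our classes unless the caller sets it up.
Class<?> viaTccl = Class.forName("org.example.Foo", true,
    Thread.currentThread().getContextClassLoader());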

Andrew


Re: [VOTE] Code Review Process

2018-06-04 Thread Griselda Cuevas
+1

On Mon, 4 Jun 2018 at 12:30, Robert Burke  wrote:

> +1
>
> On Mon, Jun 4, 2018, 9:01 AM Raghu Angadi  wrote:
>
>> +1
>>
>> On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh  wrote:
>>
>>> As we seem to largely have consensus in "Reducing Committer Load for
>>> Code Reviews"[1], this is a vote to change the Beam policy on Code Reviews
>>> to require that
>>>
>>> (1) At least one committer is involved with the code review, as either a
>>> reviewer or as the author
>>> (2) A contributor has approved the change
>>>
>>> prior to merging any change.
>>>
>>> This changes our policy from its current requirement that at least one
>>> committer *who is not the author* has approved the change prior to merging.
>>> We believe that changing this process will improve code review throughput,
>>> reduce committer load, and engage more of the community in the code review
>>> process.
>>>
>>> Please vote:
>>> [ ] +1: Accept the above proposal to change the Beam code review/merge
>>> policy
>>> [ ] -1: Leave the Code Review policy unchanged
>>>
>>> Thanks,
>>>
>>> Thomas
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E
>>>
>>


Re: [VOTE] Code Review Process

2018-06-04 Thread Robert Burke
+1

On Mon, Jun 4, 2018, 9:01 AM Raghu Angadi  wrote:

> +1
>
> On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh  wrote:
>
>> As we seem to largely have consensus in "Reducing Committer Load for Code
>> Reviews"[1], this is a vote to change the Beam policy on Code Reviews to
>> require that
>>
>> (1) At least one committer is involved with the code review, as either a
>> reviewer or as the author
>> (2) A contributor has approved the change
>>
>> prior to merging any change.
>>
>> This changes our policy from its current requirement that at least one
>> committer *who is not the author* has approved the change prior to merging.
>> We believe that changing this process will improve code review throughput,
>> reduce committer load, and engage more of the community in the code review
>> process.
>>
>> Please vote:
>> [ ] +1: Accept the above proposal to change the Beam code review/merge
>> policy
>> [ ] -1: Leave the Code Review policy unchanged
>>
>> Thanks,
>>
>> Thomas
>>
>> [1]
>> https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E
>>
>


Re: Beam SQL Improvements

2018-06-04 Thread Romain Manni-Bucau
This can create other issues with IO if the runner is not designed for it
(like the direct runner), so it's probably not something reliable for the
generic Beam part :(.

On Mon, Jun 4, 2018 at 20:10, Lukasz Cwik wrote:

> Shouldn't the runner isolate each instance of the pipeline behind an
> appropriate class loader?
>
> On Sun, Jun 3, 2018 at 12:45 PM Reuven Lax  wrote:
>
>> Just an update: Romain and I chatted on Slack, and I think I understand
>> his concern. The concern wasn't specifically about schemas, rather about
>> having a generic way to register per-ParDo state that has worker lifetime.
>> As evidence that such is needed, in many cases static variables are used to
>> simulate that. Static variables, however, have downsides - if two pipelines
>> are run on the same JVM (happens often with unit tests, and there's nothing
>> that prevents a runner from doing so in a production environment), these
>> static variables will interfere with each other.
>>
>> On Thu, May 24, 2018 at 12:30 AM Reuven Lax  wrote:
>>
>>> Romain, maybe it would be useful for us to find some time on slack. I'd
>>> like to understand your concerns. Also keep in mind that I'm tagging all
>>> these classes as Experimental for now, so we can definitely change these
>>> interfaces around if we decide they are not the best ones.
>>>
>>> Reuven
>>>
>>> On Tue, May 22, 2018 at 11:35 PM Romain Manni-Bucau <
>>> rmannibu...@gmail.com> wrote:
>>>
 Why not extend ProcessContext to add the new remapped output? But it
 looks good (the part I don't like is that creating a new context each time a
 new feature is added hurts users. What about when Beam adds some
 reactive support? ReactiveOutputReceiver?)

 Pipeline sounds like the wrong storage, since once distributed you serialize
 the instances, which kind of breaks the lifecycle of the original instance, and
 you have no real release/close hook on them anymore, right? Not sure we can do
 better than DoFn/source embedded instances today.




 On Wed, May 23, 2018 at 08:02, Romain Manni-Bucau
 wrote:

>
>
> On Wed, May 23, 2018 at 07:55, Jean-Baptiste Onofré
> wrote:
>
>> Hi,
>>
>> IMHO, it would be better to have an explicit transform/IO as converter.
>>
>> It would be easier for users.
>>
>> Another option would be to use a "TypeConverter/SchemaConverter" map
>> as
>> we do in Camel: Beam could check the source/destination "type" and
>> check
>> in the map if there's a converter available. This map can be stored as
>> part of the pipeline (as we do for filesystem registration).
>>
>
>
> It works in Camel because it is not strongly typed, doesn't it? So it can
> require a new Beam pipeline API.
>
> +1 for the explicit transform; if added to the pipeline API as a coder
> it wouldn't break the fluent API:
>
> p.apply(io).setOutputType(Foo.class)
>
> Coders can be a workaround since they own the type, but since the
> PCollection is the real owner it is surely saner this way, no?
>
> Also it probably needs to ensure all converters are present before running
> the pipeline; no implicit environment converter support is probably
> good to start with, to avoid late surprises.
>
>
>
>> My $0.01
>>
>> Regards
>> JB
>>
>> On 23/05/2018 07:51, Romain Manni-Bucau wrote:
>> > How does it work on the pipeline side?
>> > Do you generate these "virtual" IO at build time to enable the
>> fluent
>> > API to work without erasing generics?
>> >
>> > ex: SQL(row)->BigQuery(native) will not compile so we need a
>> > SQL(row)->BigQuery(row)
>> >
>> > Side note unrelated to Row: if you add another registry maybe a
>> pretask
>> > is to ensure Beam has a kind of singleton/context to avoid
>> duplicating
>> > it or not tracking it properly. These kinds of converters will need a
>> global
>> > close and not only per record in general:
>> > converter.init();converter.convert(row);converter.destroy();,
>> > otherwise it easily leaks. This is why it can require some way to
>> not
>> > recreate it. A quick fix, if you are in bytebuddy already, can be
>> to add
>> > it to setup/teardown probably; being more global would be nicer but is
>> more
>> > challenging.
>> >
>> > Romain Manni-Bucau
>> > @rmannibucau | Blog | Old Blog | Github | LinkedIn | Book
>> > <https://www.packtpub.com/application-development/java-ee-8-high-performance>
>> >
>> >
>> > On Wed, May 23, 2018 at 07:22, Reuven Lax wrote:
>> >
>> > No - the only modules we need to add to 

Re: Multimap PCollectionViews' values updated rather than appended

2018-06-04 Thread Lukasz Cwik
Carlos, can you provide a test/code snippet for the bug that shows the
issue?

On Mon, Jun 4, 2018 at 11:57 AM Lukasz Cwik  wrote:

> +dev@beam.apache.org
> Note that this is likely a bug in the DirectRunner for accumulation mode,
> filed: https://issues.apache.org/jira/browse/BEAM-4470
>
> Discarding mode is meant to always be the latest firing, the issue though
> is that you need to emit the entire map every time. If you can do this,
> then it makes sense to use discarding mode. The issue with discarding mode
> is that if your first trigger firing produces (A, 1), (B, 1) and your
> second firing produces (B, 2), the multimap will only contain (B, 2) and
> (A, 1) will have been discarded.
>
> To my knowledge, there is no guarantee about the order in which the values
> are combined. You will need to use some piece of information about the
> element to figure out which is the latest (or encode some additional
> information along with each element to make this easy).
>
> On Thu, May 31, 2018 at 9:16 AM Carlos Alonso 
> wrote:
>
>> I've improved the example a little and added some tests
>> https://github.com/calonso/beam_experiments/blob/master/refreshingsideinput/src/test/scala/com/mrcalonso/RefreshingSideInput2Test.scala
>>
>> The behaviour is slightly different, which is possibly because of the
>> different runners (Dataflow/Direct) implementations, but still not working.
>>
>> Now what happens is that although the internal PCollection gets updated,
>> the view isn't. This is happening regardless of the accumulation mode.
>>
>> Regarding the accumulation mode on Dataflow... That was it!! Now the sets
>> contain all the items, however, one more question, is the ordering within
>> the set deterministic? (i.e: Can I assume that the latest will always be on
>> the last position of the Iterable object?)
>>
>> Also... given that for my particular case I only want the latest version,
>> would you advise me to go ahead with Discarding mode?
>>
>> Regards
>>
>> On Thu, May 31, 2018 at 4:44 PM Lukasz Cwik  wrote:
>>
>>> The trigger definition in the sample code you have is using discarding
>>> firing mode. Try swapping to using accumulating mode.
>>>
>>>
>>> On Thu, May 31, 2018 at 1:42 AM Carlos Alonso 
>>> wrote:
>>>
 But I think what I'm experiencing is quite different. Basically the
 side input is updated, but only one element is found on the Iterable that
 is the value of any key of the multimap.

 I mean, no concatenation seems to be happening. On the linked thread,
 Kenn suggests that every firing will add the new value to the set of values
 for the emitted key, but what I'm experiencing is that the new value is
 there, but just itself (i.e: is the only element in the set).

 @Robert, I'm using
 Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane())

 On Wed, May 30, 2018 at 7:46 PM Lukasz Cwik  wrote:

> An alternative to the thread that Kenn linked (adding support for
> retractions) is to add explicit support for combiners into side inputs. 
> The
> system currently works by using a hardcoded concatenating combiner, so
> maps, lists, iterables, singletons, multimaps all work by concatenating 
> the
> set of values emitted and then turning it into a view which is why it is 
> an
> error for a singleton and map view if the trigger fires multiple times.
>
> On Wed, May 30, 2018 at 10:01 AM Kenneth Knowles 
> wrote:
>
>> Yes, this is a known issue. Here's a prior discussion:
>> https://lists.apache.org/thread.html/e9518f5d5f4bcf7bab02de2cb9fe1bd5293d87aa12d46de1eac4600b@%3Cuser.beam.apache.org%3E
>>
>> It is actually long-standing and the solution is known but hard.
>>
>>
>>
>> On Wed, May 30, 2018 at 9:48 AM Carlos Alonso 
>> wrote:
>>
>>> Hi everyone!!
>>>
>>> Working with multimap based side inputs on the global window I'm
>>> experiencing something unexpected (at least to me) that I'd like to 
>>> share
>>> with you to clarify.
>>>
>>> The way I understand multimaps is that when one emits two values for
>>> the same key for the same window (obvious thing here as I'm working on 
>>> the
>>> Global one), the newly emitted values are appended to the Iterable
>>> collection that is the value for that particular key on the map.
>>>
>>> Testing it in this job (it is using scio, but side inputs are
>>> implemented with PCollectionViews):
>>> https://github.com/calonso/beam_experiments/blob/master/refreshingsideinput/src/main/scala/com/mrcalonso/RefreshingSideInput2.scala
>>>
>>> The steps to reproduce are:
>>> 1. Create one table on the target BQ
>>> 2. Run the job
>>> 3. Patch the table on BQ (add one field), this should generate a new
>>> TableSchema for the corresponding TableReference
>>> 4. An updated value of the number of fields appears in the logs, but
>>> 

Jenkins build is back to normal : beam_SeedJob #1870

2018-06-04 Thread Apache Jenkins Server
See 



Re: Multimap PCollectionViews' values updated rather than appended

2018-06-04 Thread Lukasz Cwik
+dev@beam.apache.org
Note that this is likely a bug in the DirectRunner for accumulation mode,
filed: https://issues.apache.org/jira/browse/BEAM-4470

Discarding mode is meant to always be the latest firing, the issue though
is that you need to emit the entire map every time. If you can do this,
then it makes sense to use discarding mode. The issue with discarding mode
is that if your first trigger firing produces (A, 1), (B, 1) and your
second firing produces (B, 2), the multimap will only contain (B, 2) and
(A, 1) will have been discarded.

To my knowledge, there is no guarantee about the order in which the values
are combined. You will need to use some piece of information about the
element to figure out which is the latest (or encode some additional
information along with each element to make this easy).
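
A minimal sketch of wiring a multimap side input with accumulating panes, per
the advice above (windowing parameters are illustrative, not prescriptive):

import java.util.Map;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.windowing.AfterProcessingTime;
import org.apache.beam.sdk.transforms.windowing.Repeatedly;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.joda.time.Duration;

static PCollectionView<Map<String, Iterable<Integer>>> asAccumulatingMultimap(
    PCollection<KV<String, Integer>> updates) {
  return updates
      .apply(Window.<KV<String, Integer>>configure()
          .triggering(Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane()))
          .accumulatingFiredPanes()  // keeps earlier firings: A -> [1], B -> [1, 2]
          .withAllowedLateness(Duration.ZERO))
      .apply(View.asMultimap());
}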

On Thu, May 31, 2018 at 9:16 AM Carlos Alonso  wrote:

> I've improved the example a little and added some tests
> https://github.com/calonso/beam_experiments/blob/master/refreshingsideinput/src/test/scala/com/mrcalonso/RefreshingSideInput2Test.scala
>
> The behaviour is slightly different, which is possibly because of the
> different runners (Dataflow/Direct) implementations, but still not working.
>
> Now what happens is that although the internal PCollection gets updated,
> the view isn't. This is happening regardless of the accumulation mode.
>
> Regarding the accumulation mode on Dataflow... That was it!! Now the sets
> contain all the items, however, one more question, is the ordering within
> the set deterministic? (i.e: Can I assume that the latest will always be on
> the last position of the Iterable object?)
>
> Also... given that for my particular case I only want the latest version,
> would you advise me to go ahead with Discarding mode?
>
> Regards
>
> On Thu, May 31, 2018 at 4:44 PM Lukasz Cwik  wrote:
>
>> The trigger definition in the sample code you have is using discarding
>> firing mode. Try swapping to using accumulating mode.
>>
>>
>> On Thu, May 31, 2018 at 1:42 AM Carlos Alonso 
>> wrote:
>>
>>> But I think what I'm experiencing is quite different. Basically the side
>>> input is updated, but only one element is found on the Iterable that is the
>>> value of any key of the multimap.
>>>
>>> I mean, no concatenation seems to be happening. On the linked thread,
>>> Kenn suggests that every firing will add the new value to the set of values
>>> for the emitted key, but what I'm experiencing is that the new value is
>>> there, but just itself (i.e: is the only element in the set).
>>>
>>> @Robert, I'm using
>>> Repeatedly.forever(AfterProcessingTime.pastFirstElementInPane())
>>>
>>> On Wed, May 30, 2018 at 7:46 PM Lukasz Cwik  wrote:
>>>
 An alternative to the thread that Kenn linked (adding support for
 retractions) is to add explicit support for combiners into side inputs. The
 system currently works by using a hardcoded concatenating combiner, so
 maps, lists, iterables, singletons, multimaps all work by concatenating the
 set of values emitted and then turning it into a view which is why it is an
 error for a singleton and map view if the trigger fires multiple times.

 On Wed, May 30, 2018 at 10:01 AM Kenneth Knowles 
 wrote:

> Yes, this is a known issue. Here's a prior discussion:
> https://lists.apache.org/thread.html/e9518f5d5f4bcf7bab02de2cb9fe1bd5293d87aa12d46de1eac4600b@%3Cuser.beam.apache.org%3E
>
> It is actually long-standing and the solution is known but hard.
>
>
>
> On Wed, May 30, 2018 at 9:48 AM Carlos Alonso 
> wrote:
>
>> Hi everyone!!
>>
>> Working with multimap based side inputs on the global window I'm
>> experiencing something unexpected (at least to me) that I'd like to share
>> with you to clarify.
>>
>> The way I understand multimaps is that when one emits two values for
>> the same key for the same window (obvious thing here as I'm working on 
>> the
>> Global one), the newly emitted values are appended to the Iterable
>> collection that is the value for that particular key on the map.
>>
>> Testing it in this job (it is using scio, but side inputs are
>> implemented with PCollectionViews):
>> https://github.com/calonso/beam_experiments/blob/master/refreshingsideinput/src/main/scala/com/mrcalonso/RefreshingSideInput2.scala
>>
>> The steps to reproduce are:
>> 1. Create one table on the target BQ
>> 2. Run the job
>> 3. Patch the table on BQ (add one field), this should generate a new
>> TableSchema for the corresponding TableReference
>> 4. An updated value of the number of fields appears in the logs, but
>> there is only one element within the iterable, as if it had been updated
>> instead of appended!!
>>
>> Is that the expected behaviour? Is a bug? Am I missing something?
>>
>> Thanks!
>>
>


Build failed in Jenkins: beam_SeedJob #1869

2018-06-04 Thread Apache Jenkins Server
See 

--
GitHub pull request #5406 of commit be1be8d255d3c4b7eff09df0cbf7135b034b4ce6, 
no merge conflicts.
Setting status of be1be8d255d3c4b7eff09df0cbf7135b034b4ce6 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1869/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision 9ad7a016ed489326d7e4253021a910593b9c0678 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9ad7a016ed489326d7e4253021a910593b9c0678
Commit message: "Merge be1be8d255d3c4b7eff09df0cbf7135b034b4ce6 into 
697a1d17e473cd5b097aaaeee24c08f43cc77f58"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
ERROR: startup failed:
General error during conversion: Error grabbing Grapes -- [download failed: 
com.google.guava#guava-jdk5;17.0!guava-jdk5.jar(bundle)]

java.lang.RuntimeException: Error grabbing Grapes -- [download failed: com.google.guava#guava-jdk5;17.0!guava-jdk5.jar(bundle)]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:83)
at org.codehaus.groovy.reflection.CachedConstructor.doConstructorInvoke(CachedConstructor.java:77)
at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrap.callConstructor(ConstructorSite.java:84)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCallConstructor(CallSiteArray.java:60)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:235)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.callConstructor(AbstractCallSite.java:247)
at groovy.grape.GrapeIvy.getDependencies(GrapeIvy.groovy:424)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSite.invoke(PogoMetaMethodSite.java:169)
at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite.callCurrent(PogoMetaMethodSite.java:59)
at groovy.grape.GrapeIvy.resolve(GrapeIvy.groovy:571)
at groovy.grape.GrapeIvy$resolve$1.callCurrent(Unknown Source)
at groovy.grape.GrapeIvy.resolve(GrapeIvy.groovy:538)
at groovy.grape.GrapeIvy$resolve$0.callCurrent(Unknown Source)
at groovy.grape.GrapeIvy.grab(GrapeIvy.groovy:256)
at groovy.grape.Grape.grab(Grape.java:167)
at groovy.grape.GrabAnnotationTransformation.visit(GrabAnnotationTransformation.java:378)
at org.codehaus.groovy.transform.ASTTransformationVisitor$3.call(ASTTransformationVisitor.java:321)
at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:943)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:605)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:581)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:558)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:298)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:268)
at groovy.lang.GroovyShell.parseClass(GroovyShell.java:688)
at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
at 

Re: Beam SQL Improvements

2018-06-04 Thread Lukasz Cwik
Shouldn't the runner isolate each instance of the pipeline behind an
appropriate class loader?

On Sun, Jun 3, 2018 at 12:45 PM Reuven Lax  wrote:

> Just an update: Romain and I chatted on Slack, and I think I understand
> his concern. The concern wasn't specifically about schemas, rather about
> having a generic way to register per-ParDo state that has worker lifetime.
> As evidence that such a mechanism is needed, in many cases static variables are
> used to simulate that. Static variables, however, have downsides - if two pipelines
> are run on the same JVM (which happens often with unit tests, and there's nothing
> that prevents a runner from doing so in a production environment), these
> static variables will interfere with each other.
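A minimal sketch of the failure mode described above, using a hypothetical DoFn that caches lookups in a static field (the class and field names are illustrative, not from the thread):

  import java.util.concurrent.ConcurrentHashMap;
  import org.apache.beam.sdk.transforms.DoFn;

  class EnrichFn extends DoFn<String, String> {
    // Intended as "worker lifetime" state, but a static field is really
    // JVM-lifetime state: two pipelines running in the same JVM (e.g. two
    // unit tests) share this one map and can observe each other's entries.
    private static final ConcurrentHashMap<String, String> CACHE =
        new ConcurrentHashMap<>();

    @ProcessElement
    public void process(@Element String word, OutputReceiver<String> out) {
      // computeIfAbsent is thread-safe, but nothing scopes the entries to
      // this pipeline, which is exactly the interference described above.
      out.output(CACHE.computeIfAbsent(word, w -> w.toUpperCase()));
    }
  }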
>
> On Thu, May 24, 2018 at 12:30 AM Reuven Lax  wrote:
>
>> Romain, maybe it would be useful for us to find some time on slack. I'd
>> like to understand your concerns. Also keep in mind that I'm tagging all
>> these classes as Experimental for now, so we can definitely change these
>> interfaces around if we decide they are not the best ones.
>>
>> Reuven
>>
>> On Tue, May 22, 2018 at 11:35 PM Romain Manni-Bucau <
>> rmannibu...@gmail.com> wrote:
>>
>>> Why not extend ProcessContext to add the new remapped output? But it
>>> looks good (the part I don't like is that creating a new context each time a
>>> new feature is added hurts users. What happens when Beam adds some
>>> reactive support? ReactiveOutputReceiver?)
>>>
>>> Pipeline sounds like the wrong storage: once distributed, you have serialized
>>> the instances, which kind of breaks the lifecycle of the original instance and
>>> leaves no real release/close hook on them anymore, right? Not sure we can do
>>> better than DoFn/source-embedded instances today.
>>>
>>>
>>>
>>>
>>> Le mer. 23 mai 2018 08:02, Romain Manni-Bucau  a
>>> écrit :
>>>


 Le mer. 23 mai 2018 07:55, Jean-Baptiste Onofré  a
 écrit :

> Hi,
>
> IMHO, it would be better to have an explicit transform/IO as converter.
>
> It would be easier for users.
>
> Another option would be to use a "TypeConverter/SchemaConverter" map as
> we do in Camel: Beam could check the source/destination "type" and
> check
> in the map if there's a converter available. This map can be stored as
> part of the pipeline (as we do for filesystem registration).
>


 It works in Camel because it is not strongly typed, doesn't it? So it can
 require a new Beam pipeline API.

 +1 for the explicit transform; if added to the pipeline API like a coder, it
 wouldn't break the fluent API:

 p.apply(io).setOutputType(Foo.class)

 Coders can be a workaround since they own the type, but since the
 PCollection is the real owner, it is surely saner this way, no?

 Also, it probably needs to ensure all converters are present before running the
 pipeline; no implicit environment converter support is probably
 good to start with, to avoid late surprises.
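A rough sketch of what such an explicit converter step could look like, built on the existing MapElements transform; the Convert name and the Foo type in the usage comment are placeholders from this thread, not a real API:

  import org.apache.beam.sdk.transforms.MapElements;
  import org.apache.beam.sdk.transforms.SerializableFunction;
  import org.apache.beam.sdk.values.TypeDescriptor;

  class Convert {
    // Explicit conversion step: the target type is stated in the pipeline,
    // so no implicit environment-level converter lookup is needed and the
    // conversion is visible (and checkable) before the pipeline runs.
    static <InT, OutT> MapElements<InT, OutT> to(
        Class<OutT> clazz, SerializableFunction<InT, OutT> fn) {
      return MapElements.into(TypeDescriptor.of(clazz)).via(fn);
    }
  }

  // Usage, keeping the fluent style (Foo and toFoo are hypothetical):
  //   p.apply(io).apply(Convert.to(Foo.class, row -> toFoo(row)));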



> My $0.01
>
> Regards
> JB
>
> On 23/05/2018 07:51, Romain Manni-Bucau wrote:
> > How does it work on the pipeline side?
> > Do you generate these "virtual" IOs at build time to enable the fluent
> > API to work without erasing generics?
> >
> > ex: SQL(row)->BigQuery(native) will not compile, so we need a
> > SQL(row)->BigQuery(row)
> >
> > Side note unrelated to Row: if you add another registry, maybe a pretask
> > is to ensure Beam has a kind of singleton/context to avoid duplicating
> > it or failing to track it properly. These kinds of converters will in general
> > need a global close, and not only per record:
> > converter.init(); converter.convert(row); converter.destroy();,
> > otherwise it easily leaks. This is why it can require some way to not
> > recreate it. A quick fix, if you are in ByteBuddy already, can be to add
> > it to setup/teardown probably; being more global would be nicer but is more
> > challenging.
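One way to get that global init/close with today's API is to tie the converter to the DoFn instance lifecycle via @Setup/@Teardown; RowConverter here is a hypothetical stand-in for the converter the thread discusses:

  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.values.Row;

  // Hypothetical converter with the explicit lifecycle from the thread.
  class RowConverter implements java.io.Serializable {
    void init() { /* acquire resources once */ }
    String convert(Row row) { return String.valueOf(row); }
    void destroy() { /* release resources */ }
  }

  class ConvertingFn extends DoFn<Row, String> {
    private transient RowConverter converter;

    @Setup
    public void setup() {
      // Runs once per DoFn instance on the worker, not once per record.
      converter = new RowConverter();
      converter.init();
    }

    @ProcessElement
    public void process(@Element Row row, OutputReceiver<String> out) {
      out.output(converter.convert(row));
    }

    @Teardown
    public void teardown() {
      // The global close hook being asked for; note that runners treat
      // teardown as best-effort, so it is not a guaranteed release point.
      converter.destroy();
    }
  }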
> >
> > Romain Manni-Bucau
> > @rmannibucau  |  Blog
> >  | Old Blog
> >  | Github
> >  | LinkedIn
> >  | Book
> > <
> https://www.packtpub.com/application-development/java-ee-8-high-performance
> >
> >
> >
> > Le mer. 23 mai 2018 à 07:22, Reuven Lax  > > a écrit :
> >
> > No - the only modules we need to add to core are the ones we
> choose
> > to add. For example, I will probably add a registration for
> > TableRow/TableSchema (GCP BigQuery) so these can work seamlessly
> > with schemas. However I will add that to the GCP module, so only
> > someone depending on that module needs to pull in that dependency.
> > The 

Jenkins build is back to normal : beam_SeedJob #1867

2018-06-04 Thread Apache Jenkins Server
See <https://builds.apache.org/job/beam_SeedJob/1867/>



Build failed in Jenkins: beam_SeedJob #1866

2018-06-04 Thread Apache Jenkins Server
See <https://builds.apache.org/job/beam_SeedJob/1866/>

--
GitHub pull request #5406 of commit f4f753f9fe1195cf499f68339da9394eef8deb34, 
no merge conflicts.
Setting status of f4f753f9fe1195cf499f68339da9394eef8deb34 to PENDING with url 
https://builds.apache.org/job/beam_SeedJob/1866/ and message: 'Build started 
sha1 is merged.'
Using context: Jenkins: Seed Job
[EnvInject] - Loading node environment variables.
Building remotely on beam14 (beam) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/beam.git # timeout=10
Fetching upstream changes from https://github.com/apache/beam.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/beam.git 
 > +refs/heads/*:refs/remotes/origin/* 
 > +refs/pull/5406/*:refs/remotes/origin/pr/5406/*
 > git rev-parse refs/remotes/origin/pr/5406/merge^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/pr/5406/merge^{commit} # timeout=10
Checking out Revision d699cda55f223f7ca8bc557d124d254b953489cd 
(refs/remotes/origin/pr/5406/merge)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d699cda55f223f7ca8bc557d124d254b953489cd
Commit message: "Merge f4f753f9fe1195cf499f68339da9394eef8deb34 into 
697a1d17e473cd5b097aaaeee24c08f43cc77f58"
First time build. Skipping changelog.
Cleaning workspace
 > git rev-parse --verify HEAD # timeout=10
Resetting working tree
 > git reset --hard # timeout=10
 > git clean -fdx # timeout=10
Processing DSL script job_00_seed.groovy
Processing DSL script job_Dependency_Check.groovy
ERROR: startup failed:
workspace:/.test-infra/jenkins/dependency_check_utils.groovy: 22: unable to 
resolve class com.google.cloud.bigquery.BigQuery
 @ line 22, column 1.
   import com.google.cloud.bigquery.BigQuery;
   ^

1 error




Re: [ANNOUNCEMENT] New committers, May 2018 edition!

2018-06-04 Thread Mikhail Gryzykhin
Congratulations!

--Mikhail



On Fri, Jun 1, 2018 at 11:34 AM Huygaa Batsaikhan  wrote:

> Congrats!
>
> On Fri, Jun 1, 2018 at 10:26 AM Thomas Groh  wrote:
>
>> Congrats, you three!
>>
>> On Thu, May 31, 2018 at 7:09 PM Davor Bonaci  wrote:
>>
>>> Please join me and the rest of Beam PMC in welcoming the following
>>> contributors as our newest committers. They have significantly contributed
>>> to the project in different ways, and we look forward to many more
>>> contributions in the future.
>>>
>>> * Griselda Cuevas
>>> * Pablo Estrada
>>> * Jason Kuster
>>>
>>> (Apologizes for a delayed announcement, and the lack of the usual
>>> paragraph summarizing individual contributions.)
>>>
>>> Congratulations to all three! Welcome!
>>>
>>


Re: [Proposal] Automation For Beam Dependency Check

2018-06-04 Thread Kenneth Knowles
This kind of leak analysis, which `mvn dependency:analyze` does, is I think
what is also called IWYU (Include What You Use). I looked around and there
are some Gradle plugins that do the same thing. I couldn't tell which was the
most robust.
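For reference, this is roughly the report shape `mvn dependency:analyze` produces (the module and artifacts below are invented for illustration):

  $ mvn dependency:analyze
  [WARNING] Used undeclared dependencies found:
  [WARNING]    com.google.guava:guava:jar:20.0:compile
  [WARNING] Unused declared dependencies found:
  [WARNING]    org.apache.commons:commons-lang3:jar:3.6:compile

"Used undeclared" is the leak direction (the module compiles against a transitive dependency it never declares); "unused declared" is dead weight.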

Kenn

On Mon, Jun 4, 2018 at 9:46 AM Chamikara Jayalath 
wrote:

>
>
> On Mon, Jun 4, 2018 at 6:10 AM Ismaël Mejía  wrote:
>
>> Is there a way to add to that weekly report the new dependencies that
>> were introduced in the week before, or that have changed?
>>
>
> I think it makes sense to add a recent changes section so that the community
> is up to date and can discuss if there are any possible issues. For
> example, (1) new dependencies with known critical vulnerabilities (2)
> component level dependency version overrides that can be avoided.
>
>
>>
>> We are not addressing another important problem: Leaking of
>> dependencies. I am not aware of the gradle equivalent of the maven
>> dependency plugin that helps to determine missing dependencies (non
>> explicitly defined) or unused dependencies. Is there any way to
>> achieve this too? (Note this should probably be enforced at Jenkins
>> not part of the report but just curious)
>>
>
> Agree that this should probably be enforced through a PreCommit Jenkins
> job instead of the job proposed here.
>
> Regarding leaking, did you mean cross-component leaks (one Beam component
> leaking a dependency to another Beam component) or something else ? For
> cross-component dependency leaks, the following proposal promotes using
> versions defined at the top level, which will help avoid this issue.
>
> https://docs.google.com/document/d/15m1MziZ5TNd9rh_XN0YYBJfYkt0Oj-Ou9g0KFDPL2aA/edit?usp=sharing
>
> Thanks,
> Cham
>
>
>>
>> On Wed, May 30, 2018 at 5:16 AM Yifan Zou  wrote:
>> >
>> > Thanks everyone for making comments and suggestions. I modified the
>> proposal to add dependency release time as the major criterion for
>> outdated package determination.
>> > The revised doc is here:
>> https://docs.google.com/document/d/1rqr_8a9NYZCgeiXpTIwWLCL7X8amPAVfRXsO72BpBwA.
>> Any comments are welcome.
>> >
>> > -Yifan
>> >
>> > On Thu, May 24, 2018 at 5:25 PM Chamikara Jayalath <
>> chamik...@google.com> wrote:
>> >>
>> >> Thanks Yifan. Added some comments. I think having regularly generated
>> human reports on outdated dependencies of Beam SDKs will be extremely helpful
>> in keeping Beam in a healthy state.
>> >>
>> >> - Cham
>> >>
>> >> On Thu, May 24, 2018 at 7:08 AM Yifan Zou  wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I have a proposal to automate Beam dependency check. Since some Beam
>> dependent packages are out-of-date, we want to identify them and check for
>> dependency updates regularly in the future. Generally, we have a couple of
>> options to do it:
>> >>> 1. Implementing a Jenkins job that checks dependency versions and
>> creates reports.
>> >>> 2. Using the Github App Dependabot to automate dependency updates.
>> >>> 3. Combination of those two solutions.
>> >>>
>> >>> I am looking forward to hearing feedback from you :)
>> >>>
>> >>>
>> https://docs.google.com/document/d/1rqr_8a9NYZCgeiXpTIwWLCL7X8amPAVfRXsO72BpBwA/
>> >>>
>> >>> Thanks.
>> >>>
>> >>> Best.
>> >>> Yifan Zou
>>
>


Re: [Proposal] Automation For Beam Dependency Check

2018-06-04 Thread Chamikara Jayalath
On Mon, Jun 4, 2018 at 6:10 AM Ismaël Mejía  wrote:

> Is there a way to add to that weekly report the new dependencies that
> were introduced in the week before, or that have changed?
>

I think it makes sense to add a recent changes section so that the community is
up to date and can discuss if there are any possible issues. For example,
(1) new dependencies with known critical vulnerabilities (2) component
level dependency version overrides that can be avoided.


>
> We are not addressing another important problem: Leaking of
> dependencies. I am not aware of the gradle equivalent of the maven
> dependency plugin that helps to determine missing dependencies (non
> explicitly defined) or unused dependencies. Is there any way to
> achieve this too? (Note this should probably be enforced at Jenkins
> not part of the report but just curious)
>

Agree that this should probably be enforced through a PreCommit Jenkins job
instead of the job proposed here.

Regarding leaking, did you mean cross-component leaks (one Beam component
leaking a dependency to another Beam component) or something else ? For
cross-component dependency leaks, the following proposal promotes using
versions defined at the top level, which will help avoid this issue.
https://docs.google.com/document/d/15m1MziZ5TNd9rh_XN0YYBJfYkt0Oj-Ou9g0KFDPL2aA/edit?usp=sharing

Thanks,
Cham


>
> On Wed, May 30, 2018 at 5:16 AM Yifan Zou  wrote:
> >
> > Thanks everyone for making comments and suggestions. I modified the
> proposal to add dependency release time as the major criterion for
> outdated package determination.
> > The revised doc is here:
> https://docs.google.com/document/d/1rqr_8a9NYZCgeiXpTIwWLCL7X8amPAVfRXsO72BpBwA.
> Any comments are welcome.
> >
> > -Yifan
> >
> > On Thu, May 24, 2018 at 5:25 PM Chamikara Jayalath 
> wrote:
> >>
> >> Thanks Yifan. Added some comments. I think having regularly generated
> human reports on outdated dependencies of Beam SDKs will be extremely helpful
> in keeping Beam in a healthy state.
> >>
> >> - Cham
> >>
> >> On Thu, May 24, 2018 at 7:08 AM Yifan Zou  wrote:
> >>>
> >>> Hello,
> >>>
> >>> I have a proposal to automate Beam dependency check. Since some Beam
> dependent packages are out-of-date, we want to identify them and check for
> dependency updates regularly in the future. Generally, we have a couple of
> options to do it:
> >>> 1. Implementing a Jenkins job that checks dependency versions and
> creates reports.
> >>> 2. Using the Github App Dependabot to automate dependency updates.
> >>> 3. Combination of those two solutions.
> >>>
> >>> I am looking forward to hearing feedback from you :)
> >>>
> >>>
> https://docs.google.com/document/d/1rqr_8a9NYZCgeiXpTIwWLCL7X8amPAVfRXsO72BpBwA/
> >>>
> >>> Thanks.
> >>>
> >>> Best.
> >>> Yifan Zou
>


Re: Proposal: keeping post-commit tests green

2018-06-04 Thread Mikhail Gryzykhin
Hello everyone,

I have addressed the comments on the proposal doc and updated it accordingly. I
have also added a section on the metrics that we want to track for pre-commit
tests, and contents for the dashboard.

Please, take a second look at the document.

Highlights:
* Sections that I feel require more discussion are marked with [More
opinions wanted]
* I've kept the original comments open for this iteration. Please close them
if you feel they are resolved, or elaborate more on the topic.
* Added information on metrics to track
* Moved “Split test jobs into automatically and manually triggered” to
“Other ideas to consider”
* Prioritized automated JIRA ticket creation over manual
* Prioritized a roll-back-first policy
* Added process for enforcing proposed policies.

--Mikhail

Have feedback?


On Tue, May 22, 2018 at 10:11 AM Scott Wegner  wrote:

> Thanks for the thoughtful proposal Mikhail. I've left some comments in the
> doc.
>
> I encourage others to take a look: the proposal adds some strong policies
> about dealing with post-commit failures (rollback policy, locking master).
> Currently our post-commits are frequently red, and we're missing out on a
> valuable quality signal. I'm in favor of such policies to help get the test
> signals back to a healthy state.
>
> On Mon, May 21, 2018 at 2:48 PM Mikhail Gryzykhin 
> wrote:
>
>> Hi Everyone,
>>
>> I've updated design doc according to comments.
>>
>> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>>
>> In general, the proposed ideas seem to be appreciated. Still, some of the
>> sections require more discussion.
>>
>> Changes highlight:
>> * Added roll-back first policy to best practices. This includes process
>> on how to handle roll-back.
>> * Marked topics that I'd like to have more input on. [cyan color]
>>
>> --Mikhail
>>
>> Have feedback?
>>
>>
>> On Fri, May 18, 2018 at 10:56 AM Andrew Pilloud 
>> wrote:
>>
>>> Blocking commits to master on test flaps seems critical here. The test
>>> flaps won't get the attention they deserve as long as people are just
>>> spamming their PRs with 'Run Java Precommit' until they turn green. I'm
>>> guilty of this behavior and I know it masks new flaky tests.
>>>
>>> I added a comment to your doc about detecting flaky tests. This can
>>> easily be done by rerunning the postcommits during times when Jenkins would
>>> otherwise be idle. You'll easily get a few dozen runs every weekend, you
>>> just need a process to triage all the flakes and ensure there are bugs. I
>>> worked on a project that did this along with blocking master on any post
>>> commit failure. It was painful for the first few weeks, but things got
>>> significantly better once most of the bugs were fixed.
>>>
>>> Andrew
>>>
>>> On Fri, May 18, 2018 at 10:39 AM Kenneth Knowles  wrote:
>>>
 Love it. I would pull out from the doc also the key point: make the
 postcommit status constantly visible to everyone.

 Kenn

 On Fri, May 18, 2018 at 10:17 AM Mikhail Gryzykhin 
 wrote:

> Hi everyone,
>
> I'm Mikhail, and I started working on Google Dataflow several months ago.
> I'm really excited to work with the Beam open-source community.
>
> I have a proposal to improve contributor experience by keeping
> post-commit tests green.
>
> I'm looking to get community consensus and approval about the process
> for keeping post-commit tests green and addressing post-commit test
> failures.
>
> Find full list of ideas brought in for discussion in this document:
>
> https://docs.google.com/document/d/1sczGwnCvdHiboVajGVdnZL0rfnr7ViXXAebBAf_uQME
>
> Key points are:
> 1. Add explicit tracking of failures via JIRA
> 2. No-Commit policy when post-commit tests are red
>
> --Mikhail
>
>


Re: Go SDK Example

2018-06-04 Thread Henning Rohde
Welcome James!

Awesome that you're interested in contributing to Apache Beam! If you're
specifically interested in the Go SDK, the task you identified is a good
one to start with. I assigned it to you. I also added a few similar tasks
listed below as alternatives. Feel free to pick the one you prefer and
re-assign as appropriate (or I can do it for you). It's best that the JIRAs
are assigned before any work is done, to avoid accidental duplication.

  BEAM-4466 Add Go TF-IDF example
  BEAM-4467 Add Go Autocomplete example

The main caveat for Go streaming pipelines is that they currently only
really work on Dataflow, because the only streaming IO connector is PubSub
and the direct runner supports batch only. In the near future, however, the
ULR and Flink runner will support portable streaming pipelines, including
Go. If it is too impractical to work with the IO used by corresponding
Java/Python examples, feel free to deviate by using textio or similar
instead. There may also be incomplete feature work in Go that prevents a
direct translation.

Please feel free to ask questions in the JIRAs or on the dev list. Happy to help!

Henning



On Sun, Jun 3, 2018 at 6:41 PM Kenneth Knowles  wrote:

> Hi James,
>
> Welcome!
>
> Have you subscribed to dev@beam.apache.org? I am including that list
> here, since that is the most active list for discussing contributions. I've
> also included Henning explicitly. He is the best person to answer.
>
> I found your JIRA account and set up permissions so you can be assigned
> issues.
>
> Kenn
>
> On Sun, Jun 3, 2018 at 12:35 PM James Wilson  wrote:
>
>> Hi All,
>>
>> This is the first time I am trying to contribute to a large open source
>> project.  I was going to tackle the BEAM-4292 "Add streaming word count
>> example" for the Go SDK.  Do I assign it to myself or just complete the
>> task and create a PR request?  I read through the contributing page on the
>> Apache Beam site, but it didn’t go into how to tackle your first task.  Any
>> help would be appreciated.
>>
>> Best,
>> James
>
>


Re: [VOTE] Code Review Process

2018-06-04 Thread Raghu Angadi
+1

On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh  wrote:

> As we seem to largely have consensus in "Reducing Committer Load for Code
> Reviews"[1], this is a vote to change the Beam policy on Code Reviews to
> require that
>
> (1) At least one committer is involved with the code review, as either a
> reviewer or as the author
> (2) A contributor has approved the change
>
> prior to merging any change.
>
> This changes our policy from its current requirement that at least one
> committer *who is not the author* has approved the change prior to merging.
> We believe that changing this process will improve code review throughput,
> reduce committer load, and engage more of the community in the code review
> process.
>
> Please vote:
> [ ] +1: Accept the above proposal to change the Beam code review/merge
> policy
> [ ] -1: Leave the Code Review policy unchanged
>
> Thanks,
>
> Thomas
>
> [1]
> https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E
>


Re: Some extensions to the DoFn API

2018-06-04 Thread Jean-Baptiste Onofré
Thanks! I will work on this one then ;)

Regards
JB

On 04/06/2018 16:55, Reuven Lax wrote:
> I'll file a JIRA to track the idea.
> 
> On Mon, Jun 4, 2018 at 5:52 PM Jean-Baptiste Onofré  > wrote:
> 
> Exactly, that's why something like @xpath or @json-path could be
> interesting.
> 
> Regards
> JB
> 
> On 04/06/2018 16:48, Reuven Lax wrote:
> > Interesting. And given that Beam Schemas are recursive (a row can
> > contain nested rows), we might actually need something like xpath
> if we
> > want to make this fully general.
> >
> > Reuven
> >
> > On Mon, Jun 4, 2018 at 5:45 PM Jean-Baptiste Onofré
> mailto:j...@nanthrax.net>
> > >> wrote:
> >
> >     Yup, it makes sense, it's what I had in mind.
> >
> >     In Apache Camel, in a Processor (similar to a DoFn), we can
> also pass
> >     directly languages to the arguments.
> >
> >     We can imagine something like:
> >
> >     @ProcessElement void process(@json-path("foo") String foo)
> >
> >     @ProcessElement void process(@xpath("//foo") String foo)
> >
> >     or even an expression language (simple/groovy/whatever).
> >
> >     Regards
> >     JB
> >
> >     On 04/06/2018 16:39, Reuven Lax wrote:
> >     > In the schema branch I have already added some annotations
> for Schema.
> >     > However in the future I think we could go even further and
> allow users
> >     > to pick individual fields out of the row schema. e.g. the
> user might
> >     > have a Schema with 100 fields, but only want to process
> userId and geo
> >     > location. I could imagine something like this
> >     >
> >     > @ProcessElement void process(@Field("userId") String
> >     > userId, @Field("latitude") double lat, @Field("longitude")
> double
> >     long) {
> >     > }
> >     >
> >     > And Beam could automatically extract the right fields for
> the user. In
> >     > fact we could do the same thing with KVs today - supplying
> annotations
> >     > to automatically unpack the KV.
> >     >
> >     > I do think there are a few nice ways to do side inputs as well,
> >     but it's
> >     > more work to design implement which is why I left it off (and
> >     given that
> >     > there is some design work, side input annotations should be
> >     discussed on
> >     > the dev list before implementation IMO).
> >     >
> >     > Reuven
> >     >
> >     > On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré
> >     mailto:j...@nanthrax.net>
> >
> >     > 
>  >     >
> >     >     Hi Reuven,
> >     >
> >     >     That's a great improvement for user.
> >     >
> >     >     I don't see an easy way to have annotation about side
> >     input/output.
> >     >     I think we can also plan some extension annotation about
> >     schema. Like
> >     >     @Element(schema = foo) in addition of the type. Thoughts ?
> >     >
> >     >     Regards
> >     >     JB
> >     >
> >     >     On 04/06/2018 16:06, Reuven Lax wrote:
> >     >     > Beam was created with an annotation-based processing API,
> >     that allows
> >     >     > the framework to automatically inject parameters to a
> DoFn's
> >     process
> >     >     > method (and also allows the user to mark any method as the
> >     process
> >     >     > method using @ProcessElement). However, these annotations
> >     were never
> >     >     > completed. A specific set of parameters could be injected
> >     (e.g. the
> >     >     > window or PipelineOptions), but for anything else you
> had to
> >     access it
> >     >     > through the ProcessContext. This limited the readability
> >     advantage of
> >     >     > this API.
> >     >     >
> >     >     > A couple of months ago I spent a bit of time extending the
> >     set of
> >     >     > annotations allowed. In particular, the most common
> uses of
> >     >     > ProcessContext were accessing the input element and
> outputting
> >     >     elements,
> >     >     > and both of those can now be done without ProcessContext.
> >     Example
> >     >     usage:
> >     >     >
> >     >     > new DoFn() {
> >     >     >   @ProcessElement process(@Element InputT element,
> >     >     > OutputReceiver out) {
> >     >     >     out.output(convertInputToOutput(element));
> >     >     >   }
> >     >     > }
> >     >     >
> >   

Re: Some extensions to the DoFn API

2018-06-04 Thread Reuven Lax
I'll file a JIRA to track the idea.

On Mon, Jun 4, 2018 at 5:52 PM Jean-Baptiste Onofré  wrote:

> Exactly, that's why something like @xpath or @json-path could be
> interesting.
>
> Regards
> JB
>
> On 04/06/2018 16:48, Reuven Lax wrote:
> > Interesting. And given that Beam Schemas are recursive (a row can
> > contain nested rows), we might actually need something like xpath if we
> > want to make this fully general.
> >
> > Reuven
> >
> > On Mon, Jun 4, 2018 at 5:45 PM Jean-Baptiste Onofré  > > wrote:
> >
> > Yup, it makes sense, it's what I had in mind.
> >
> > In Apache Camel, in a Processor (similar to a DoFn), we can also pass
> > directly languages to the arguments.
> >
> > We can imagine something like:
> >
> > @ProcessElement void process(@json-path("foo") String foo)
> >
> > @ProcessElement void process(@xpath("//foo") String foo)
> >
> > or even an expression language (simple/groovy/whatever).
> >
> > Regards
> > JB
> >
> > On 04/06/2018 16:39, Reuven Lax wrote:
> > > In the schema branch I have already added some annotations for
> Schema.
> > > However in the future I think we could go even further and allow
> users
> > > to pick individual fields out of the row schema. e.g. the user
> might
> > > have a Schema with 100 fields, but only want to process userId and
> geo
> > > location. I could imagine something like this
> > >
> > > @ProcessElement void process(@Field("userId") String
> > > userId, @Field("latitude") double lat, @Field("longitude") double
> > long) {
> > > }
> > >
> > > And Beam could automatically extract the right fields for the
> user. In
> > > fact we could do the same thing with KVs today - supplying
> annotations
> > > to automatically unpack the KV.
> > >
> > > I do think there are a few nice ways to do side inputs as well,
> > but it's
> > > more work to design implement which is why I left it off (and
> > given that
> > > there is some design work, side input annotations should be
> > discussed on
> > > the dev list before implementation IMO).
> > >
> > > Reuven
> > >
> > > On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré
> > mailto:j...@nanthrax.net>
> > > >> wrote:
> > >
> > > Hi Reuven,
> > >
> > > That's a great improvement for user.
> > >
> > > I don't see an easy way to have annotation about side
> > input/output.
> > > I think we can also plan some extension annotation about
> > schema. Like
> > > @Element(schema = foo) in addition of the type. Thoughts ?
> > >
> > > Regards
> > > JB
> > >
> > > On 04/06/2018 16:06, Reuven Lax wrote:
> > > > Beam was created with an annotation-based processing API,
> > that allows
> > > > the framework to automatically inject parameters to a DoFn's
> > process
> > > > method (and also allows the user to mark any method as the
> > process
> > > > method using @ProcessElement). However, these annotations
> > were never
> > > > completed. A specific set of parameters could be injected
> > (e.g. the
> > > > window or PipelineOptions), but for anything else you had to
> > access it
> > > > through the ProcessContext. This limited the readability
> > advantage of
> > > > this API.
> > > >
> > > > A couple of months ago I spent a bit of time extending the
> > set of
> > > > annotations allowed. In particular, the most common uses of
> > > > ProcessContext were accessing the input element and
> outputting
> > > elements,
> > > > and both of those can now be done without ProcessContext.
> > Example
> > > usage:
> > > >
> > > > new DoFn() {
> > > >   @ProcessElement process(@Element InputT element,
> > > > OutputReceiver out) {
> > > > out.output(convertInputToOutput(element));
> > > >   }
> > > > }
> > > >
> > > > No need for ProcessContext anywhere in this DoFn! The Beam
> > framework
> > > > also does type checking - if the @Element type was not
> > InputT, you
> > > would
> > > > have seen an error. Multi-output DoFns also work, using a
> > > > MultiOutputReceiver interface.
> > > >
> > > > I'll update the Beam docs later with this information, but
> most
> > > > information accessible from ProcessContext, OnTimerContext,
> > > > StartBundleContext, or FinishBundleContext can now be
> > accessed via
> > > this
> > > > sort of injection. The main exceptions are side inputs and
> > output from
> > > > finishbundle, both of which still require the context
> objects;
> > > however I

Re: Some extensions to the DoFn API

2018-06-04 Thread Jean-Baptiste Onofré
Exactly, that's why something like @xpath or @json-path could be
interesting.

Regards
JB

On 04/06/2018 16:48, Reuven Lax wrote:
> Interesting. And given that Beam Schemas are recursive (a row can
> contain nested rows), we might actually need something like xpath if we
> want to make this fully general.
> 
> Reuven
> 
> On Mon, Jun 4, 2018 at 5:45 PM Jean-Baptiste Onofré  > wrote:
> 
> Yup, it makes sense, it's what I had in mind.
> 
> In Apache Camel, in a Processor (similar to a DoFn), we can also pass
> directly languages to the arguments.
> 
> We can imagine something like:
> 
> @ProcessElement void process(@json-path("foo") String foo)
> 
> @ProcessElement void process(@xpath("//foo") String foo)
> 
> or even an expression language (simple/groovy/whatever).
> 
> Regards
> JB
> 
> On 04/06/2018 16:39, Reuven Lax wrote:
> > In the schema branch I have already added some annotations for Schema.
> > However in the future I think we could go even further and allow users
> > to pick individual fields out of the row schema. e.g. the user might
> > have a Schema with 100 fields, but only want to process userId and geo
> > location. I could imagine something like this
> >
> > @ProcessElement void process(@Field("userId") String
> > userId, @Field("latitude") double lat, @Field("longitude") double
> lng) {
> > }
> >
> > And Beam could automatically extract the right fields for the user. In
> > fact we could do the same thing with KVs today - supplying annotations
> > to automatically unpack the KV.
> >
> > I do think there are a few nice ways to do side inputs as well,
> but it's
> > more work to design implement which is why I left it off (and
> given that
> > there is some design work, side input annotations should be
> discussed on
> > the dev list before implementation IMO).
> >
> > Reuven
> >
> > On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré
> mailto:j...@nanthrax.net>
> > >> wrote:
> >
> >     Hi Reuven,
> >
> >     That's a great improvement for user.
> >
> >     I don't see an easy way to have annotation about side
> input/output.
> >     I think we can also plan some extension annotation about
> schema. Like
> >     @Element(schema = foo) in addition of the type. Thoughts ?
> >
> >     Regards
> >     JB
> >
> >     On 04/06/2018 16:06, Reuven Lax wrote:
> >     > Beam was created with an annotation-based processing API,
> that allows
> >     > the framework to automatically inject parameters to a DoFn's
> process
> >     > method (and also allows the user to mark any method as the
> process
> >     > method using @ProcessElement). However, these annotations
> were never
> >     > completed. A specific set of parameters could be injected
> (e.g. the
> >     > window or PipelineOptions), but for anything else you had to
> access it
> >     > through the ProcessContext. This limited the readability
> advantage of
> >     > this API.
> >     >
> >     > A couple of months ago I spent a bit of time extending the
> set of
> >     > annotations allowed. In particular, the most common uses of
> >     > ProcessContext were accessing the input element and outputting
> >     elements,
> >     > and both of those can now be done without ProcessContext.
> Example
> >     usage:
> >     >
> >     > new DoFn() {
> >     >   @ProcessElement process(@Element InputT element,
> >     > OutputReceiver out) {
> >     >     out.output(convertInputToOutput(element));
> >     >   }
> >     > }
> >     >
> >     > No need for ProcessContext anywhere in this DoFn! The Beam
> framework
> >     > also does type checking - if the @Element type was not
> InputT, you
> >     would
> >     > have seen an error. Multi-output DoFns also work, using a
> >     > MultiOutputReceiver interface.
> >     >
> >     > I'll update the Beam docs later with this information, but most
> >     > information accessible from ProcessContext, OnTimerContext,
> >     > StartBundleContext, or FinishBundleContext can now be
> accessed via
> >     this
> >     > sort of injection. The main exceptions are side inputs and
> output from
> >     > finishbundle, both of which still require the context objects;
> >     however I
> >     > hope to find time to provide direct access to those as well.
> >     >
> >     > pr/5331 (in progress) converts most of Beam's built-in
> transforms
> >     to use
> >     > this clearer style.
> >     >
> >     > Reuven
> >
> >     --
> >     Jean-Baptiste Onofré
> >     jbono...@apache.org 

Re: Some extensions to the DoFn API

2018-06-04 Thread Reuven Lax
Interesting. And given that Beam Schemas are recursive (a row can contain
nested rows), we might actually need something like xpath if we want to
make this fully general.
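For nested rows, a path-like extension of the proposed @Field could look as follows; the dotted-path support is purely speculative, nothing like it existed at the time of this thread:

  // Speculative sketch: @Field with a dotted path into a nested row, where
  // "location" is assumed to be a nested row with a "latitude" double field.
  new DoFn<Row, Double>() {
    @ProcessElement
    public void process(@Field("location.latitude") Double lat,
                        OutputReceiver<Double> out) {
      // A hand-written equivalent would be roughly:
      //   Double lat = row.getRow("location").getDouble("latitude");
      out.output(lat);
    }
  };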

Reuven

On Mon, Jun 4, 2018 at 5:45 PM Jean-Baptiste Onofré  wrote:

> Yup, it makes sense, it's what I had in mind.
>
> In Apache Camel, in a Processor (similar to a DoFn), we can also pass
> directly languages to the arguments.
>
> We can imagine something like:
>
> @ProcessElement void process(@json-path("foo") String foo)
>
> @ProcessElement void process(@xpath("//foo") String foo)
>
> or even an expression language (simple/groovy/whatever).
>
> Regards
> JB
>
> On 04/06/2018 16:39, Reuven Lax wrote:
> > In the schema branch I have already added some annotations for Schema.
> > However in the future I think we could go even further and allow users
> > to pick individual fields out of the row schema. e.g. the user might
> > have a Schema with 100 fields, but only want to process userId and geo
> > location. I could imagine something like this
> >
> > @ProcessElement void process(@Field("userId") String
> > userId, @Field("latitude") double lat, @Field("longitude") double lng) {
> > }
> >
> > And Beam could automatically extract the right fields for the user. In
> > fact we could do the same thing with KVs today - supplying annotations
> > to automatically unpack the KV.
> >
> > I do think there are a few nice ways to do side inputs as well, but it's
> > more work to design implement which is why I left it off (and given that
> > there is some design work, side input annotations should be discussed on
> > the dev list before implementation IMO).
> >
> > Reuven
> >
> > On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré  > > wrote:
> >
> > Hi Reuven,
> >
> > That's a great improvement for user.
> >
> > I don't see an easy way to have annotation about side input/output.
> > I think we can also plan some extension annotation about schema. Like
> > @Element(schema = foo) in addition of the type. Thoughts ?
> >
> > Regards
> > JB
> >
> > On 04/06/2018 16:06, Reuven Lax wrote:
> > > Beam was created with an annotation-based processing API, that
> allows
> > > the framework to automatically inject parameters to a DoFn's
> process
> > > method (and also allows the user to mark any method as the process
> > > method using @ProcessElement). However, these annotations were
> never
> > > completed. A specific set of parameters could be injected (e.g. the
> > > window or PipelineOptions), but for anything else you had to
> access it
> > > through the ProcessContext. This limited the readability advantage
> of
> > > this API.
> > >
> > > A couple of months ago I spent a bit of time extending the set of
> > > annotations allowed. In particular, the most common uses of
> > > ProcessContext were accessing the input element and outputting
> > elements,
> > > and both of those can now be done without ProcessContext. Example
> > usage:
> > >
> > > new DoFn<InputT, OutputT>() {
> > >   @ProcessElement public void process(@Element InputT element,
> > > OutputReceiver<OutputT> out) {
> > >     out.output(convertInputToOutput(element));
> > >   }
> > > }
> > >
> > > No need for ProcessContext anywhere in this DoFn! The Beam
> framework
> > > also does type checking - if the @Element type was not InputT, you
> > would
> > > have seen an error. Multi-output DoFns also work, using a
> > > MultiOutputReceiver interface.
> > >
> > > I'll update the Beam docs later with this information, but most
> > > information accessible from ProcessContext, OnTimerContext,
> > > StartBundleContext, or FinishBundleContext can now be accessed via
> > this
> > > sort of injection. The main exceptions are side inputs and output
> from
> > > finishbundle, both of which still require the context objects;
> > however I
> > > hope to find time to provide direct access to those as well.
> > >
> > > pr/5331 (in progress) converts most of Beam's built-in transforms
> > to use
> > > this clearer style.
> > >
> > > Reuven
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org 
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Some extensions to the DoFn API

2018-06-04 Thread Jean-Baptiste Onofré
Yup, it makes sense, it's what I had in mind.

In Apache Camel, in a Processor (similar to a DoFn), we can also pass
languages directly to the arguments.

We can imagine something like:

@ProcessElement void process(@json-path("foo") String foo)

@ProcessElement void process(@xpath("//foo") String foo)

or even an expression language (simple/groovy/whatever).
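Something the proposed annotation could desugar to, written by hand against the real json-path library; the @json-path annotation itself was only an idea at this point, so the extraction below is done explicitly in the body:

  import com.jayway.jsonpath.JsonPath;
  import org.apache.beam.sdk.transforms.DoFn;

  class ExtractFooFn extends DoFn<String, String> {
    @ProcessElement
    public void process(@Element String json, OutputReceiver<String> out) {
      // Hand-written equivalent of a hypothetical @json-path("foo") parameter:
      // evaluate the path against the element and pass the result along.
      String foo = JsonPath.read(json, "$.foo");
      out.output(foo);
    }
  }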

Regards
JB

On 04/06/2018 16:39, Reuven Lax wrote:
> In the schema branch I have already added some annotations for Schema.
> However in the future I think we could go even further and allow users
> to pick individual fields out of the row schema. e.g. the user might
> have a Schema with 100 fields, but only want to process userId and geo
> location. I could imagine something like this
> 
> @ProcessElement void process(@Field("userId") String
> userId, @Field("latitude") double lat, @Field("longitude") double lng) {
> }
> 
> And Beam could automatically extract the right fields for the user. In
> fact we could do the same thing with KVs today - supplying annotations
> to automatically unpack the KV.
> 
> I do think there are a few nice ways to do side inputs as well, but it's
> more work to design implement which is why I left it off (and given that
> there is some design work, side input annotations should be discussed on
> the dev list before implementation IMO).
> 
> Reuven
> 
> On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré  > wrote:
> 
> Hi Reuven,
> 
> That's a great improvement for users.
> 
> I don't see an easy way to have annotations for side input/output.
> I think we can also plan some extension annotations for schemas, like
> @Element(schema = foo) in addition to the type. Thoughts?
> 
> Regards
> JB
> 
> On 04/06/2018 16:06, Reuven Lax wrote:
> > Beam was created with an annotation-based processing API, that allows
> > the framework to automatically inject parameters to a DoFn's process
> > method (and also allows the user to mark any method as the process
> > method using @ProcessElement). However, these annotations were never
> > completed. A specific set of parameters could be injected (e.g. the
> > window or PipelineOptions), but for anything else you had to access it
> > through the ProcessContext. This limited the readability advantage of
> > this API.
> >
> > A couple of months ago I spent a bit of time extending the set of
> > annotations allowed. In particular, the most common uses of
> > ProcessContext were accessing the input element and outputting
> elements,
> > and both of those can now be done without ProcessContext. Example
> usage:
> >
> > new DoFn<InputT, OutputT>() {
> >   @ProcessElement public void process(@Element InputT element,
> > OutputReceiver<OutputT> out) {
> >     out.output(convertInputToOutput(element));
> >   }
> > }
> >
> > No need for ProcessContext anywhere in this DoFn! The Beam framework
> > also does type checking - if the @Element type was not InputT, you
> would
> > have seen an error. Multi-output DoFns also work, using a
> > MultiOutputReceiver interface.
> >
> > I'll update the Beam docs later with this information, but most
> > information accessible from ProcessContext, OnTimerContext,
> > StartBundleContext, or FinishBundleContext can now be accessed via
> this
> > sort of injection. The main exceptions are side inputs and output from
> > finishbundle, both of which still require the context objects;
> however I
> > hope to find time to provide direct access to those as well.
> >
> > pr/5331 (in progress) converts most of Beam's built-in transforms
> to use
> > this clearer style.
> >
> > Reuven
> 
> -- 
> Jean-Baptiste Onofré
> jbono...@apache.org 
> http://blog.nanthrax.net
> Talend - http://www.talend.com
> 

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: Some extensions to the DoFn API

2018-06-04 Thread Reuven Lax
In the schema branch I have already added some annotations for Schema.
However in the future I think we could go even further and allow users to
pick individual fields out of the row schema. e.g. the user might have a
Schema with 100 fields, but only want to process userId and geo location. I
could imagine something like this

@ProcessElement void process(@Field("userId") String
userId, @Field("latitude") double lat, @Field("longitude") double lng) {
}

And Beam could automatically extract the right fields for the user. In fact
we could do the same thing with KVs today - supplying annotations to
automatically unpack the KV.
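The KV version could look like the sketch below, with hypothetical @Key/@Value parameter annotations in the same spirit (they did not exist; at the time one wrote element.getKey()/getValue() instead):

  // Hypothetical @Key/@Value annotations unpacking a KV for the user.
  new DoFn<KV<String, Long>, String>() {
    @ProcessElement
    public void process(@Key String word, @Value Long count,
                        OutputReceiver<String> out) {
      out.output(word + ": " + count);
    }
  };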

I do think there are a few nice ways to do side inputs as well, but it's
more work to design and implement, which is why I left it off (and given that
there is some design work, side input annotations should be discussed on
the dev list before implementation IMO).

Reuven

On Mon, Jun 4, 2018 at 5:29 PM Jean-Baptiste Onofré  wrote:

> Hi Reuven,
>
> That's a great improvement for users.
>
> I don't see an easy way to have annotations for side input/output.
> I think we can also plan some extension annotations for schemas, like
> @Element(schema = foo) in addition to the type. Thoughts?
>
> Regards
> JB
>
> On 04/06/2018 16:06, Reuven Lax wrote:
> > Beam was created with an annotation-based processing API, that allows
> > the framework to automatically inject parameters to a DoFn's process
> > method (and also allows the user to mark any method as the process
> > method using @ProcessElement). However, these annotations were never
> > completed. A specific set of parameters could be injected (e.g. the
> > window or PipelineOptions), but for anything else you had to access it
> > through the ProcessContext. This limited the readability advantage of
> > this API.
> >
> > A couple of months ago I spent a bit of time extending the set of
> > annotations allowed. In particular, the most common uses of
> > ProcessContext were accessing the input element and outputting elements,
> > and both of those can now be done without ProcessContext. Example usage:
> >
> > new DoFn<InputT, OutputT>() {
> >   @ProcessElement public void process(@Element InputT element,
> > OutputReceiver<OutputT> out) {
> >     out.output(convertInputToOutput(element));
> >   }
> > }
> >
> > No need for ProcessContext anywhere in this DoFn! The Beam framework
> > also does type checking - if the @Element type was not InputT, you would
> > have seen an error. Multi-output DoFns also work, using a
> > MultiOutputReceiver interface.
> >
> > I'll update the Beam docs later with this information, but most
> > information accessible from ProcessContext, OnTimerContext,
> > StartBundleContext, or FinishBundleContext can now be accessed via this
> > sort of injection. The main exceptions are side inputs and output from
> > finishbundle, both of which still require the context objects; however I
> > hope to find time to provide direct access to those as well.
> >
> > pr/5331 (in progress) converts most of Beam's built-in transforms to use
> > this clearer style.
> >
> > Reuven
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: Some extensions to the DoFn API

2018-06-04 Thread Jean-Baptiste Onofré
Hi Reuven,

That's a great improvement for users.

I don't see an easy way to have annotations for side input/output.
I think we can also plan some extension annotations for schemas, like
@Element(schema = foo) in addition to the type. Thoughts?

Regards
JB

On 04/06/2018 16:06, Reuven Lax wrote:
> Beam was created with an annotation-based processing API, that allows
> the framework to automatically inject parameters to a DoFn's process
> method (and also allows the user to mark any method as the process
> method using @ProcessElement). However, these annotations were never
> completed. A specific set of parameters could be injected (e.g. the
> window or PipelineOptions), but for anything else you had to access it
> through the ProcessContext. This limited the readability advantage of
> this API.
> 
> A couple of months ago I spent a bit of time extending the set of
> annotations allowed. In particular, the most common uses of
> ProcessContext were accessing the input element and outputting elements,
> and both of those can now be done without ProcessContext. Example usage:
> 
> new DoFn<InputT, OutputT>() {
>   @ProcessElement public void process(@Element InputT element,
> OutputReceiver<OutputT> out) {
>     out.output(convertInputToOutput(element));
>   }
> }
> 
> No need for ProcessContext anywhere in this DoFn! The Beam framework
> also does type checking - if the @Element type was not InputT, you would
> have seen an error. Multi-output DoFns also work, using a
> MultiOutputReceiver interface.
> 
> I'll update the Beam docs later with this information, but most
> information accessible from ProcessContext, OnTimerContext,
> StartBundleContext, or FinishBundleContext can now be accessed via this
> sort of injection. The main exceptions are side inputs and output from
> finishbundle, both of which still require the context objects; however I
> hope to find time to provide direct access to those as well.
> 
> pr/5331 (in progress) converts most of Beam's built-in transforms to use
> this clearer style.
> 
> Reuven

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Some extensions to the DoFn API

2018-06-04 Thread Reuven Lax
Beam was created with an annotation-based processing API that allows the
framework to automatically inject parameters to a DoFn's process method
(and also allows the user to mark any method as the process method
using @ProcessElement). However, these annotations were never completed. A
specific set of parameters could be injected (e.g. the window or
PipelineOptions), but for anything else you had to access it through the
ProcessContext. This limited the readability advantage of this API.

A couple of months ago I spent a bit of time extending the set of
annotations allowed. In particular, the most common uses of ProcessContext
were accessing the input element and outputting elements, and both of those
can now be done without ProcessContext. Example usage:

new DoFn<InputT, OutputT>() {
  @ProcessElement
  public void process(@Element InputT element, OutputReceiver<OutputT> out) {
    out.output(convertInputToOutput(element));
  }
}

No need for ProcessContext anywhere in this DoFn! The Beam framework also
does type checking - if the @Element type was not InputT, you would have
seen an error. Multi-output DoFns also work, using a MultiOutputReceiver
interface.
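For example, a two-output DoFn in the new style might look like the sketch below (the tags and the even/odd logic are illustrative; MultiOutputReceiver.get takes the output's TupleTag and returns a typed OutputReceiver):

  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.values.TupleTag;

  class PartitionFn extends DoFn<Integer, String> {
    static final TupleTag<String> EVENS = new TupleTag<String>() {};
    static final TupleTag<String> ODDS = new TupleTag<String>() {};

    @ProcessElement
    public void process(@Element Integer i, MultiOutputReceiver out) {
      // Pick the receiver for the matching tag, then output as usual.
      out.get(i % 2 == 0 ? EVENS : ODDS).output(i.toString());
    }
  }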

I'll update the Beam docs later with this information, but most information
accessible from ProcessContext, OnTimerContext, StartBundleContext, or
FinishBundleContext can now be accessed via this sort of injection. The
main exceptions are side inputs and output from finishbundle, both of which
still require the context objects; however I hope to find time to provide
direct access to those as well.

pr/5331 (in progress) converts most of Beam's built-in transforms to use
this clearer style.

Reuven


Re: [Proposal] Automation For Beam Dependency Check

2018-06-04 Thread Ismaël Mejía
Is there a way to add to that weekly report the new dependencies that
were introduced in the week before, or that have changed?

We are not addressing another important problem: Leaking of
dependencies. I am not aware of the gradle equivalent of the maven
dependency plugin that helps to determine missing dependencies (non
explicitly defined) or unused dependencies. Is there any way to
achieve this too? (Note this should probably be enforced at Jenkins,
not as part of the report, but I'm just curious.)

On Wed, May 30, 2018 at 5:16 AM Yifan Zou  wrote:
>
> Thanks everyone for making comments and suggestions. I modified the proposal 
> to add dependency release time as the major criterion for outdated package 
> determination.
> The revised doc is here: 
> https://docs.google.com/document/d/1rqr_8a9NYZCgeiXpTIwWLCL7X8amPAVfRXsO72BpBwA.
>  Any comments are welcome.
>
> -Yifan
>
> On Thu, May 24, 2018 at 5:25 PM Chamikara Jayalath  
> wrote:
>>
>> Thanks Yifan. Added some comments. I think having regularly generated human 
>> reports on outdated dependencies of Beam SDKs will be extremely helpful in 
>> keeping Beam in a healthy state.
>>
>> - Cham
>>
>> On Thu, May 24, 2018 at 7:08 AM Yifan Zou  wrote:
>>>
>>> Hello,
>>>
>>> I have a proposal to automate Beam dependency check. Since some Beam 
>>> dependent packages are out-of-date, we want to identify them and check for 
>>> dependency updates regularly in the future. Generally, we have a couple of 
>>> options to do it:
>>> 1. Implementing a Jenkins job that checks dependency versions and creates 
>>> reports.
>>> 2. Using the Github App Dependabot to automate dependency updates.
>>> 3. Combination of those two solutions.
>>>
>>> I am looking forward to hearing feedback from you :)
>>>
>>> https://docs.google.com/document/d/1rqr_8a9NYZCgeiXpTIwWLCL7X8amPAVfRXsO72BpBwA/
>>>
>>> Thanks.
>>>
>>> Best.
>>> Yifan Zou


Re: [SQL] Unsupported features

2018-06-04 Thread Ismaël Mejía
This is super interesting, great work Kai!

Just for curiosity, How are you validating this?
It would be really interesting to have this also as part of some kind of IT
for the future.


On Fri, Jun 1, 2018 at 7:43 PM Kai Jiang  wrote:

> Sounds a good idea! I will file the major problems later and use a task
> issue to track.
>
> Best,
> Kai
>
> On Fri, Jun 1, 2018 at 10:10 AM Anton Kedin  wrote:
>
>> This looks very helpful, thank you.
>>
>> Can you file Jiras for the major problems? Or maybe a single jira for the
>> whole thing with sub-tasks for specific problems.
>>
>> Regards,
>> Anton
>>
>> On Wed, May 30, 2018 at 9:12 AM Kenneth Knowles  wrote:
>>
>>> This is extremely useful. Thanks for putting so much information
>>> together!
>>>
>>> Kenn
>>>
>>> On Wed, May 30, 2018 at 8:19 AM Kai Jiang  wrote:
>>>
 Hi all,

 Based on pull/5481 , I
 manually did a coverage test with TPC-DS queries (65%) and TPC-H queries
 (100%) and wanted to see which features Beam SQL currently does not support.
 The test was run on the DirectRunner.

 I want to share the result: TPC-DS queries on Beam.
 TL;DR:

1. Aggregation function (stddev) missing, or miscalculation of
combinations of aggregation functions.
2. Nested BeamJoinRel(condition=[true], joinType=[inner]) / cross
join error.
3. Date type casting/calculation and other type casts.
4. LIKE operator on strings / alias for the substring function.
5. ORDER BY without a LIMIT clause.
6. OR operator in join conditions.
7. Syntax: EXISTS / NOT EXISTS (errors); RANK() OVER (PARTITION
BY) / VIEW (unsupported).
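As one concrete instance of item 5, a query of the shape below is the kind that fails; this assumes the BeamSql.query entry point of that era and an input PCollection (here named `input`, with an `id` field) bound to the name PCOLLECTION:

  // Hypothetical repro for "ORDER BY w/o LIMIT": the first query fails to
  // plan because it implies an unbounded global sort, while the second,
  // bounded by LIMIT, is plannable.
  input.apply(BeamSql.query("SELECT id FROM PCOLLECTION ORDER BY id"));
  input.apply(BeamSql.query("SELECT id FROM PCOLLECTION ORDER BY id LIMIT 10"));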


 Best,
 Kai

>>>


Re: [VOTE] Code Review Process

2018-06-04 Thread Jean-Baptiste Onofré
+1

I think it's already pretty close to what we do, so, no brainer ;)

Regards
JB

On 01/06/2018 19:25, Thomas Groh wrote:
> As we seem to largely have consensus in "Reducing Committer Load for
> Code Reviews"[1], this is a vote to change the Beam policy on Code
> Reviews to require that
> 
> (1) At least one committer is involved with the code review, as either a
> reviewer or as the author
> (2) A contributor has approved the change
> 
> prior to merging any change.
> 
> This changes our policy from its current requirement that at least one
> committer *who is not the author* has approved the change prior to
> merging. We believe that changing this process will improve code review
> throughput, reduce committer load, and engage more of the community in
> the code review process.
> 
> Please vote:
> [ ] +1: Accept the above proposal to change the Beam code review/merge
> policy
> [ ] -1: Leave the Code Review policy unchanged
> 
> Thanks,
> 
> Thomas
> 
> [1] 
> https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Use probot/stale to automatically manage stale pull requests

2018-06-04 Thread Jean-Baptiste Onofré
+1

Regards
JB

On 01/06/2018 18:21, Kenneth Knowles wrote:
> Hi all,
> 
> Following the discussion, please vote on the move to activate
> probot/stale [3] to notify authors of stale PRs per current policy and
> then close them after a 7 day grace period.
> 
> For more details, see:
> 
>  - our stale PR policy [1]
>  - the discussion thread [2]
>  - Probot stale [3]
>  - BEAM ticket summarizing discussion [4]
>  - INFRA ticket to activate probot/stale [5]
>  - Example PR that would activate it [6]
> 
> Please vote:
> [ ] +1, Approve that we activate probot/stale
> [ ] -1, Do not approve (please provide specific comments)
> 
> Kenn
> 
> [1] https://beam.apache.org/contribute/#stale-pull-requests
> [2]
> https://lists.apache.org/thread.html/bda552ea7073ca165aaf47034610afafe22d589e386525023d33609e@%3Cdev.beam.apache.org%3E
> [3] https://github.com/probot/stale
> [4] https://issues.apache.org/jira/browse/BEAM-4423
> [5] https://issues.apache.org/jira/browse/INFRA-16589
> [6] https://github.com/apache/beam/pull/5532

-- 
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: [VOTE] Code Review Process

2018-06-04 Thread Reuven Lax
+1

On Mon, Jun 4, 2018 at 11:40 AM Łukasz Gajowy 
wrote:

> +1
>
> 2018-06-04 9:12 GMT+02:00 Etienne Chauchot :
>
>> +1
>> As I was already applying this.
>>
>> Le samedi 02 juin 2018 à 11:24 +0300, Reuven Lax a écrit :
>>
>> +1
>>
>> I believe only some committers were aware of the old policy, and others
>> were effectively doing this anyway.
>>
>> On Sat, Jun 2, 2018 at 2:51 AM Scott Wegner  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 3:44 PM Pablo Estrada  wrote:
>>
>> +1 :) glad that we had this discussion
>>
>> On Fri, Jun 1, 2018, 3:38 PM Udi Meiri  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 1:46 PM Andrew Pilloud 
>> wrote:
>>
>> +1 - I hope this doesn't reduce the urgency to fix the root cause: not
>> having enough committers.
>>
>> On Fri, Jun 1, 2018 at 1:18 PM Henning Rohde  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 12:27 PM Dan Halperin  wrote:
>>
>> +1 -- this is encoding what I previously thought the process was and
>> what, in practice, I think was often the behavior of committers anyway.
>>
>> On Fri, Jun 1, 2018 at 12:21 PM, Yifan Zou  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 12:10 PM Robert Bradshaw 
>> wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 12:06 PM Chamikara Jayalath 
>> wrote:
>>
>> +1
>>
>> Thanks,
>> Cham
>>
>> On Fri, Jun 1, 2018 at 11:36 AM Jason Kuster 
>> wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 11:36 AM Ankur Goenka  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 11:28 AM Charles Chen  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 11:20 AM Valentyn Tymofieiev 
>> wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 10:40 AM, Ahmet Altay  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 10:37 AM, Kenneth Knowles  wrote:
>>
>> +1
>>
>> On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh  wrote:
>>
>> As we seem to largely have consensus in "Reducing Committer Load for Code
>> Reviews"[1], this is a vote to change the Beam policy on Code Reviews to
>> require that
>>
>> (1) At least one committer is involved with the code review, as either a
>> reviewer or as the author
>> (2) A contributor has approved the change
>>
>> prior to merging any change.
>>
>> This changes our policy from its current requirement that at least one
>> committer *who is not the author* has approved the change prior to merging.
>> We believe that changing this process will improve code review throughput,
>> reduce committer load, and engage more of the community in the code review
>> process.
>>
>> Please vote:
>> [ ] +1: Accept the above proposal to change the Beam code review/merge
>> policy
>> [ ] -1: Leave the Code Review policy unchanged
>>
>> Thanks,
>>
>> Thomas
>>
>> [1]
>> https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E
>>
>>
>>
>>
>>
>>
>


Re: [VOTE] Code Review Process

2018-06-04 Thread Łukasz Gajowy
+1

2018-06-04 9:12 GMT+02:00 Etienne Chauchot :

> +1
> As I was already applying this.
>
> On Saturday, June 2, 2018 at 11:24 +0300, Reuven Lax wrote:
>
> +1
>
> I believe only some committers were aware of the old policy, and others
> were effectively doing this anyway.
>
> On Sat, Jun 2, 2018 at 2:51 AM Scott Wegner  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 3:44 PM Pablo Estrada  wrote:
>
> +1 :) glad that we had this discussion
>
> On Fri, Jun 1, 2018, 3:38 PM Udi Meiri  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 1:46 PM Andrew Pilloud  wrote:
>
> +1 - I hope this doesn't reduce the urgency to fix the root cause: not
> having enough committers.
>
> On Fri, Jun 1, 2018 at 1:18 PM Henning Rohde  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 12:27 PM Dan Halperin  wrote:
>
> +1 -- this is encoding what I previously thought the process was and what,
> in practice, I think was often the behavior of committers anyway.
>
> On Fri, Jun 1, 2018 at 12:21 PM, Yifan Zou  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 12:10 PM Robert Bradshaw 
> wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 12:06 PM Chamikara Jayalath 
> wrote:
>
> +1
>
> Thanks,
> Cham
>
> On Fri, Jun 1, 2018 at 11:36 AM Jason Kuster 
> wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 11:36 AM Ankur Goenka  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 11:28 AM Charles Chen  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 11:20 AM Valentyn Tymofieiev 
> wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 10:40 AM, Ahmet Altay  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 10:37 AM, Kenneth Knowles  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh  wrote:
>
> As we seem to largely have consensus in "Reducing Committer Load for Code
> Reviews"[1], this is a vote to change the Beam policy on Code Reviews to
> require that
>
> (1) At least one committer is involved with the code review, as either a
> reviewer or as the author
> (2) A contributor has approved the change
>
> prior to merging any change.
>
> This changes our policy from its current requirement that at least one
> committer *who is not the author* has approved the change prior to merging.
> We believe that changing this process will improve code review throughput,
> reduce committer load, and engage more of the community in the code review
> process.
>
> Please vote:
> [ ] +1: Accept the above proposal to change the Beam code review/merge
> policy
> [ ] -1: Leave the Code Review policy unchanged
>
> Thanks,
>
> Thomas
>
> [1] https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E
>
>
>
>
>
>


Re: [VOTE] Use probot/stale to automatically manage stale pull requests

2018-06-04 Thread Alexey Romanenko
+1

> On 4 Jun 2018, at 10:03, Reuven Lax  wrote:
> 
> +1
> 
> On Mon, Jun 4, 2018, 10:11 AM Etienne Chauchot  wrote:
> +1
> Etienne
> On Friday, June 1, 2018 at 17:58 -0700, Udi Meiri wrote:
>> +1
>> 
>> On Fri, Jun 1, 2018 at 4:27 PM Lukasz Cwik  wrote:
>>> +1
>>> 
>>> On Fri, Jun 1, 2018 at 2:53 PM Thomas Weise  wrote:
>>>> +1
>>>> 
>>>> On Fri, Jun 1, 2018 at 2:17 PM, Robert Bradshaw  wrote:
>>>>> +1
>>>>> 
>>>>> On Fri, Jun 1, 2018 at 1:43 PM Andrew Pilloud  wrote:
>>>>>> +1
>>>>>> 
>>>>>> On Fri, Jun 1, 2018 at 1:31 PM Huygaa Batsaikhan  wrote:
>>>>>>> +1
>>>>>>> 
>>>>>>> On Fri, Jun 1, 2018 at 1:17 PM Henning Rohde  wrote:
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> On Fri, Jun 1, 2018 at 10:16 AM Chamikara Jayalath  wrote:
>>>>>>>>> +1 (non-binding).
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Cham
>>>>>>>>> 
>>>>>>>>> On Fri, Jun 1, 2018 at 10:05 AM Kenneth Knowles  wrote:
>>>>>>>>>> +1
>>>>>>>>>> 
>>>>>>>>>> On Fri, Jun 1, 2018 at 9:54 AM Scott Wegner  wrote:
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>> 
>>>>>>>>>>> On Fri, Jun 1, 2018 at 9:39 AM Ahmet Altay  wrote:
>>>>>>>>>>>> +1
>>>>>>>>>>>> 
>>>>>>>>>>>> On Fri, Jun 1, 2018, 9:32 AM Jason Kuster  wrote:
>>>>>>>>>>>>> +1 (non-binding): automating policy ensures it is applied fairly
>>>>>>>>>>>>> and evenly and lessens the load on project maintainers; hearty
>>>>>>>>>>>>> agreement.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Fri, Jun 1, 2018 at 9:25 AM Alan Myrvold  wrote:
>>>>>>>>>>>>>> +1 (non-binding) I updated the pull request to be 60 days
>>>>>>>>>>>>>> (instead of 90) to match the contribute policy.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Jun 1, 2018 at 9:21 AM Kenneth Knowles  wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Following the discussion, please vote on the move to activate
>>>>>>>>>>>>>>> probot/stale [3] to notify authors of stale PRs per current
>>>>>>>>>>>>>>> policy and then close them after a 7 day grace period.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> For more details, see:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>  - our stale PR policy [1]
>>>>>>>>>>>>>>>  - the discussion thread [2]
>>>>>>>>>>>>>>>  - Probot stale [3]
>>>>>>>>>>>>>>>  - BEAM ticket summarizing discussion [4]
>>>>>>>>>>>>>>>  - INFRA ticket to activate probot/stale [5]
>>>>>>>>>>>>>>>  - Example PR that would activate it [6]
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Please vote:
>>>>>>>>>>>>>>> [ ] +1, Approve that we activate probot/stale
>>>>>>>>>>>>>>> [ ] -1, Do not approve (please provide specific comments)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kenn
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> [1] https://beam.apache.org/contribute/#stale-pull-requests
>>>>>>>>>>>>>>> [2] https://lists.apache.org/thread.html/bda552ea7073ca165aaf47034610afafe22d589e386525023d33609e@%3Cdev.beam.apache.org%3E
>>>>>>>>>>>>>>> [3] https://github.com/probot/stale
>>>>>>>>>>>>>>> [4] https://issues.apache.org/jira/browse/BEAM-4423
>>>>>>>>>>>>>>> [5] https://issues.apache.org/jira/browse/INFRA-16589
>>>>>>>>>>>>>>> [6] https://github.com/apache/beam/pull/5532

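For reference, probot/stale is driven by a .github/stale.yml file in the
repository. A sketch of a configuration matching the policy described in
this thread (60 days to stale, per Alan's update of the pull request, and
a 7 day grace period before closing) could look as follows; the actual
file merged via apache/beam PR #5532 may differ in wording and exemptions:

    # Sketch only; not the exact file from apache/beam PR #5532.
    only: pulls          # apply to pull requests, per the stale PR policy
    daysUntilStale: 60   # mark a PR stale after 60 days of inactivity
    daysUntilClose: 7    # close it after a further 7 day grace period
    staleLabel: stale    # label applied while a PR is stale
    markComment: >
      This pull request has been marked stale after 60 days of
      inactivity and will be closed in 7 days unless further activity
      occurs.
    closeComment: >
      This pull request has been closed due to inactivity.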


Re: [VOTE] Use probot/stale to automatically manage stale pull requests

2018-06-04 Thread Reuven Lax
+1

On Mon, Jun 4, 2018, 10:11 AM Etienne Chauchot  wrote:

> +1
> Etienne
> On Friday, June 1, 2018 at 17:58 -0700, Udi Meiri wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 4:27 PM Lukasz Cwik  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 2:53 PM Thomas Weise  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 2:17 PM, Robert Bradshaw 
> wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 1:43 PM Andrew Pilloud  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 1:31 PM Huygaa Batsaikhan 
> wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 1:17 PM Henning Rohde  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 10:16 AM Chamikara Jayalath 
> wrote:
>
> +1 (non-binding).
>
> Thanks,
> Cham
>
> On Fri, Jun 1, 2018 at 10:05 AM Kenneth Knowles  wrote:
>
> +1
>
> On Fri, Jun 1, 2018 at 9:54 AM Scott Wegner  wrote:
>
> +1 (non-binding)
>
> On Fri, Jun 1, 2018 at 9:39 AM Ahmet Altay  wrote:
>
> +1
>
> On Fri, Jun 1, 2018, 9:32 AM Jason Kuster  wrote:
>
> +1 (non-binding): automating policy ensures it is applied fairly and
> evenly and lessens the load on project maintainers; hearty agreement.
>
> On Fri, Jun 1, 2018 at 9:25 AM Alan Myrvold  wrote:
>
> +1 (non-binding) I updated the pull request to be 60 days (instead of 90)
> to match the contribute policy.
>
> On Fri, Jun 1, 2018 at 9:21 AM Kenneth Knowles  wrote:
>
> Hi all,
>
> Following the discussion, please vote on the move to activate
> probot/stale [3] to notify authors of stale PRs per current policy and
> then close them after a 7 day grace period.
>
> For more details, see:
>
>  - our stale PR policy [1]
>  - the discussion thread [2]
>  - Probot stale [3]
>  - BEAM ticket summarizing discussion [4]
>  - INFRA ticket to activate probot/stale [5]
>  - Example PR that would activate it [6]
>
> Please vote:
> [ ] +1, Approve that we activate probot/stale
> [ ] -1, Do not approve (please provide specific comments)
>
> Kenn
>
> [1] https://beam.apache.org/contribute/#stale-pull-requests
> [2]
> https://lists.apache.org/thread.html/bda552ea7073ca165aaf47034610afafe22d589e386525023d33609e@%3Cdev.beam.apache.org%3E
> [3] https://github.com/probot/stale
> [4] https://issues.apache.org/jira/browse/BEAM-4423
> [5] https://issues.apache.org/jira/browse/INFRA-16589
> [6] https://github.com/apache/beam/pull/5532
>
>
>
>


Re: [VOTE] Code Review Process

2018-06-04 Thread Etienne Chauchot
+1
As I was already applying this.
On Saturday, June 2, 2018 at 11:24 +0300, Reuven Lax wrote:
> +1
> 
> I believe only some committers were aware of the old policy, and others were 
> effectively doing this anyway.
> 
> On Sat, Jun 2, 2018 at 2:51 AM Scott Wegner  wrote:
> > +1
> > 
> > On Fri, Jun 1, 2018 at 3:44 PM Pablo Estrada  wrote:
> > > +1 :) glad that we had this discussion
> > > 
> > > On Fri, Jun 1, 2018, 3:38 PM Udi Meiri  wrote:
> > > > +1
> > > > 
> > > > On Fri, Jun 1, 2018 at 1:46 PM Andrew Pilloud  
> > > > wrote:
> > > > > +1 - I hope this doesn't reduce the urgency to fix the root cause: 
> > > > > not having enough committers.
> > > > > 
> > > > > On Fri, Jun 1, 2018 at 1:18 PM Henning Rohde  
> > > > > wrote:
> > > > > > +1
> > > > > > 
> > > > > > On Fri, Jun 1, 2018 at 12:27 PM Dan Halperin  
> > > > > > wrote:
> > > > > > > +1 -- this is encoding what I previously thought the process was 
> > > > > > > and what, in practice, I think was often
> > > > > > > the behavior of committers anyway.
> > > > > > > On Fri, Jun 1, 2018 at 12:21 PM, Yifan Zou  
> > > > > > > wrote:
> > > > > > > > +1
> > > > > > > > 
> > > > > > > > On Fri, Jun 1, 2018 at 12:10 PM Robert Bradshaw 
> > > > > > > >  wrote:
> > > > > > > > > +1
> > > > > > > > > 
> > > > > > > > > On Fri, Jun 1, 2018 at 12:06 PM Chamikara Jayalath 
> > > > > > > > >  wrote:
> > > > > > > > > > +1
> > > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > Cham
> > > > > > > > > > 
> > > > > > > > > > On Fri, Jun 1, 2018 at 11:36 AM Jason Kuster 
> > > > > > > > > >  wrote:
> > > > > > > > > > > +1
> > > > > > > > > > > 
> > > > > > > > > > > On Fri, Jun 1, 2018 at 11:36 AM Ankur Goenka 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > +1
> > > > > > > > > > > > 
> > > > > > > > > > > > On Fri, Jun 1, 2018 at 11:28 AM Charles Chen 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > +1
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Fri, Jun 1, 2018 at 11:20 AM Valentyn Tymofieiev 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > On Fri, Jun 1, 2018 at 10:40 AM, Ahmet Altay 
> > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > On Fri, Jun 1, 2018 at 10:37 AM, Kenneth Knowles 
> > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On Fri, Jun 1, 2018 at 10:25 AM Thomas Groh 
> > > > > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > > > > As we seem to largely have consensus in 
> > > > > > > > > > > > > > > > > "Reducing Committer Load for Code Reviews"[1],
> > > > > > > > > > > > > > > > > this is a vote to change the Beam policy on 
> > > > > > > > > > > > > > > > > Code Reviews to require that
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > (1) At least one committer is involved with 
> > > > > > > > > > > > > > > > > the code review, as either a reviewer or
> > > > > > > > > > > > > > > > > as the author
> > > > > > > > > > > > > > > > > (2) A contributor has approved the change
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > prior to merging any change.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > This changes our policy from its current 
> > > > > > > > > > > > > > > > > requirement that at least one committer *who
> > > > > > > > > > > > > > > > > is not the author* has approved the change 
> > > > > > > > > > > > > > > > > prior to merging. We believe that changing
> > > > > > > > > > > > > > > > > this process will improve code review 
> > > > > > > > > > > > > > > > > throughput, reduce committer load, and engage
> > > > > > > > > > > > > > > > > more of the community in the code review 
> > > > > > > > > > > > > > > > > process.
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Please vote:
> > > > > > > > > > > > > > > > > [ ] +1: Accept the above proposal to change 
> > > > > > > > > > > > > > > > > the Beam code review/merge policy
> > > > > > > > > > > > > > > > > [ ] -1: Leave the Code Review policy unchanged
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Thomas
> > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > [1] https://lists.apache.org/thread.html/7c1fde3884fbefacc252b6d4b434f9a9c2cf024f381654aa3e47df18@%3Cdev.beam.apache.org%3E
> > > > > > > > > > > 
> > > > > > > > > > > 

Re: [VOTE] Use probot/stale to automatically manage stale pull requests

2018-06-04 Thread Etienne Chauchot
+1
Etienne
On Friday, June 1, 2018 at 17:58 -0700, Udi Meiri wrote:
> +1
> 
> On Fri, Jun 1, 2018 at 4:27 PM Lukasz Cwik  wrote:
> > +1
> > 
> > On Fri, Jun 1, 2018 at 2:53 PM Thomas Weise  wrote:
> > > +1
> > > 
> > > On Fri, Jun 1, 2018 at 2:17 PM, Robert Bradshaw  
> > > wrote:
> > > > +1
> > > > 
> > > > On Fri, Jun 1, 2018 at 1:43 PM Andrew Pilloud  
> > > > wrote:
> > > > > +1
> > > > > 
> > > > > On Fri, Jun 1, 2018 at 1:31 PM Huygaa Batsaikhan  
> > > > > wrote:
> > > > > > +1
> > > > > > 
> > > > > > On Fri, Jun 1, 2018 at 1:17 PM Henning Rohde  
> > > > > > wrote:
> > > > > > > +1
> > > > > > > 
> > > > > > > On Fri, Jun 1, 2018 at 10:16 AM Chamikara Jayalath 
> > > > > > >  wrote:
> > > > > > > > +1 (non-binding).
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Cham
> > > > > > > > 
> > > > > > > > On Fri, Jun 1, 2018 at 10:05 AM Kenneth Knowles 
> > > > > > > >  wrote:
> > > > > > > > > +1
> > > > > > > > > 
> > > > > > > > > On Fri, Jun 1, 2018 at 9:54 AM Scott Wegner 
> > > > > > > > >  wrote:
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > > 
> > > > > > > > > > On Fri, Jun 1, 2018 at 9:39 AM Ahmet Altay 
> > > > > > > > > >  wrote:
> > > > > > > > > > > +1
> > > > > > > > > > > 
> > > > > > > > > > > On Fri, Jun 1, 2018, 9:32 AM Jason Kuster 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > > +1 (non-binding): automating policy ensures it is 
> > > > > > > > > > > > applied fairly and evenly and lessens the load
> > > > > > > > > > > > on project maintainers; hearty agreement.
> > > > > > > > > > > > 
> > > > > > > > > > > > On Fri, Jun 1, 2018 at 9:25 AM Alan Myrvold 
> > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > +1 (non-binding) I updated the pull request to be 60 
> > > > > > > > > > > > > days (instead of 90) to match the
> > > > > > > > > > > > > contribute policy.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > On Fri, Jun 1, 2018 at 9:21 AM Kenneth Knowles 
> > > > > > > > > > > > >  wrote:
> > > > > > > > > > > > > > Hi all,
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Following the discussion, please vote on the move 
> > > > > > > > > > > > > > to activate probot/stale [3] to notify
> > > > > > > > > > > > > > authors of stale PRs per current policy and then 
> > > > > > > > > > > > > > close them after a 7 day grace period.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > For more details, see:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > >  - our stale PR policy [1]
> > > > > > > > > > > > > >  - the discussion thread [2]
> > > > > > > > > > > > > >  - Probot stale [3]
> > > > > > > > > > > > > >  - BEAM ticket summarizing discussion [4]
> > > > > > > > > > > > > >  - INFRA ticket to activate probot/stale [5]
> > > > > > > > > > > > > >  - Example PR that would activate it [6]
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Please vote:
> > > > > > > > > > > > > > [ ] +1, Approve that we activate probot/stale
> > > > > > > > > > > > > > [ ] -1, Do not approve (please provide specific 
> > > > > > > > > > > > > > comments)
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Kenn
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > [1] 
> > > > > > > > > > > > > > https://beam.apache.org/contribute/#stale-pull-requests
> > > > > > > > > > > > > > [2] https://lists.apache.org/thread.html/bda552ea7073ca165aaf47034610afafe22d589e386525023d33609e@%3Cdev.beam.apache.org%3E
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > [3] https://github.com/probot/stale
> > > > > > > > > > > > > > [4] https://issues.apache.org/jira/browse/BEAM-4423
> > > > > > > > > > > > > > [5] 
> > > > > > > > > > > > > > https://issues.apache.org/jira/browse/INFRA-16589
> > > > > > > > > > > > > > [6] https://github.com/apache/beam/pull/5532
> > > > > > > > > > > > 
> > > > > > > > > > > >