Re: Unable to start BEAM sql shell

2019-07-08 Thread Thomas K.
Hi,

Thanks for the advice. I made a fresh clone from master and also removed
the configuration: shadow line from the build.gradle file. However, I'm
still getting the same error message.

A bit more about my env: I'm running Gradle 5.4.1 and Java 8 on Windows.

Any thoughts?

On Tue, Jul 9, 2019 at 1:44 AM Rui Wang  wrote:

> Indeed, it's broken on the shadow configuration. I found that if we remove
> configuration: "shadow" from [1], the command passes.
>
>
> [1]:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/shell/build.gradle#L31
>
> On Mon, Jul 8, 2019 at 1:10 PM Kyle Weaver  wrote:
>
>> Hi Thomas,
>>
>> You probably need to make sure your clone of the Beam repo is up to date.
>> Also, it looks like there's another bug so I filed a Jira:
>> https://issues.apache.org/jira/browse/BEAM-7708
>>
>> Thanks,
>> Kyle
>>
>> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
>> | +1650203
>>
>>
>> On Mon, Jul 8, 2019 at 11:59 AM Thomas K.  wrote:
>>
>>> Hi,
>>>
>>> I'm following the instructions on this page -
>>>
>>> https://beam.apache.org/documentation/dsls/sql/shell/
>>>
>>> and running the following command
>>>
>>> gradlew -p sdks/java/extensions/sql/shell 
>>> -Pbeam.sql.shell.bundled=':runners:flink:1.5,:sdks:java:io:kafka' 
>>> installDist
>>>
>>>
>>>
>>>
>>>
>>> However, it fails with the error:
>>> A problem occurred evaluating project ':sdks:java:extensions:sql:shell'.
>>> > Project with path '':runners:flink:1.5' could not be found in project
>>> ':sdks:java:extensions:sql:shell'.
>>>
>>>
>>>
>>> How do I get it to recognize all the extensions so that I can run the
>>> SQL shell?
>>>
>>> Thanks.
>>>
>>>


Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-08 Thread Valentyn Tymofieiev
Rui, the committer guide [1] does say that all commits are standalone changes:

We prefer small independent, incremental PRs with descriptive, isolated
> commits. Each commit is a single clear change.
>

However, in my opinion, this recommendation applies to the moments when a
PR is first sent for review and when a PR is being merged. The committer
guide also mentions that during review iterations authors may add
review-related commits.

the pull request may have a collection of review-related commits that are
> not meaningful to preserve in the history. The reviewer should give the
> LGTM and then request that the author of the pull request rebase, squash,
> split, etc, the commits, so that the history is most useful.


Review-related commits don't have to be isolated independent changes, and
perhaps the committer guide and contributor guide [2] should spell out
clearly that authors should not feel pressured to make review commits look
like meaningful changes of their own when it does not make sense to do so.
By the end of the review, review commits should be squashed by a committer
or by the author.

I think there are some incentives to always squash-and-force-push:
- A committer will not ask the author to squash commits if there is only
one commit.
- We don't have to wait for another round of tests to pass on the final PR.

Both concerns are addressed if a committer follows squash-and-merge
workflow.

[1] https://beam.apache.org/contribute/committer-guide

[2] https://beam.apache.org/contribute/

On Mon, Jul 8, 2019 at 11:33 AM Rui Wang  wrote:

> I usually follow the pattern of "authors force-push their changes during
> every review iteration". The reason is that after reading [1], I found it
> hard to maintain a multi-commit PR, as it's hard to create isolated commits
> for different logical pieces of code in practice. Therefore, in practice I
> squash commits (and then have to force-push) to keep at least a single
> isolated commit.
>
>
>
> [1]
> https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
>
> -Rui
>
> On Mon, Jul 8, 2019 at 11:25 AM Udi Meiri  wrote:
>
>> I think there are already some guidelines here:
>> https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
>> (maybe we could point to them from the PR template?)
>> Yes, it is acceptable to ask for a squash, or to ask whether it's OK to
>> squash to a single commit.
>>
>> On Mon, Jul 8, 2019 at 11:14 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> I have observed a pattern where authors force-push their changes during
>>> every review iteration, so that a pull request always contains one commit.
>>> This creates the following problems:
>>>
>>> 1. It is hard to see what has changed between review iterations.
>>> 2. Sometimes authors make changes in parts of pull requests that the
>>> reviewer did not comment on, and such changes may be unnoticed by the
>>> reviewer.
>>> 3. After a force-push, comments made by reviewers on an earlier commit
>>> are hard to find.
>>>
>>> A better workflow may be to:
>>> 1. Between review iterations authors push changes in new commit(s), but
>>> also keep the original commit.
>>> 2. If a follow-up commit does not constitute a meaningful change of its
>>> own, it should be prefixed with "fixup: ".
>>> 3. Once review has finished either:
>>> - Authors squash fixup commits after all reviewers have approved the PR
>>> per request of a reviewer.
>>> - Committers squash fixup commits during merge.
>>>
>>> I am curious what thoughts or suggestions others have. In particular:
>>> 1. Should we document guidelines for iterating on PRs in our contributor
>>> guide?
>>> 2. Is it acceptable for a reviewer to ask the author to rebase squashed
>>> changes that were force-pushed to address review feedback onto their
>>> original commits to simplify the rest of the review?
>>>
>>> Thanks.
>>>
>>> Related discussion:
>>> [1] Committer Guidelines / Hygene before merging PRs
>>> https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E
>>>
>>


Re: [VOTE] Vendored dependencies release process

2019-07-08 Thread Lukasz Cwik
Thanks for taking a look. I followed up on your questions.

On Mon, Jul 8, 2019 at 3:58 PM Udi Meiri  wrote:

> I left some comments. As I'm new to the Beam release process, my questions
> might be trivial to someone actually performing the release.
>
> On Tue, Jul 2, 2019 at 4:49 PM Lukasz Cwik  wrote:
>
>> Please vote based on the vendored dependencies release process as
>> discussed[1] and documented[2].
>>
>> Please vote as follows:
>> +1: Adopt the vendored dependency release process
>> -1: The vendored release process needs to change because ...
>>
>> Since many people in the US may be out due to the holiday schedule, I'll
>> try to close the vote and tally the results on July 9th so please vote
>> before then.
>>
>> 1:
>> https://lists.apache.org/thread.html/e2c49a5efaee2ad416b083fbf3b9b6db60fdb04750208bfc34cecaf0@%3Cdev.beam.apache.org%3E
>> 2: https://s.apache.org/beam-release-vendored-artifacts
>>
>


Re: [VOTE] Vendored dependencies release process

2019-07-08 Thread Udi Meiri
I left some comments. As I'm new to the Beam release process, my questions
might be trivial to someone actually performing the release.

On Tue, Jul 2, 2019 at 4:49 PM Lukasz Cwik  wrote:

> Please vote based on the vendored dependencies release process as
> discussed[1] and documented[2].
>
> Please vote as follows:
> +1: Adopt the vendored dependency release process
> -1: The vendored release process needs to change because ...
>
> Since many people in the US may be out due to the holiday schedule, I'll
> try to close the vote and tally the results on July 9th so please vote
> before then.
>
> 1:
> https://lists.apache.org/thread.html/e2c49a5efaee2ad416b083fbf3b9b6db60fdb04750208bfc34cecaf0@%3Cdev.beam.apache.org%3E
> 2: https://s.apache.org/beam-release-vendored-artifacts
>




Re: Unable to start BEAM sql shell

2019-07-08 Thread Rui Wang
Indeed, it's broken on the shadow configuration. I found that if we remove
configuration: "shadow" from [1], the command passes.


[1]:
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/shell/build.gradle#L31
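
For reference, the change has roughly this shape (a sketch only; I'm
assuming the dependency in [1] is declared along these lines, and the
exact declaration in the file may differ):

    // before (project evaluation fails):
    compile project(path: ":sdks:java:extensions:sql", configuration: "shadow")
    // after (project evaluation passes):
    compile project(path: ":sdks:java:extensions:sql")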

On Mon, Jul 8, 2019 at 1:10 PM Kyle Weaver  wrote:

> Hi Thomas,
>
> You probably need to make sure your clone of the Beam repo is up to date.
> Also, it looks like there's another bug so I filed a Jira:
> https://issues.apache.org/jira/browse/BEAM-7708
>
> Thanks,
> Kyle
>
> Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com
> | +1650203
>
>
> On Mon, Jul 8, 2019 at 11:59 AM Thomas K.  wrote:
>
>> Hi,
>>
>> I'm following the instructions on this page -
>>
>> https://beam.apache.org/documentation/dsls/sql/shell/
>>
>> and running the following command
>>
>> gradlew -p sdks/java/extensions/sql/shell 
>> -Pbeam.sql.shell.bundled=':runners:flink:1.5,:sdks:java:io:kafka' installDist
>>
>>
>>
>>
>>
>> However, it fails with the error:
>> A problem occurred evaluating project ':sdks:java:extensions:sql:shell'.
>> > Project with path '':runners:flink:1.5' could not be found in project
>> ':sdks:java:extensions:sql:shell'.
>>
>>
>>
>> How do I get it to recognize all the extensions so that I can run the
>> SQL shell?
>>
>> Thanks.
>>
>>


Re: Unable to start BEAM sql shell

2019-07-08 Thread Kyle Weaver
Hi Thomas,

You probably need to make sure your clone of the Beam repo is up to date.
Also, it looks like there's another bug so I filed a Jira:
https://issues.apache.org/jira/browse/BEAM-7708

Thanks,
Kyle

Kyle Weaver | Software Engineer | github.com/ibzib | kcwea...@google.com |
+1650203


On Mon, Jul 8, 2019 at 11:59 AM Thomas K.  wrote:

> Hi,
>
> I'm following the instructions on this page -
>
> https://beam.apache.org/documentation/dsls/sql/shell/
>
> and running the following command
>
> gradlew -p sdks/java/extensions/sql/shell 
> -Pbeam.sql.shell.bundled=':runners:flink:1.5,:sdks:java:io:kafka' installDist
>
>
>
>
>
> However, it fails with the error:
> A problem occurred evaluating project ':sdks:java:extensions:sql:shell'.
> > Project with path '':runners:flink:1.5' could not be found in project
> ':sdks:java:extensions:sql:shell'.
>
>
>
> How do I get it to recognize all the extensions so that I can run the SQL
> shell?
>
> Thanks.
>
>


Unable to start BEAM sql shell

2019-07-08 Thread Thomas K.
Hi,

I'm following the instructions on this page -

https://beam.apache.org/documentation/dsls/sql/shell/

and running the following command

gradlew -p sdks/java/extensions/sql/shell
-Pbeam.sql.shell.bundled=':runners:flink:1.5,:sdks:java:io:kafka'
installDist





However, it fails with the error:
A problem occurred evaluating project ':sdks:java:extensions:sql:shell'.
> Project with path '':runners:flink:1.5' could not be found in project
':sdks:java:extensions:sql:shell'.



How do I get it to recognize all the extensions so that I can run the SQL
shell?

Thanks.


Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-08 Thread Rui Wang
I usually follow the pattern of "authors force-push their changes during
every review iteration". The reason is that after reading [1], I found it
hard to maintain a multi-commit PR, as it's hard to create isolated commits
for different logical pieces of code in practice. Therefore, in practice I
squash commits (and then have to force-push) to keep at least a single
isolated commit.



[1]
https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives

-Rui

On Mon, Jul 8, 2019 at 11:25 AM Udi Meiri  wrote:

> I think there are already some guidelines here:
> https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
> (maybe we could point to them from the PR template?)
> Yes, it is acceptable to ask for a squash, or to ask whether it's OK to
> squash to a single commit.
>
> On Mon, Jul 8, 2019 at 11:14 AM Valentyn Tymofieiev 
> wrote:
>
>> I have observed a pattern where authors force-push their changes during
>> every review iteration, so that a pull request always contains one commit.
>> This creates the following problems:
>>
>> 1. It is hard to see what has changed between review iterations.
>> 2. Sometimes authors make changes in parts of pull requests that the
>> reviewer did not comment on, and such changes may be unnoticed by the
>> reviewer.
>> 3. After a force-push, comments made by reviewers on an earlier commit
>> are hard to find.
>>
>> A better workflow may be to:
>> 1. Between review iterations authors push changes in new commit(s), but
>> also keep the original commit.
>> 2. If a follow-up commit does not constitute a meaningful change of its
>> own, it should be prefixed with "fixup: ".
>> 3. Once review has finished either:
>> - Authors squash fixup commits after all reviewers have approved the PR
>> per request of a reviewer.
>> - Committers squash fixup commits during merge.
>>
>> I am curious what thoughts or suggestions others have. In particular:
>> 1. Should we document guidelines for iterating on PRs in our contributor
>> guide?
>> 2. Is it acceptable for a reviewer to ask the author to rebase squashed
>> changes that were force-pushed to address review feedback onto their
>> original commits to simplify the rest of the review?
>>
>> Thanks.
>>
>> Related discussion:
>> [1] Committer Guidelines / Hygene before merging PRs
>> https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E
>>
>


Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-08 Thread Valentyn Tymofieiev
Thanks Udi, my second question is actually about asking to "unsquash" the
change when doing so will simplify the review process. For example, think
of a large PR that received several comments; the author addressed them,
but squashed all the changes into the original commit.

On Mon, Jul 8, 2019 at 11:25 AM Udi Meiri  wrote:

> I think there are already some guidelines here:
> https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
> (maybe we could point to them from the PR template?)
> Yes, it is acceptable to ask for a squash, or to ask whether it's OK to
> squash to a single commit.
>
> On Mon, Jul 8, 2019 at 11:14 AM Valentyn Tymofieiev 
> wrote:
>
>> I have observed a pattern where authors force-push their changes during
>> every review iteration, so that a pull request always contains one commit.
>> This creates the following problems:
>>
>> 1. It is hard to see what has changed between review iterations.
>> 2. Sometimes authors make changes in parts of pull requests that the
>> reviewer did not comment on, and such changes may be unnoticed by the
>> reviewer.
>> 3. After a force-push, comments made by reviewers on an earlier commit
>> are hard to find.
>>
>> A better workflow may be to:
>> 1. Between review iterations authors push changes in new commit(s), but
>> also keep the original commit.
>> 2. If a follow-up commit does not constitute a meaningful change of its
>> own, it should be prefixed with "fixup: ".
>> 3. Once review has finished either:
>> - Authors squash fixup commits after all reviewers have approved the PR
>> per request of a reviewer.
>> - Committers squash fixup commits during merge.
>>
>> I am curious what thoughts or suggestions others have. In particular:
>> 1. Should we document guidelines for iterating on PRs in our contributor
>> guide?
>> 2. Is it acceptable for a reviewer to ask the author to rebase squashed
>> changes that were force-pushed to address review feedback onto their
>> original commits to simplify the rest of the review?
>>
>> Thanks.
>>
>> Related discussion:
>> [1] Committer Guidelines / Hygene before merging PRs
>> https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E
>>
>


Unable to run/debug test cases with category "NeedsRunner"

2019-07-08 Thread Sehrish Naeem
I have been trying to run this test (which needs a runner) locally, using
this command:

gradle runners:direct-java:needsRunnerTests --tests
org.apache.beam.sdk.transforms.GroupTest.testGroupByOneField

I am getting this exception. Can someone help with this, or has anyone
faced the same issue?

java.lang.RuntimeException: Unable to instantiate test options from
system property
beamTestPipelineOptions:[--runner=DirectRunner,--runnerDeterminedSharding=false]
at 
org.apache.beam.sdk.testing.TestPipeline.testingPipelineOptions(TestPipeline.java:470)
at 
org.apache.beam.sdk.testing.TestPipeline.create(TestPipeline.java:261)
at 
org.apache.beam.sdk.transforms.MapElementsTest.<init>(MapElementsTest.java:61)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:250)
at 
org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:260)
at 
org.junit.runners.BlockJUnit4ClassRunner$2.runReflectiveCall(BlockJUnit4ClassRunner.java:309)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:349)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:314)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:312)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at org.junit.runners.ParentRunner.run(ParentRunner.java:396)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at 
org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at 
org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at 

Re: [DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-08 Thread Udi Meiri
I think there are already some guidelines here:
https://beam.apache.org/contribute/committer-guide/#pull-request-review-objectives
(maybe we could point to them from the PR template?)
Yes, it is acceptable to ask for a squash, or to ask whether it's OK to
squash to a single commit.

On Mon, Jul 8, 2019 at 11:14 AM Valentyn Tymofieiev 
wrote:

> I have observed a pattern where authors force-push their changes during
> every review iteration, so that a pull request always contains one commit.
> This creates the following problems:
>
> 1. It is hard to see what has changed between review iterations.
> 2. Sometimes authors make changes in parts of pull requests that the
> reviewer did not comment on, and such changes may be unnoticed by the
> reviewer.
> 3. After a force-push, comments made by reviewers on an earlier commit
> are hard to find.
>
> A better workflow may be to:
> 1. Between review iterations authors push changes in new commit(s), but
> also keep the original commit.
> 2. If a follow-up commit does not constitute a meaningful change of its
> own, it should be prefixed with "fixup: ".
> 3. Once review has finished either:
> - Authors squash fixup commits after all reviewers have approved the PR
> per request of a reviewer.
> - Committers squash fixup commits during merge.
>
> I am curious what thoughts or suggestions others have. In particular:
> 1. Should we document guidelines for iterating on PRs in our contributor
> guide?
> 2. Is it acceptable for a reviewer to ask the author to rebase squashed
> changes that were force-pushed to address review feedback onto their
> original commits to simplify the rest of the review?
>
> Thanks.
>
> Related discussion:
> [1] Committer Guidelines / Hygene before merging PRs
> https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E
>




Re: PR#6675 Updates

2019-07-08 Thread Rui Wang
Thanks for the PR! I left some comments related to schema types.


-Rui

On Sat, Jul 6, 2019 at 7:54 AM rahul patwari 
wrote:

>
> On Fri 5 Jul, 2019, 9:25 PM Ismaël Mejía,  wrote:
>
>> This is a holiday week in the US, and a good chunk of the people in the
>> project have been busy with the Beam summit and other events in the last
>> few days, which is why reviews are taking longer than expected. Sorry,
>> next week most things will be back to normal (hopefully).
>>
>> On Fri, Jul 5, 2019 at 10:27 AM Sehrish Naeem
>>  wrote:
>> >
>> > Hi,
>> >
>> > My name is Sehrish Naeem. Can someone please review the PR for BEAM-6675?
>> >
>> > Thank you
>>
>


[DISCUSS] Contributor guidelines for iterating on PRs: when to squash commits.

2019-07-08 Thread Valentyn Tymofieiev
I have observed a pattern where authors force-push their changes during
every review iteration, so that a pull request always contains one commit.
This creates the following problems:

1. It is hard to see what has changed between review iterations.
2. Sometimes authors make changes in parts of pull requests that the
reviewer did not comment on, and such changes may be unnoticed by the
reviewer.
3. After a force-push, comments made by reviewers on an earlier commit are
hard to find.

A better workflow may be to:
1. Between review iterations authors push changes in new commit(s), but
also keep the original commit.
2. If a follow-up commit does not constitute a meaningful change of its
own, it should be prefixed with "fixup: ".
3. Once review has finished either:
- Authors squash fixup commits after all reviewers have approved the PR per
request of a reviewer.
- Committers squash fixup commits during merge.
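
For example, with standard git commands this workflow could look like the
following (a sketch; the branch and commit names are illustrative):

    # During review: address feedback as a fixup of the original commit,
    # pushing new commits instead of rewriting history.
    git commit --fixup <original-commit-sha>
    git push origin my-feature-branch

    # After approval: squash the fixups into their target commits and
    # force-push the cleaned-up history.
    git rebase -i --autosquash origin/master
    git push --force-with-lease origin my-feature-branch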

I am curious what thoughts or suggestions others have. In particular:
1. Should we document guidelines for iterating on PRs in our contributor
guide?
2. Is it acceptable for a reviewer to ask the author to rebase squashed
changes that were force-pushed to address review feedback onto their
original commits to simplify the rest of the review?

Thanks.

Related discussion:
[1] Committer Guidelines / Hygene before merging PRs
https://lists.apache.org/thread.html/6d922820d6fc352479f88e5c8737f2c8893ddb706a1e578b50d28948@%3Cdev.beam.apache.org%3E


Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-08 Thread Udi Meiri
The Python 3 incompatibility is reason enough to move off of Perfkit. (+1)

On Mon, Jul 8, 2019 at 9:49 AM Mark Liu  wrote:

> Thanks for summarizing this discussion and posting it to the dev list. I
> worked closely on the Python performance tests, and those Perfkit problems
> are really painful. So +1 to removing Perfkit and also removing those
> tests that are no longer maintained.
>
> For #2 (Python performance tests), there is no special setup for them.
> The only missing part I can see is metrics collection and data upload to
> shared storage (e.g. BigQuery), which the Perfkit framework provides for
> free. This seems common to all languages, so I'm wondering if a shared
> infra is possible.
>
> Mark
>
> On Wed, Jul 3, 2019 at 9:36 AM Lukasz Cwik  wrote:
>
>> Makes sense to me to move forward with your suggestion.
>>
>> On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy 
>> wrote:
>>
>>> Are there features in Perfkit that we would like to be using that we
 aren't?

>>>
>>> Besides the Kubernetes-related code I mentioned above (which, I believe,
>>> can be easily replaced), I don't see any added value in having Perfkit.
>>> The Kubernetes parts could be replaced with a set of fine-grained Gradle
>>> tasks invoked by other high-level tasks and Jenkins job steps. There also
>>> seem to be some Gradle + Kubernetes plugins out there that might prove
>>> useful here (though I've done no solid research in that area).
>>>
>>>
 Can we make the integration with Perfkit less brittle?

>>>
>>> There was an idea to move all the Beam benchmark code
>>> (beam_benchmark_helper.py, beam_integration_benchmark.py) from Perfkit
>>> to the Beam repository and inject it into Perfkit every time we use it.
>>> However, that would require investing time and effort, and it would
>>> still not solve the problems I listed above. It would also still require
>>> Beam developers to know how Perfkit works, while we can avoid that and
>>> use the existing tools (Gradle, Jenkins).
>>>
>>> Thanks!
>>>
>>> On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik  wrote:
>>>
 +1 for removing tests that are not maintained.

 Are there features in Perfkit that we would like to be using that we
 aren't?
 Can we make the integration with Perfkit less brittle?

 If we aren't getting much and don't plan to get much value in the short
 term, removal makes sense to me.

 On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy 
 wrote:

> Hi all,
>
> moving the discussion to the dev list:
> https://github.com/apache/beam/pull/8919. I think that Perfkit
> Benchmarker should be removed from all our tests.
>
> Problems that we face currently:
>
>    1. Changes to Gradle tasks/build configuration in the Beam codebase
>    have to be reflected in Perfkit code. This requires PRs to Perfkit,
>    which can take a while, and the tests sometimes break because of this
>    (no change in Perfkit + change already there in Beam =
>    incompatibility). This is what happened in PR 8919 (above),
>    2. Can't run in Python 3 (depends on Python 2-only libraries like
>    functools32),
>    3. Black-box testing, which makes it hard to collect pipeline-related
>    metrics,
>    4. Measurement of run time is inaccurate,
>    5. It offers relatively little flexibility compared with, e.g., Jenkins
>    tasks in terms of setting up the testing infrastructure (runners,
>    databases). For example, if we'd like to set up a Flink runner and
>    reuse it in subsequent tests in one go, that would be impossible. We
>    can easily do this in Jenkins.
>
> Tests that use Perfkit:
>
>    1. IO integration tests,
>    2. Python performance tests,
>    3. beam_PerformanceTests_Dataflow (disabled),
>    4. beam_PerformanceTests_Spark (failing constantly - looks
>    unmaintained).
>
> From the IOIT perspective (1), only the code that sets up/tears down
> Kubernetes resources is useful right now, but these parts can easily be
> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
> IOIT, because we already collect metrics using the Metrics API and store
> them in BigQuery directly.
>
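> A minimal sketch of that Metrics API usage in Python (the namespace and
> names here are illustrative, not the actual IOIT code):
>
>     import apache_beam as beam
>     from apache_beam.metrics import Metrics
>
>     class CountRecordsFn(beam.DoFn):
>       def __init__(self):
>         # Counters are reported by the runner and can be queried from
>         # PipelineResult.metrics() after the run, then exported to BigQuery.
>         self.records = Metrics.counter('ioit', 'records_processed')
>
>       def process(self, element):
>         self.records.inc()
>         yield element
>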
> As for point 2: I have no knowledge of how complex the task would be
> (help needed).
>
> Regarding 3 and 4: those tests seem to be unmaintained - should we remove
> them?
>
> Opinions?
>
> Thank you,
> Łukasz
>
>
>
>
>




Re: [VOTE] Vendored dependencies release process

2019-07-08 Thread Kenneth Knowles
+1

On Mon, Jul 8, 2019 at 2:07 AM Łukasz Gajowy  wrote:

> +1
>
> Thanks for documenting the process clearly.
>
> Łukasz
>
> On Sat, Jul 6, 2019 at 8:32 PM David Morávek 
> wrote:
>
>> +1
>>
>> Sent from my iPhone
>>
>> On 6 Jul 2019, at 11:25, Lukasz Cwik  wrote:
>>
>> +1
>>
>> On Wed, Jul 3, 2019 at 10:24 AM Jens Nyman 
>> wrote:
>>
>>> +1
>>>
>>> On 2019/07/02 23:49:10, Lukasz Cwik  wrote:
>>> > Please vote based on the vendored dependencies release process as
>>> > discussed[1] and documented[2].
>>> >
>>> > Please vote as follows:
>>> > +1: Adopt the vendored dependency release process
>>> > -1: The vendored release process needs to change because ...
>>> >
>>> > Since many people in the US may be out due to the holiday schedule, I'll
>>> > try to close the vote and tally the results on July 9th so please vote
>>> > before then.
>>> >
>>> > 1:
>>> > https://lists.apache.org/thread.html/e2c49a5efaee2ad416b083fbf3b9b6db60fdb04750208bfc34cecaf0@%3Cdev.beam.apache.org%3E
>>> >
>>> > 2: https://s.apache.org/beam-release-vendored-artifacts
>>> >
>>>
>>


Re: Python Utilities

2019-07-08 Thread Shannon Duncan
Yeah, these are for local testing right now. I was hoping to gain insight
into better naming.

I was thinking of creating an "extras" module.

On Mon, Jul 8, 2019, 12:28 PM Robin Qiu  wrote:

> Hi Shannon,
>
> Thanks for sharing the repo! I took a quick look and I have a concern with
> the naming of the transforms.
>
> Currently, Beam Java already has "Select" and "Join" transforms. However,
> they work on schemas, a feature that is not yet implemented in Beam Python.
> (See
> https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
> )
>
> To maintain consistency between SDKs, I think it is good to avoid having
> two different transforms with the same name but different functions. So
> maybe you could consider renaming the transforms and/or putting them in
> an extension Python module instead of the main ones?
>
> Best,
> Robin
>
> On Mon, Jul 8, 2019 at 9:19 AM Shannon Duncan 
> wrote:
>
>> As a follow-up, here is the repo that contains the utilities for now:
>> https://github.com/shadowcodex/apache-beam-utilities. I will put together
>> a proper PR as the code gets closer to production quality.
>>
>> - Shannon
>>
>> On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan 
>> wrote:
>>
>>> Thanks Frederik,
>>>
>>> That's exactly where I was looking. I did get permission to open source
>>> the utilities module. So I'm going to throw them up on my personal github
>>> soon and share with the email group for a look over.
>>>
>>> I'm going to work on the utilities there because it's a quick dev
>>> environment and then once they are ready for proper PR I'll begin working
>>> them into the actual SDK for a PR.
>>>
>>> I also joined the slack #beam and #beam-python channels, I was unsure of
>>> where most collaborators discussed items.
>>>
>>> - Shannon
>>>
>>> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode 
>>> wrote:
>>>
 Hi Shannon,

 This is probably a good starting point:
 https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68

 Frederik

 Frederik Bode
 ML6 Ghent
 +32 4 92 78 96 18


 On Mon, 8 Jul 2019 at 15:40, Shannon Duncan 
 wrote:

> I'm sure I could use some of the existing aggregations as a guide on
> how to make aggregations to fill the gap of missing ones. Such as creating
> Sum/Max/Min.
>
> GroupBy is really already handled with GroupByKey and CoGroupByKey
> unless you are thinking of a different type of GroupBy?
>
> - Shannon
>
> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:
>
>> Maybe also adding Aggregation/GroupBy as utilities?
>>
>>
>> -Rui
>>
>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> Thanks Valentyn,
>>>
>>> I'll outline the utilities and accept any suggestions to add /
>>> modify. These are really just shortcut PTransforms that I am working on 
>>> to
>>> simplify creating pipelines.
>>>
>>> Currently the utilities contain the following PTransforms:
>>>
>>> - Inner Join
>>> - Left Outer Join
>>> - Right Outer Join
>>> - Full Outer Join
>>> - PrepareKey (For selecting items in a dictionary to act as a key
>>> for the joins)
>>> - Select (very simple filter that returns 

Re: Python Utilities

2019-07-08 Thread Robin Qiu
Hi Shannon,

Thanks for sharing the repo! I took a quick look and I have a concern with
the naming of the transforms.

Currently, Beam Java already has "Select" and "Join" transforms. However,
they work on schemas, a feature that is not yet implemented in Beam Python.
(See
https://github.com/apache/beam/tree/77b295b1c2b0a206099b8f50c4d3180c248e252c/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms
)

To maintain consistency between SDKs, I think it is good to avoid having
two different transforms with the same name but different functions. So
maybe you could consider renaming the transforms and/or putting them in an
extension Python module instead of the main ones?

Best,
Robin

On Mon, Jul 8, 2019 at 9:19 AM Shannon Duncan 
wrote:

> As a follow-up, here is the repo that contains the utilities for now:
> https://github.com/shadowcodex/apache-beam-utilities. I will put together
> a proper PR as the code gets closer to production quality.
>
> - Shannon
>
> On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan 
> wrote:
>
>> Thanks Frederik,
>>
>> That's exactly where I was looking. I did get permission to open source
>> the utilities module. So I'm going to throw them up on my personal github
>> soon and share with the email group for a look over.
>>
>> I'm going to work on the utilities there because it's a quick dev
>> environment and then once they are ready for proper PR I'll begin working
>> them into the actual SDK for a PR.
>>
>> I also joined the slack #beam and #beam-python channels, I was unsure of
>> where most collaborators discussed items.
>>
>> - Shannon
>>
>> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode 
>> wrote:
>>
>>> Hi Shannon,
>>>
>>> This is probably a good starting point:
>>> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>>>
>>> Frederik
>>>
>>> Frederik Bode
>>> ML6 Ghent
>>> +32 4 92 78 96 18
>>>
>>>
>>> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan 
>>> wrote:
>>>
 I'm sure I could use some of the existing aggregations as a guide on
 how to make aggregations to fill the gap of missing ones. Such as creating
 Sum/Max/Min.

 GroupBy is really already handled with GroupByKey and CoGroupByKey
 unless you are thinking of a different type of GroupBy?

 - Shannon

 On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:

> Maybe also adding Aggregation/GroupBy as utilities?
>
>
> -Rui
>
> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
>
>> Thanks Valentyn,
>>
>> I'll outline the utilities and accept any suggestions to add /
>> modify. These are really just shortcut PTransforms that I am working on 
>> to
>> simplify creating pipelines.
>>
>> Currently the utilities contain the following PTransforms:
>>
>> - Inner Join
>> - Left Outer Join
>> - Right Outer Join
>> - Full Outer Join
>> - PrepareKey (For selecting items in a dictionary to act as a key for
>> the joins)
>> - Select (very simple filter that returns only items you want from
>> the dictionary) (allows for defining a default nullValue)
>>
>> Currently these operations only work with dictionaries, but I'd be
>> interested to see how it would work for tuples.
>>
>> I'm new to python so they may not be optimized or the best way, but
>> from my 

Re: pipeline timeout

2019-07-08 Thread Mark Liu
Hi Chaim,

You can check out the PipelineResult class and do something like:

from apache_beam.runners.runner import PipelineState

result = p.run()
# Note: wait_until_finish takes the timeout in milliseconds.
result.wait_until_finish(duration=TIMEOUT_MS)
if not PipelineState.is_terminal(result.state):
  result.cancel()

The implementation of PipelineResult depends on what runner you choose, and
you may find more useful functions in its subclass.

Mark


On Sun, Jul 7, 2019 at 12:59 AM Chaim Turkel  wrote:

> Hi,
>   I have a pipeline that usually takes 15-30 minutes. Sometimes things
> get stuck (from 3rd party side). I would like to know if there is a
> way to cancel the job if it is running for more than x minutes? I know
> there is a cli command but i would like it either on the pipeline
> config or in the python sdk.
> Any ideas?
>
> Chaim Turkel
>


Re: Python Utilities

2019-07-08 Thread Shannon Duncan
As a follow-up, here is the repo that contains the utilities for now:
https://github.com/shadowcodex/apache-beam-utilities. I will put together a
proper PR as the code gets closer to production quality.

- Shannon

On Mon, Jul 8, 2019 at 9:20 AM Shannon Duncan 
wrote:

> Thanks Frederik,
>
> That's exactly where I was looking. I did get permission to open source
> the utilities module. So I'm going to throw them up on my personal github
> soon and share with the email group for a look over.
>
> I'm going to work on the utilities there because it's a quick dev
> environment and then once they are ready for proper PR I'll begin working
> them into the actual SDK for a PR.
>
> I also joined the slack #beam and #beam-python channels, I was unsure of
> where most collaborators discussed items.
>
> - Shannon
>
> On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode  wrote:
>
>> Hi Shannon,
>>
>> This is probably a good starting point:
>> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>>
>> Frederik
>>
>> Frederik Bode
>> ML6 Ghent
>> +32 4 92 78 96 18
>>
>>
>> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan 
>> wrote:
>>
>>> I'm sure I could use some of the existing aggregations as a guide on how
>>> to make aggregations to fill the gap of missing ones. Such as creating
>>> Sum/Max/Min.
>>>
>>> GroupBy is really already handled with GroupByKey and CoGroupByKey
>>> unless you are thinking of a different type of GroupBy?
>>>
>>> - Shannon
>>>
>>> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:
>>>
 Maybe also adding Aggregation/GroupBy as utilities?


 -Rui

 On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> Thanks Valentyn,
>
> I'll outline the utilities and accept any suggestions to add / modify.
> These are really just shortcut PTransforms that I am working on to 
> simplify
> creating pipelines.
>
> Currently the utilities contain the following PTransforms:
>
> - Inner Join
> - Left Outer Join
> - Right Outer Join
> - Full Outer Join
> - PrepareKey (For selecting items in a dictionary to act as a key for
> the joins)
> - Select (very simple filter that returns only items you want from the
> dictionary) (allows for defining a default nullValue)
>
> Currently these operations only work with dictionaries, but I'd be
> interested to see how it would work for tuples.
>
> I'm new to python so they may not be optimized or the best way, but
> from my understanding these seem to be the best way to do these types of
> operations. Essentially I created a pipeline to be able to convert a 
> simple
> sql query into a flow of these utilities. Using prepareKey to define your
> joining key, joining, and then selecting from the join allows you to do a
> lot of powerful manipulation in a simple / familiar way.
>
> If this is something that we'd like to add to the Beam SDK I don't
> mind looking at the contributor license agreement, and conversing more on
> how to get them in.
>
> Thanks,
> Shannon
>
>
>
> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
>
>> Hi Shannon,
>>
>> Thanks for considering a contribution to Beam Python SDK. With a
>> direct contribution to Beam SDK, your change will reach 

Re: Python Utilities

2019-07-08 Thread Shannon Duncan
Thanks Frederik,

That's exactly where I was looking. I did get permission to open source the
utilities module. So I'm going to throw them up on my personal github soon
and share with the email group for a look over.

I'm going to work on the utilities there because it's a quick dev
environment and then once they are ready for proper PR I'll begin working
them into the actual SDK for a PR.

I also joined the slack #beam and #beam-python channels, I was unsure of
where most collaborators discussed items.

- Shannon

On Mon, Jul 8, 2019 at 9:09 AM Frederik Bode  wrote:

> Hi Shannon,
>
> This is probably a good starting point:
> https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
>
> Frederik
>
> Frederik Bode
> ML6 Ghent
> +32 4 92 78 96 18
>
>
> On Mon, 8 Jul 2019 at 15:40, Shannon Duncan 
> wrote:
>
>> I'm sure I could use some of the existing aggregations as a guide on how
>> to make aggregations to fill the gap of missing ones. Such as creating
>> Sum/Max/Min.
>>
>> GroupBy is really already handled with GroupByKey and CoGroupByKey unless
>> you are thinking of a different type of GroupBy?
>>
>> - Shannon
>>
>> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:
>>
>>> Maybe also adding Aggregation/GroupBy as utilities?
>>>
>>>
>>> -Rui
>>>
>>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
 Thanks Valentyn,

 I'll outline the utilities and accept any suggestions to add / modify.
 These are really just shortcut PTransforms that I am working on to simplify
 creating pipelines.

 Currently the utilities contain the following PTransforms:

 - Inner Join
 - Left Outer Join
 - Right Outer Join
 - Full Outer Join
 - PrepareKey (For selecting items in a dictionary to act as a key for
 the joins)
 - Select (very simple filter that returns only items you want from the
 dictionary) (allows for defining a default nullValue)

 Currently these operations only work with dictionaries, but I'd be
 interested to see how it would work for tuples.

 I'm new to python so they may not be optimized or the best way, but
 from my understanding these seem to be the best way to do these types of
 operations. Essentially I created a pipeline to be able to convert a simple
 sql query into a flow of these utilities. Using prepareKey to define your
 joining key, joining, and then selecting from the join allows you to do a
 lot of powerful manipulation in a simple / familiar way.

 If this is something that we'd like to add to the Beam SDK I don't mind
 looking at the contributor license agreement, and conversing more on how to
 get them in.

 Thanks,
 Shannon



 On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev 
 wrote:

> Hi Shannon,
>
> Thanks for considering a contribution to Beam Python SDK. With a
> direct contribution to Beam SDK, your change will reach larger audience of
> users, and you will not have to maintain a separate project and keep it up
> to date with new releases of Beam.
>
> I encourage you to take a look at https://beam.apache.org/contribute/ for
> general advice on how to get started. To echo some points mentioned in the
> guide:
>
> - If your change is large or it is your first change, it is a good
> idea to discuss it on 

Re: Python Utilities

2019-07-08 Thread Frederik Bode
Hi Shannon,

This is probably a good starting point:
https://github.com/apache/beam/blob/2d5e493abf39ee6fc89831bb0b7ec9fee592b9c5/sdks/python/apache_beam/transforms/combiners.py#L68
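
For instance, a minimal CombineFn in that style (a sketch, not code taken
from combiners.py) could look like:

    import apache_beam as beam

    class MaxFn(beam.CombineFn):
      """Keeps the running maximum of the input elements."""

      def create_accumulator(self):
        return float('-inf')

      def add_input(self, accumulator, element):
        return max(accumulator, element)

      def merge_accumulators(self, accumulators):
        return max(accumulators)

      def extract_output(self, accumulator):
        return accumulator

    # Usage: pcoll | beam.CombineGlobally(MaxFn())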

Frederik

Frederik Bode
ML6 Ghent
+32 4 92 78 96 18


On Mon, 8 Jul 2019 at 15:40, Shannon Duncan 
wrote:

> I'm sure I could use some of the existing aggregations as a guide on how
> to make aggregations to fill the gap of missing ones. Such as creating
> Sum/Max/Min.
>
> GroupBy is really already handled with GroupByKey and CoGroupByKey unless
> you are thinking of a different type of GroupBy?
>
> - Shannon
>
> On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:
>
>> Maybe also adding Aggregation/GroupBy as utilities?
>>
>>
>> -Rui
>>
>> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan 
>> wrote:
>>
>>> Thanks Valentyn,
>>>
>>> I'll outline the utilities and accept any suggestions to add / modify.
>>> These are really just shortcut PTransforms that I am working on to simplify
>>> creating pipelines.
>>>
>>> Currently the utilities contain the following PTransforms:
>>>
>>> - Inner Join
>>> - Left Outer Join
>>> - Right Outer Join
>>> - Full Outer Join
>>> - PrepareKey (For selecting items in a dictionary to act as a key for
>>> the joins)
>>> - Select (very simple filter that returns only items you want from the
>>> dictionary) (allows for defining a default nullValue)
>>>
>>> Currently these operations only work with dictionaries, but I'd be
>>> interested to see how it would work for tuples.
>>>
>>> I'm new to python so they may not be optimized or the best way, but from
>>> my understanding these seem to be the best way to do these types of
>>> operations. Essentially I created a pipeline to be able to convert a simple
>>> sql query into a flow of these utilities. Using prepareKey to define your
>>> joining key, joining, and then selecting from the join allows you to do a
>>> lot of powerful manipulation in a simple / familiar way.
>>>
>>> If this is something that we'd like to add to the Beam SDK I don't mind
>>> looking at the contributor license agreement, and conversing more on how to
>>> get them in.
>>>
>>> Thanks,
>>> Shannon
>>>
>>>
>>>
>>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev 
>>> wrote:
>>>
 Hi Shannon,

 Thanks for considering a contribution to Beam Python SDK. With a direct
 contribution to Beam SDK, your change will reach larger audience of users,
 and you will not have to maintain a separate project and keep it up to date
 with new releases of Beam.

 I encourage you to take a look at https://beam.apache.org/contribute/ for
 general advice on how to get started. To echo some points mentioned in the
 guide:

 - If your change is large or it is your first change, it is a good idea
 to discuss it on the dev@ mailing list
 - For large changes create a design doc (template, examples) and email
 it to the dev@ mailing list.

 Thanks,
 Valentyn

 On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> I have been writing a bunch of utilities for the python SDK such as
> joins, selections, composite transforms, etc...
>
> I am working with my company to see if I can open source the
> utilities. Would it be best to post them on a separate PyPi project, or to
> PR them into the beam SDK? I assume if they let me open source it they 
> will
> want some attribution or something like that.
>
> Thanks,
> Shannon
>



Re: Python Utilities

2019-07-08 Thread Shannon Duncan
I'm sure I could use some of the existing aggregations as a guide on how to
make aggregations that fill the gap of missing ones, such as Sum/Max/Min.

GroupBy is really already handled with GroupByKey and CoGroupByKey unless
you are thinking of a different type of GroupBy?
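
For reference, a rough sketch of how an inner join can sit on top of
CoGroupByKey (the helper and names here are mine, not a final API):

    import apache_beam as beam

    def inner_join(left, right):
      """Inner-joins two PCollections of (key, value) pairs."""
      def emit_matches(element):
        key, grouped = element
        for l in grouped['left']:
          for r in grouped['right']:
            yield (key, (l, r))

      return ({'left': left, 'right': right}
              | 'CoGroup' >> beam.CoGroupByKey()
              | 'EmitMatches' >> beam.FlatMap(emit_matches))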

- Shannon

On Sun, Jul 7, 2019 at 10:47 PM Rui Wang  wrote:

> Maybe also adding Aggregation/GroupBy as utilities?
>
>
> -Rui
>
> On Sun, Jul 7, 2019 at 1:46 PM Shannon Duncan 
> wrote:
>
>> Thanks Valentyn,
>>
>> I'll outline the utilities and accept any suggestions to add / modify.
>> These are really just shortcut PTransforms that I am working on to simplify
>> creating pipelines.
>>
>> Currently the utilities contain the following PTransforms:
>>
>> - Inner Join
>> - Left Outer Join
>> - Right Outer Join
>> - Full Outer Join
>> - PrepareKey (For selecting items in a dictionary to act as a key for the
>> joins)
>> - Select (very simple filter that returns only items you want from the
>> dictionary) (allows for defining a default nullValue)
>>
>> Currently these operations only work with dictionaries, but I'd be
>> interested to see how it would work for tuples.
>>
>> I'm new to python so they may not be optimized or the best way, but from
>> my understanding these seem to be the best way to do these types of
>> operations. Essentially I created a pipeline to be able to convert a simple
>> sql query into a flow of these utilities. Using prepareKey to define your
>> joining key, joining, and then selecting from the join allows you to do a
>> lot of powerful manipulation in a simple / familiar way.
>>
>> If this is something that we'd like to add to the Beam SDK I don't mind
>> looking at the contributor license agreement, and conversing more on how to
>> get them in.
>>
>> Thanks,
>> Shannon
>>
>>
>>
>> On Wed, Jul 3, 2019 at 5:16 PM Valentyn Tymofieiev 
>> wrote:
>>
>>> Hi Shannon,
>>>
>>> Thanks for considering a contribution to Beam Python SDK. With a direct
>>> contribution to Beam SDK, your change will reach larger audience of users,
>>> and you will not have to maintain a separate project and keep it up to date
>>> with new releases of Beam.
>>>
>>> I encourage you to take a look at https://beam.apache.org/contribute/ for
>>> general advice on how to get started. To echo some points mentioned in the
>>> guide:
>>>
>>> - If your change is large or it is your first change, it is a good idea
>>> to discuss it on the dev@ mailing list
>>> - For large changes create a design doc (template, examples) and email
>>> it to the dev@ mailing list.
>>>
>>> Thanks,
>>> Valentyn
>>>
>>> On Wed, Jul 3, 2019 at 3:04 PM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>>
 I have been writing a bunch of utilities for the python SDK such as
 joins, selections, composite transforms, etc...

 I am working with my company to see if I can open source the utilities.
 Would it be best to post them on a separate PyPi project, or to PR them
 into the beam SDK? I assume if they let me open source it they will want
 some attribution or something like that.

 Thanks,
 Shannon

>>>


Beam Dependency Check Report (2019-07-08)

2019-07-08 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:

  google-cloud-core    0.29.1 -> 1.0.2    (released 2019-02-04 -> 2019-06-17)  BEAM-5538
  google-cloud-pubsub  0.39.1 -> 0.42.1   (released 2019-01-21 -> 2019-06-24)  BEAM-5539
  mock                 2.0.0 -> 3.0.5     (released 2019-05-20 -> 2019-05-20)  BEAM-7369
  oauth2client         3.0.0 -> 4.1.3     (released 2018-12-10 -> 2018-12-10)  BEAM-6089
  Sphinx               1.8.5 -> 2.1.2     (released 2019-05-20 -> 2019-06-24)  BEAM-7370

High Priority Dependency Updates Of Beam Java SDK:

  com.google.auto.service:auto-service
      1.0-rc2 -> 1.0-rc5          (released 2018-06-25 -> 2019-07-08)  BEAM-5541
  com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin
      0.17.0 -> 0.21.0            (released 2019-02-11 -> 2019-03-04)  BEAM-6645
  org.conscrypt:conscrypt-openjdk
      1.1.3 -> 2.1.0              (released None -> 2019-07-08)        BEAM-5748
  javax.servlet:javax.servlet-api
      3.1.0 -> 4.0.1              (released None -> 2019-07-08)        BEAM-5750
  junit:junit
      4.13-beta-1 -> 4.13-beta-3  (released None -> 2019-07-08)        BEAM-6127
  com.github.spotbugs:spotbugs-annotations
      3.1.11 -> 4.0.0-beta3       (released None -> 2019-07-08)        BEAM-6951

A dependency update is high priority if it satisfies one of the following
criteria:

  - It has a major version update available, e.g.
    org.assertj:assertj-core 2.5.0 -> 3.10.0;
  - It is over 3 minor versions behind the latest version, e.g.
    org.tukaani:xz 1.5 -> 1.8;
  - The current version is behind the latest version by over 180 days, e.g.
    com.google.auto.service:auto-service 2014-10-24 -> 2017-12-11.

In Beam, we make a best-effort attempt at keeping all dependencies
up-to-date. In the future, issues will be filed and tracked for these
automatically, but in the meantime you can search for existing issues or
open a new one.

For more information: Beam Dependency Guide

Re: [VOTE] Vendored dependencies release process

2019-07-08 Thread Łukasz Gajowy
+1

Thanks for documenting the process clearly.

Łukasz

On Sat, Jul 6, 2019 at 8:32 PM David Morávek  wrote:

> +1
>
> Sent from my iPhone
>
> On 6 Jul 2019, at 11:25, Lukasz Cwik  wrote:
>
> +1
>
> On Wed, Jul 3, 2019 at 10:24 AM Jens Nyman  wrote:
>
>> +1
>>
>> On 2019/07/02 23:49:10, Lukasz Cwik  wrote:
>> > Please vote based on the vendored dependencies release process as
>> > discussed[1] and documented[2].
>> >
>> > Please vote as follows:
>> > +1: Adopt the vendored dependency release process
>> > -1: The vendored release process needs to change because ...
>> >
>> > Since many people in the US may be out due to the holiday schedule, I'll
>> > try to close the vote and tally the results on July 9th so please vote
>> > before then.
>> >
>> > 1:
>> > https://lists.apache.org/thread.html/e2c49a5efaee2ad416b083fbf3b9b6db60fdb04750208bfc34cecaf0@%3Cdev.beam.apache.org%3E
>> >
>> > 2: https://s.apache.org/beam-release-vendored-artifacts
>> >
>>
>