Re: [VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-03 Thread Ahmet Altay
+1!

On Fri, Apr 3, 2020 at 10:36 AM Kenneth Knowles  wrote:

> +1 (binding)
>
> Kenn
>
> On Fri, Apr 3, 2020 at 10:19 AM Gris Cuevas  wrote:
>
>> +1
>>
>> On 2020/04/02 17:18:47, Julian Bruno  wrote:
>> > Hello Apache Beam Community,
>> >
>> > Please vote on the acceptance of the final design of the Firefly as
>> Beam's
>> > mascot [1]. Please share your input no later than Monday, April 6, at
>> noon
>> > Pacific Time.
>> >
>> > [ ] +1, Accept the donation of the Firefly design as Beam Mascot
>> >
>> > [ ] -1, Decline the donation of the Firefly design as Beam Mascot
>> >
>> > Vote is adopted by at least 3 PMC +1 approval votes, with no PMC -1
>> > disapproval votes. Non-PMC votes are still encouraged.
>> >
>> > PMC voters, please help by indicating your vote as "(binding)"
>> >
>> > The vote and input phase will be open until Monday, April 6, at 12 pm
>> > Pacific Time.
>> >
>> > Thank you very much for your feedback and ideas,
>> >
>> > Julian
>> >
>> > [1]
>> >
>> https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing
>> >
>> >
>> > --
>> > Julian Bruno // Visual Artist & Graphic Designer
>> >  (510) 367-0551 / SF Bay Area, CA
>> > www.instagram.com/julbro.art
>> >
>>
>


Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-04-03 Thread Robert Bradshaw
https://pypistats.org/packages/apache-beam is an interesting data point.

The good news: Python 3.x more than doubled to nearly 40% of downloads last
month. Interestingly, it looks like a good chunk of this increase was 3.5
(which is now the most popular 3.x version by this metric...)

I agree with using Python EOL dates as a baseline, with the possibility of
case-by-case adjustments. Refactoring our tests to support 3.8 without
increasing the load should be our focus now.


On Fri, Apr 3, 2020 at 3:41 PM Valentyn Tymofieiev 
wrote:

> Some good news on  Python 3.x support: thanks to +David Song
>  and +Yifan Zou  we now
> have Python 3.8 on Jenkins, and can start working on adding Python 3.8
> support to Beam (BEAM-8494).
>
> One interesting variable that has not been mentioned is what versions of
>> python 3
>> are available to users via their distribution channels (the linux
>> distributions they use to develop/run the pipelines).
>
>
> Good point. Looking at Ubuntu 16.04, which comes with Python 3.5.2, we can
> see that  the end-of-life for 16.04 is in 2024, end-of-support is April
> 2021 [1]. Both of these dates are beyond the announced Python 3.5 EOL in
> September 2020 [2]. I think it would be difficult for Beam to keep Py3.5
> support until these EOL dates, and users of systems that stock old versions
> of Python have viable workarounds:
> - install a newer version of Python interpreter via pyenv[3], from
> sources, or from alternative repositories.
> - use a docker container that comes with a newer version of interpreter.
> - use older versions of Beam.
>
> We didn't receive feedback from user@ on how long 3.x versions on the
> lower/higher end of the range should stay supported.  I would suggest for
> now that we plan to support all Python 3.x versions that were released and
> did not reach EOL. We can discuss exceptions to this rule on a case-by-case
> basis, evaluating any maintenance burden to continue support, or stop early.
>
> We should now focus on adjusting our Python test infrastructure to make it
> easy to split 3.5, 3.6, 3.7, 3.8  suites into high-priority and
> low-priority suites according to the Python version. Ideally, we
> should make it easy to change which versions are high/low priority without
> having to change all the individual test suites, and without losing test
> coverage signal.
>
> [1] https://wiki.ubuntu.com/Releases
> [2] https://devguide.python.org/#status-of-python-branches
> [3] https://github.com/pyenv/pyenv/blob/master/README.md
>
> On Fri, Feb 28, 2020 at 1:25 AM Ismaël Mejía  wrote:
>
>> One interesting variable that has not been mentioned is what versions of
>> python
>> 3 are available to users via their distribution channels (the linux
>> distributions they use to develop/run the pipelines).
>>
>> - RHEL 8 users have python 3.6 available
>> - RHEL 7 users have python 3.6 available
>> - Debian 10/Ubuntu 18.04 users have python 3.7/3.6 available
>> - Debian 9/Ubuntu 16.04 users have python 3.5 available
>>
>
>> We should consider this when we evaluate future support removals.
>>
>> Given  that the distros that support python 3.5 are ~4y old and since
>> python 3.5
>> is also losing LTS support soon, it is probably ok to not support it in Beam
>> anymore as Robert suggests.
>>
>>
>> On Thu, Feb 27, 2020 at 3:57 AM Valentyn Tymofieiev 
>> wrote:
>>
>>> Thanks everyone for sharing your perspectives so far. It sounds like we
>>> can mitigate the cost of test infrastructure by having:
>>> - a selection of (fast) tests that we will want to run against all
>>> Python versions we support.
>>> - high priority Python versions, which we will test extensively.
>>> - infrequent postcommit tests that exercise low-priority versions.
>>> We will need test infrastructure improvements to have the flexibility of
>>> designating versions as high-pri/low-pri and minimizing the effort required
>>> to adopt a new version.
>>>
>>> There is still a question of how long we want to support old Py3.x
>>> versions. As mentioned above, I think we should not support them beyond EOL
>>> (5 years after a release). I wonder if that is still too long. The cost of
>>> supporting a version may include:
>>>  - Developing against older Python version
>>>  - Release overhead (building & storing containers, wheels, doing
>>> release validation)
>>>  - Complexity / development cost to support the quirks of the minor
>>> versions.
>>>
>>> We can decide to drop support, after, say, 4 years, or after usage drops
>>> below a threshold, or decide on a case-by-case basis. Thoughts? Also asked
>>> for feedback on user@ [1]
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/r630a3b55aa8e75c68c8252ea6f824c3ab231ad56e18d916dfb84d9e8%40%3Cuser.beam.apache.org%3E
>>>
>>> On Wed, Feb 26, 2020 at 5:27 PM Robert Bradshaw 
>>> wrote:
>>>
>>>> On Wed, Feb 26, 2020 at 5:21 PM Valentyn Tymofieiev <
>>>> valen...@google.com> wrote:
>>>> >
>>>> > > +1 to consulting users.
>>>> > I will message user@ as well and point to

Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

2020-04-03 Thread Valentyn Tymofieiev
Some good news on  Python 3.x support: thanks to +David Song
 and +Yifan Zou  we now have
Python 3.8 on Jenkins, and can start working on adding Python 3.8 support
to Beam (BEAM-8494).

One interesting variable that has not been mentioned is what versions of
> python 3
> are available to users via their distribution channels (the linux
> distributions they use to develop/run the pipelines).


Good point. Looking at Ubuntu 16.04, which comes with Python 3.5.2, we can
see that  the end-of-life for 16.04 is in 2024, end-of-support is April
2021 [1]. Both of these dates are beyond the announced Python 3.5 EOL in
September 2020 [2]. I think it would be difficult for Beam to keep Py3.5
support until these EOL dates, and users of systems that stock old versions
of Python have viable workarounds:
- install a newer version of Python interpreter via pyenv[3], from sources,
or from alternative repositories.
- use a docker container that comes with a newer version of interpreter.
- use older versions of Beam.

We didn't receive feedback from user@ on how long 3.x versions on the
lower/higher end of the range should stay supported.  I would suggest for
now that we plan to support all Python 3.x versions that were released and
did not reach EOL. We can discuss exceptions to this rule on a case-by-case
basis, evaluating any maintenance burden to continue support, or stop early.
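
As a rough illustration of the rule proposed above (support every released
3.x version that has not reached EOL), the check could be sketched as follows.
Only the "Python 3.5 EOL in September 2020" data point comes from this thread;
the other EOL dates are approximate, and the table and helper names are
hypothetical. The authoritative dates are in [2].

```python
from datetime import date

# Hypothetical table: (release date, approximate EOL date) per minor version.
# Only the 3.5 EOL month (September 2020) is taken from this thread; the
# remaining EOL dates are assumptions rounded to the end of the month.
PYTHON_VERSIONS = {
    "3.5": (date(2015, 9, 13), date(2020, 9, 30)),
    "3.6": (date(2016, 12, 23), date(2021, 12, 31)),
    "3.7": (date(2018, 6, 27), date(2023, 6, 30)),
    "3.8": (date(2019, 10, 14), date(2024, 10, 31)),
}


def _version_key(version):
    """Sort '3.10' after '3.9' by comparing numeric components."""
    return tuple(int(part) for part in version.split("."))


def supported_versions(today, versions=PYTHON_VERSIONS):
    """Return minor versions that are released and have not reached EOL."""
    return sorted(
        (v for v, (released, eol) in versions.items() if released <= today <= eol),
        key=_version_key,
    )
```

Case-by-case exceptions would then amount to editing the table rather than
the code that consumes it.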

We should now focus on adjusting our Python test infrastructure to make it
easy to split 3.5, 3.6, 3.7, 3.8  suites into high-priority and
low-priority suites according to the Python version. Ideally, we
should make it easy to change which versions are high/low priority without
having to change all the individual test suites, and without losing test
coverage signal.
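
One possible shape for this, as a minimal sketch: keep the version-to-priority
mapping in a single place and derive the suites from it, so that changing a
priority is a one-line edit. The priority assignments and suite names below
are made up for illustration and are not a decision from this thread.

```python
# Hypothetical single source of truth for version priorities.
PRIORITY_BY_VERSION = {
    "3.5": "low",
    "3.6": "low",
    "3.7": "high",
    "3.8": "high",
}

# Hypothetical suite names. Fast tests run for every version; high-priority
# versions are tested extensively; low-priority versions only get an
# infrequent postcommit on top of the fast tests.
SUITES_BY_PRIORITY = {
    "high": ["fast", "precommit", "postcommit"],
    "low": ["fast", "infrequent_postcommit"],
}


def suites_for(version):
    """Return the list of suites to run for one Python minor version."""
    return list(SUITES_BY_PRIORITY[PRIORITY_BY_VERSION[version]])


def test_matrix():
    """Return a {version: suites} mapping for all configured versions."""
    return {version: suites_for(version) for version in PRIORITY_BY_VERSION}
```

Individual test suites would read from `test_matrix()` instead of hard-coding
versions, so every version keeps at least the fast-test coverage signal.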

[1] https://wiki.ubuntu.com/Releases
[2] https://devguide.python.org/#status-of-python-branches
[3] https://github.com/pyenv/pyenv/blob/master/README.md

On Fri, Feb 28, 2020 at 1:25 AM Ismaël Mejía  wrote:

> One interesting variable that has not been mentioned is what versions of
> python
> 3 are available to users via their distribution channels (the linux
> distributions they use to develop/run the pipelines).
>
> - RHEL 8 users have python 3.6 available
> - RHEL 7 users have python 3.6 available
> - Debian 10/Ubuntu 18.04 users have python 3.7/3.6 available
> - Debian 9/Ubuntu 16.04 users have python 3.5 available
>

> We should consider this when we evaluate future support removals.
>
> Given  that the distros that support python 3.5 are ~4y old and since
> python 3.5
> is also losing LTS support soon, it is probably ok to not support it in Beam
> anymore as Robert suggests.
>
>
> On Thu, Feb 27, 2020 at 3:57 AM Valentyn Tymofieiev 
> wrote:
>
>> Thanks everyone for sharing your perspectives so far. It sounds like we
>> can mitigate the cost of test infrastructure by having:
>> - a selection of (fast) tests that we will want to run against all Python
>> versions we support.
>> - high priority Python versions, which we will test extensively.
>> - infrequent postcommit tests that exercise low-priority versions.
>> We will need test infrastructure improvements to have the flexibility of
>> designating versions as high-pri/low-pri and minimizing the effort required
>> to adopt a new version.
>>
>> There is still a question of how long we want to support old Py3.x
>> versions. As mentioned above, I think we should not support them beyond EOL
>> (5 years after a release). I wonder if that is still too long. The cost of
>> supporting a version may include:
>>  - Developing against older Python version
>>  - Release overhead (building & storing containers, wheels, doing release
>> validation)
>>  - Complexity / development cost to support the quirks of the minor
>> versions.
>>
>> We can decide to drop support, after, say, 4 years, or after usage drops
>> below a threshold, or decide on a case-by-case basis. Thoughts? Also asked
>> for feedback on user@ [1]
>>
>> [1]
>> https://lists.apache.org/thread.html/r630a3b55aa8e75c68c8252ea6f824c3ab231ad56e18d916dfb84d9e8%40%3Cuser.beam.apache.org%3E
>>
>> On Wed, Feb 26, 2020 at 5:27 PM Robert Bradshaw 
>> wrote:
>>
>>> On Wed, Feb 26, 2020 at 5:21 PM Valentyn Tymofieiev 
>>> wrote:
>>> >
>>> > > +1 to consulting users.
>>> > I will message user@ as well and point to this thread.
>>> >
>>> > > I would propose getting in warnings about 3.5 EoL well ahead of time.
>>> > I think we should document on our website, and  in the code (warnings)
>>> that users should not expect SDKs to be supported in Beam beyond the EOL.
>>> If we want to have flexibility to drop support earlier than EOL, we need to
>>> be more careful with messaging because users might otherwise expect that
>>> support will last until EOL, if we mention EOL date.
>>>
>>> +1
>>>
>>> > I am hoping that we can establish a consensus for when we will be
>>> dropping support for a version, so that we don't have to discuss it on a
>>> case by case basis in the future.
>>> >
>>> > > I think it 

Re: [VOTE] Release 2.20.0, release candidate #1

2020-04-03 Thread Ahmet Altay
+1 - Validated python quickstart examples. Thank you for preparing the RC.

On Fri, Apr 3, 2020 at 12:25 PM Ismaël Mejía  wrote:

> Can somebody with windows please validate this one:
> https://issues.apache.org/jira/browse/BEAM-9452
>
> We really need to put some windows tests in place in the future. Maybe we
> can
> try github actions for this (but well the vote is not the place to
> discuss this).
>

I completely agree with you. I think we kind of already discussed this (
https://issues.apache.org/jira/browse/BEAM-9388) but we did not get a
chance to work on it.


>
> On Fri, Apr 3, 2020 at 8:16 PM Rui Wang  wrote:
> >
> > Add Maven and Java versions that were used for building java artifacts:
> > maven: 3.6.2
> > java: 1.8.0_181
> >
> >
> > -Rui
> >
> > On Thu, Apr 2, 2020 at 9:06 PM Rui Wang  wrote:
> >>
> >> Hi everyone,
> >> Please review and vote on the release candidate #1 for the version
> 2.20.0, as follows:
> >> [ ] +1, Approve the release
> >> [ ] -1, Do not approve the release (please provide specific comments)
> >>
> >>
> >> The complete staging area is available for your review, which includes:
> >> * JIRA release notes [1],
> >> * the official Apache source release to be deployed to dist.apache.org
> [2], which is signed with the key with fingerprint 699A 22D2 D4F0 0AD3
> 957B  6A88 38B1 C6B4 25EB A67C [3],
> >> * all artifacts to be deployed to the Maven Central Repository [4],
> >> * source code tag "v2.20.0-RC1" [5],
> >> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> >> * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle
> JDK JDK_VERSION.
> >> TODO: do these versions matter, and are they stamped into the artifacts?
> >> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> >> * Validation sheet with a tab for 2.20.0 release to help with
> validation [9].
> >> * Docker images published to Docker Hub [10].
> >>
> >> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> >>
> >> Thanks,
> >> Release Manager
> >>
> >> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12346780
> >> [2] https://dist.apache.org/repos/dist/dev/beam/2.20.0/
> >> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> >> [4]
> https://repository.apache.org/content/repositories/orgapachebeam-1100/
> >> [5] https://github.com/apache/beam/tree/v2.20.0-RC1
> >> [6] https://github.com/apache/beam/pull/11285
> >> [7] https://github.com/apache/beam-site/pull/602
> >> [8] https://github.com/apache/beam/pull/11298
> >> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=318600984
> >> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>


Re: [VOTE] Release 2.20.0, release candidate #1

2020-04-03 Thread Ismaël Mejía
Can somebody with windows please validate this one:
https://issues.apache.org/jira/browse/BEAM-9452

We really need to put some windows tests in place in the future. Maybe we can
try github actions for this (but well the vote is not the place to
discuss this).

On Fri, Apr 3, 2020 at 8:16 PM Rui Wang  wrote:
>
> Add Maven and Java versions that were used for building java artifacts:
> maven: 3.6.2
> java: 1.8.0_181
>
>
> -Rui
>
> On Thu, Apr 2, 2020 at 9:06 PM Rui Wang  wrote:
>>
>> Hi everyone,
>> Please review and vote on the release candidate #1 for the version 2.20.0,
>> as follows:
>> [ ] +1, Approve the release
>> [ ] -1, Do not approve the release (please provide specific comments)
>>
>>
>> The complete staging area is available for your review, which includes:
>> * JIRA release notes [1],
>> * the official Apache source release to be deployed to dist.apache.org [2], 
>> which is signed with the key with fingerprint 699A 22D2 D4F0 0AD3 957B  6A88 
>> 38B1 C6B4 25EB A67C [3],
>> * all artifacts to be deployed to the Maven Central Repository [4],
>> * source code tag "v2.20.0-RC1" [5],
>> * website pull request listing the release [6], publishing the API reference 
>> manual [7], and the blog post [8].
>> * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle JDK 
>> JDK_VERSION.
>> TODO: do these versions matter, and are they stamped into the artifacts?
>> * Python artifacts are deployed along with the source release to the 
>> dist.apache.org [2].
>> * Validation sheet with a tab for 2.20.0 release to help with validation [9].
>> * Docker images published to Docker Hub [10].
>>
>> The vote will be open for at least 72 hours. It is adopted by majority 
>> approval, with at least 3 PMC affirmative votes.
>>
>> Thanks,
>> Release Manager
>>
>> [1] 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12346780
>> [2] https://dist.apache.org/repos/dist/dev/beam/2.20.0/
>> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
>> [4] https://repository.apache.org/content/repositories/orgapachebeam-1100/
>> [5] https://github.com/apache/beam/tree/v2.20.0-RC1
>> [6] https://github.com/apache/beam/pull/11285
>> [7] https://github.com/apache/beam-site/pull/602
>> [8] https://github.com/apache/beam/pull/11298
>> [9] 
>> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=318600984
>> [10] https://hub.docker.com/search?q=apache%2Fbeam=image


Re: PythonDocker PreCommit

2020-04-03 Thread Pablo Estrada
Thanks for the update, Hannah! It's great to know, and the error message
seems pretty useful.
Best
-P.

On Thu, Apr 2, 2020 at 9:05 PM Hannah Jiang  wrote:

> Hello Dev@
>
> A new precommit was added for Python. It is in master now.
> *PythonDocker ("Run PythonDocker PreCommit")* is a job to build Python
> SDK containers with licenses/notices of third party dependencies.
> The trigger rule is the same as Python Precommit and the test runs for
> Py2, Py35, Py36 and Py37.
>
> Licenses/notices of third party dependencies are added during docker build
> and the job would fail if any license of dependencies cannot be pulled
> *automatically*.
> Here are examples when it fails.
> 1. a PR added new dependencies and their licenses are not able to be
> pulled automatically.
> 2. Dependency is upgraded to a new version and sometimes, the license is
> no longer able to be pulled automatically from the new version.
>
> Error message is
>
> 
> RuntimeError: Could not retrieve licences for packages [singledispatch] in
> Python2.7 environment.
> These licenses were not able to be pulled automatically. Please search
> code source of the dependencies on the internet and add urls to RAW license
> file at
> sdks/python/container/license_scripts/dep_urls_py.yaml for each missing
> license and rerun the test. If no such urls can be found, you need to
> manually add
> LICENSE and NOTICE (if available) files at
> sdks/python/container/license_scripts/manual_licenses/{dep}/ and add
> entries to sdks/python/container/license_scripts/dep_urls_py.yaml.
>
> 
>
> Please let me know if the error message doesn't provide enough information
> to fix the failures.
> Please feel free to create a ticket and assign to @hannahjiang if you
> notice unexpected behaviors.
>
> Thanks,
> Hannah
>
>
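
To make the failure condition described above concrete, here is a minimal
sketch of such a check. It is not the actual script under
sdks/python/container/license_scripts/; the function name and its parameters
are hypothetical, and `url_overrides` merely stands in for the entries in
dep_urls_py.yaml.

```python
def check_licenses(dependencies, auto_pulled, url_overrides, environment):
    """Fail if any dependency has no license source.

    dependencies:  names of all third-party packages in the container.
    auto_pulled:   names whose license was retrieved automatically.
    url_overrides: names with a manually configured license URL
                   (a stand-in for entries in dep_urls_py.yaml).
    environment:   label used in the error message, e.g. "Python2.7".
    """
    missing = sorted(
        dep
        for dep in dependencies
        if dep not in auto_pulled and dep not in url_overrides
    )
    if missing:
        raise RuntimeError(
            "Could not retrieve licences for packages [%s] in %s environment."
            % (", ".join(missing), environment)
        )
    return True
```

Under this sketch, adding an entry for the missing package to
`url_overrides` is what makes the check pass again.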


Re: [VOTE] Release 2.20.0, release candidate #1

2020-04-03 Thread Rui Wang
Add Maven and Java versions that were used for building java artifacts:
maven: 3.6.2
java: 1.8.0_181


-Rui

On Thu, Apr 2, 2020 at 9:06 PM Rui Wang  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 2.20.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2],
> which is signed with the key with fingerprint 699A 22D2 D4F0 0AD3 957B  6A88
> 38B1 C6B4 25EB A67C [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.20.0-RC1" [5],
> * website pull request listing the release [6], publishing the API
> reference manual [7], and the blog post [8].
> * Java artifacts were built with Maven MAVEN_VERSION and OpenJDK/Oracle
> JDK JDK_VERSION.
> TODO: do these versions matter, and are they stamped into the artifacts?
> * Python artifacts are deployed along with the source release to the
> dist.apache.org [2].
> * Validation sheet with a tab for 2.20.0 release to help with validation
> [9].
> * Docker images published to Docker Hub [10].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Release Manager
>
> [1]
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527=12346780
> [2] https://dist.apache.org/repos/dist/dev/beam/2.20.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1100/
> [5] https://github.com/apache/beam/tree/v2.20.0-RC1
> [6] https://github.com/apache/beam/pull/11285
> [7] https://github.com/apache/beam-site/pull/602
> [8] https://github.com/apache/beam/pull/11298
> [9]
> https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=318600984
> [10] https://hub.docker.com/search?q=apache%2Fbeam=image
>


Re: [VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-03 Thread Kenneth Knowles
+1 (binding)

Kenn

On Fri, Apr 3, 2020 at 10:19 AM Gris Cuevas  wrote:

> +1
>
> On 2020/04/02 17:18:47, Julian Bruno  wrote:
> > Hello Apache Beam Community,
> >
> > Please vote on the acceptance of the final design of the Firefly as
> Beam's
> > mascot [1]. Please share your input no later than Monday, April 6, at
> noon
> > Pacific Time.
> >
> > [ ] +1, Accept the donation of the Firefly design as Beam Mascot
> >
> > [ ] -1, Decline the donation of the Firefly design as Beam Mascot
> >
> > Vote is adopted by at least 3 PMC +1 approval votes, with no PMC -1
> > disapproval votes. Non-PMC votes are still encouraged.
> >
> > PMC voters, please help by indicating your vote as "(binding)"
> >
> > The vote and input phase will be open until Monday, April 6, at 12 pm
> > Pacific Time.
> >
> > Thank you very much for your feedback and ideas,
> >
> > Julian
> >
> > [1]
> >
> https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing
> >
> >
> > --
> > Julian Bruno // Visual Artist & Graphic Designer
> >  (510) 367-0551 / SF Bay Area, CA
> > www.instagram.com/julbro.art
> >
>


Re: Contributing Twister2 runner to Apache Beam

2020-04-03 Thread Pulasthi Supun Wickramasinghe
Hi Ismaël,

Thanks for the update. No problem at all; please take your time and let me
know if my assistance is needed. The virus has affected everyone's
timetables. I hope you are safe.

Best Regards,
Pulasthi

On Fri, Apr 3, 2020 at 12:14 PM Ismaël Mejía  wrote:

> Hello Pulasthi,
>
> Please excuse my delay. I have had probably 1/3 of my usual
> available time since the coronavirus lockdown, so I have not advanced
> as expected. I hope to catch up rapidly and ping you. Our expected
> target of merging it before the 2.21.0 release seems hard to meet
> at this point because the branch will be cut next week. I hope this is
> not a problem, but if it is, please excuse me.
>
> I would also like to ask any other Beamer who has more free cycles
> at the moment to give me an extra hand with the review.
>
> Regards,
> Ismaël
>
>
> On Fri, Apr 3, 2020 at 4:16 AM Pulasthi Supun Wickramasinghe
>  wrote:
> >
> > Hi Ismaël
> >
> > Did you get some free time to perform a code review on the pull request?
> >
> > Best Regards
> > Pulasthi
> >
> > On Tue, Mar 10, 2020 at 3:30 PM Luke Cwik  wrote:
> >>
> >> I have to disagree. Allowing runners within the Apache Beam repo
> and SDKs to reach into each other's implementation details creates
> usability, feature development, maintenance and complexity problems.
> >>
> >> The usability issue comes from our public core facing APIs exposing
> methods that runners "need" so they can introspect details that shouldn't
> be visible to them (e.g. setWindowingStrategyInternal on PCollection).
> Getting to 1 would remove the pipeline construction time instances but not
> the execution side ones and there are currently 100+ usages of the
> @Internal annotation.
> >>
> >> The feature development and maintenance issues both stem from
> duplication of work. We need to have at least two copies of how to do
> something, one that is for runner -> SDK direct and one for Fn API. An
> example of this is the timer family work which was started and completed
> for the non portable implementation yet the portable implementation was
> left as future work.
> >>
> >> Finally, the complexity comes from how many layers we have that wrap
> existing components to create variants for different use cases. I'm looking
> at all the DoFnRunners and each of their variants and how those have layers
> within themselves within the SDK and how additional layers have been made
> to interface with runner specific internal details.
> >>
> >>
> >> On Tue, Mar 10, 2020 at 12:07 PM Kenneth Knowles 
> wrote:
> >>>
> >>> I do support all the efforts to get Dataflow, Flink, and Spark to 3
> (Fn API). But I disagree with it as a requirement; the whole point of
> ptransforms with URNs is that if the runner can figure out how to execute
> it according to semantics, then it is fine. A runner that meets (1) and (2)
> but can only run a certain subset of DoFns is allowed by design (whether the
> subset is based on language, state/timer support, etc).
> >>>
> >>> Kenn
> >>>
> >>> On Tue, Mar 10, 2020 at 9:45 AM Luke Cwik  wrote:
> 
>  I would like to move away from having runners access APIs that are
> related to pipeline construction and other internal SDK APIs and I would
> like for SDKs to not inspect internal runner APIs. This would enable the
> community to improve each independently without needing to fix the world
> all the time and would enable the community to run a cluster that supports
> multiple Beam versions at the same time and would also allow for the
> cluster to be updated independently of the pipelines it runs.
> 
>  As a community, I believe we need to achieve 1, 2 and 3. Outside of
> the Apache Beam repo, anyone can do whatever they want but there should be
> no compatibility guarantees.
> 
>  4 and 5 are extensions that enable a richer set of pipelines to run
> and are optional like many other parts such as if a runner supports metrics
> aggregation or dynamic work rebalancing.
> 
>  On Tue, Mar 10, 2020 at 9:11 AM Kenneth Knowles 
> wrote:
> >
> > There are a lot of different meanings to "portable runner". Here are
> some:
> >
> > (1) A runner that accepts a pipeline proto and either runs it or
> says it cannot run it
> > (2) A runner that accepts jobs via the job management APIs
> > (3) A runner that executes UDFs via the Fn API
> > (4) A runner that can execute multiple languages
> > (5) A runner that can run cross-language transforms aka multiple
> languages in the same pipeline
> >
> > I think (1) is a very good bar, and (2) is a nice addition on top of
> that. Then we have a unified way to submit pipelines and understand their
> status.
> >
> > I think (3) is optional - a runner can run things however it likes,
> including with native implementations. And then (4) and (5) as well are
> just levels of feature capabilities.
> >
> > Kenn
> >
> > On Tue, Mar 10, 2020 at 8:54 AM Luke Cwik  

Re: [VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-03 Thread Gris Cuevas
+1

On 2020/04/02 17:18:47, Julian Bruno  wrote: 
> Hello Apache Beam Community,
> 
> Please vote on the acceptance of the final design of the Firefly as Beam's
> mascot [1]. Please share your input no later than Monday, April 6, at noon
> Pacific Time.
> 
> [ ] +1, Accept the donation of the Firefly design as Beam Mascot
> 
> [ ] -1, Decline the donation of the Firefly design as Beam Mascot
> 
> Vote is adopted by at least 3 PMC +1 approval votes, with no PMC -1
> disapproval votes. Non-PMC votes are still encouraged.
> 
> PMC voters, please help by indicating your vote as "(binding)"
> 
> The vote and input phase will be open until Monday, April 6, at 12 pm
> Pacific Time.
> 
> Thank you very much for your feedback and ideas,
> 
> Julian
> 
> [1]
> https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing
> 
> 
> -- 
> Julian Bruno // Visual Artist & Graphic Designer
>  (510) 367-0551 / SF Bay Area, CA
> www.instagram.com/julbro.art
> 


Re: Contributing Twister2 runner to Apache Beam

2020-04-03 Thread Ismaël Mejía
Hello Pulasthi,

Please excuse my delay. I have had probably 1/3 of my usual
available time since the coronavirus lockdown, so I have not advanced
as expected. I hope to catch up rapidly and ping you. Our expected
target of merging it before the 2.21.0 release seems hard to meet
at this point because the branch will be cut next week. I hope this is
not a problem, but if it is, please excuse me.

I would also like to ask any other Beamer who has more free cycles
at the moment to give me an extra hand with the review.

Regards,
Ismaël


On Fri, Apr 3, 2020 at 4:16 AM Pulasthi Supun Wickramasinghe
 wrote:
>
> Hi Ismaël
>
> Did you get some free time to perform a code review on the pull request?
>
> Best Regards
> Pulasthi
>
> On Tue, Mar 10, 2020 at 3:30 PM Luke Cwik  wrote:
>>
>> I have to disagree. Allowing runners within the Apache Beam repo and
>> SDKs to reach into each other's implementation details creates usability,
>> feature development, maintenance and complexity problems.
>>
>> The usability issue comes from our public core facing APIs exposing methods 
>> that runners "need" so they can introspect details that shouldn't be visible 
>> to them (e.g. setWindowingStrategyInternal on PCollection). Getting to 1 
>> would remove the pipeline construction time instances but not the execution 
>> side ones and there are currently 100+ usages of the @Internal annotation.
>>
>> The feature development and maintenance issues both stem from duplication of 
>> work. We need to have at least two copies of how to do something, one that 
>> is for runner -> SDK direct and one for Fn API. An example of this is the 
>> timer family work which was started and completed for the non portable 
>> implementation yet the portable implementation was left as future work.
>>
>> Finally, the complexity comes from how many layers we have that wrap 
>> existing components to create variants for different use cases. I'm looking 
>> at all the DoFnRunners and each of their variants and how those have layers 
>> within themselves within the SDK and how additional layers have been made to 
>> interface with runner specific internal details.
>>
>>
>> On Tue, Mar 10, 2020 at 12:07 PM Kenneth Knowles  wrote:
>>>
>>> I do support all the efforts to get Dataflow, Flink, and Spark to 3 (Fn
>>> API). But I disagree with it as a requirement; the whole point of
>>> ptransforms with URNs is that if the runner can figure out how to execute
>>> it according to semantics, then it is fine. A runner that meets (1) and (2)
>>> but can only run a certain subset of DoFns is allowed by design (whether the
>>> subset is based on language, state/timer support, etc).
>>>
>>> Kenn
>>>
>>> On Tue, Mar 10, 2020 at 9:45 AM Luke Cwik  wrote:

 I would like to move away from having runners access APIs that are related 
 to pipeline construction and other internal SDK APIs and I would like for 
 SDKs to not inspect internal runner APIs. This would enable the community 
 to improve each independently without needing to fix the world all the 
 time and would enable the community to run a cluster that supports 
 multiple Beam versions at the same time and would also allow for the 
 cluster to be updated independently of the pipelines it runs.

 As a community, I believe we need to achieve 1, 2 and 3. Outside of the 
 Apache Beam repo, anyone can do whatever they want but there should be no 
 compatibility guarantees.

 4 and 5 are extensions that enable a richer set of pipelines to run and 
 are optional like many other parts such as if a runner supports metrics 
 aggregation or dynamic work rebalancing.

 On Tue, Mar 10, 2020 at 9:11 AM Kenneth Knowles  wrote:
>
> There are a lot of different meanings to "portable runner". Here are some:
>
> (1) A runner that accepts a pipeline proto and either runs it or says it 
> cannot run it
> (2) A runner that accepts jobs via the job management APIs
> (3) A runner that executes UDFs via the Fn API
> (4) A runner that can execute multiple languages
> (5) A runner that can run cross-language transforms aka multiple 
> languages in the same pipeline
>
> I think (1) is a very good bar, and (2) is a nice addition on top of 
> that. Then we have a unified way to submit pipelines and understand their 
> status.
>
> I think (3) is optional - a runner can run things however it likes, 
> including with native implementations. And then (4) and (5) as well are 
> just levels of feature capabilities.
>
> Kenn
>
> On Tue, Mar 10, 2020 at 8:54 AM Luke Cwik  wrote:
>>
>> +1
>>
>> On Tue, Mar 10, 2020 at 12:59 AM Alex Van Boxel  wrote:
>>>
>>> One last thing, for any runner after this one... wouldn't it be a good
>>> acceptance criterion to only accept portable implementations from now on?
>>>
>>>  _/
>>> _/ Alex 
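
The five meanings of "portable runner" that Kenn lists in this thread can be read as independent capability flags, with the two proposed acceptance bars as combinations of them. The sketch below is only an illustration of that reading; the class, flag names, and helper are hypothetical and are not part of any Beam API.

```python
from enum import Flag, auto


class RunnerCapability(Flag):
    """Hypothetical flags, one per meaning (1)-(5) listed in the thread."""
    ACCEPTS_PIPELINE_PROTO = auto()  # (1) runs a pipeline proto or rejects it
    JOB_MANAGEMENT_API = auto()      # (2) accepts jobs via the job management APIs
    FN_API_EXECUTION = auto()        # (3) executes UDFs via the Fn API
    MULTI_LANGUAGE = auto()          # (4) can execute multiple languages
    CROSS_LANGUAGE = auto()          # (5) multiple languages in one pipeline


# Kenn's proposed bar: (1) and (2); (3)-(5) are optional capability levels.
KENN_BAR = (RunnerCapability.ACCEPTS_PIPELINE_PROTO
            | RunnerCapability.JOB_MANAGEMENT_API)
# Luke's proposed bar: (1), (2) and (3).
LUKE_BAR = KENN_BAR | RunnerCapability.FN_API_EXECUTION


def meets(bar, capabilities):
    """Return True if every capability in `bar` is present in `capabilities`."""
    return (capabilities & bar) == bar


# A hypothetical runner that executes DoFns natively rather than via the Fn API.
native_runner = KENN_BAR | RunnerCapability.MULTI_LANGUAGE
```

Under this reading, `native_runner` would pass the (1)+(2) bar but not the (1)+(2)+(3) bar, which is the point the two positions in the thread disagree on.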

Re: Unportable Dataflow Pipeline Questions

2020-04-03 Thread Luke Cwik
I think making the Dataflow service translate into the pipeline proto
directly will be a lot of work.

On Thu, Apr 2, 2020 at 6:03 PM Robert Burke  wrote:

> It's stateless translation code and nothing is sourced outside of the beam
> pipeline proto, so it should be fairly straightforward code to write and
> test.
>
> One can collect several before and afters of the existing translations and
> use them to validate.
> There are a few quirks that were previously necessary though to get
> Dataflow to work properly for the Go SDK, in particular around DoFns
> without outputs, but that's reasonably clear in the translator.
>
> On Thu, Apr 2, 2020, 5:57 PM Robert Bradshaw  wrote:
>
>> On Thu, Apr 2, 2020 at 7:54 AM Chamikara Jayalath 
>> wrote:
>>
>>>
>>>
>>> On Thu, Apr 2, 2020 at 7:48 AM Robert Burke  wrote:
>>>
 In particular, ideally the Dataflow Service is handling the Dataflow
 specific format translation, rather than each SDK. Move the v1beta3
 pipeline to an internal detail.

 Ideally Dataflow would support a JobManagement endpoint directly, but I
 imagine that's a more involved task that's out of scope for now.

>>>
>>> Yeah, I think we can just embed the runner API proto in the Dataflow job
>>> request (or store it in GCS and download it in the router if too large). Then
>>> runner API proto to Dataflow proto translation can occur within Dataflow
>>> service and all SDKs can share that translation logic ((3) below). I agree
>>> that fully migrating Dataflow service to be on job management API seems to
>>> be out of scope.
>>>
>>
>> I've been hoping for that day for a long time now :). I wonder how hard
>> it would be to extend/embed the existing Go translation code into the
>> router.
>>
>>
>>>
>>>

 On Thu, Apr 2, 2020, 7:43 AM Chamikara Jayalath 
 wrote:

>
>
> On Wed, Apr 1, 2020 at 11:31 AM Sam Rohde  wrote:
>
>> Okay cool, so it sounds like the cleanup can be done in two phases:
>> move the apply_ methods to transform replacements, then move Dataflow
>> onto the Cloudv1b3 protos. AFAIU, phase one will make the Pipeline
>> object portable? If the InteractiveRunner were to make a Pipeline
>> object, then it could be passed to the DataflowRunner to run, correct?
>>
>
> Currently we do the following.
>
> (1) Currently Java and Python SDKs
> SDK specific object representation -> Dataflow job request (v1beta3)
> -> Dataflow service specific representation
> Beam Runner API proto -> store in GCS -> Download in workers.
>
> (2) Currently Go SDK
> SDK specific object representation -> Beam Runner API proto
> -> Dataflow job request (v1beta3) -> Dataflow service specific
> representation
>
> We got cross-language (for Python) working for (1) above, but the code
> would be much cleaner if we could do (2) for Python and Java.
>
> I think the cleanest approach is the following, which will allow us to
> share translation code across SDKs.
> (3) For all SDKs
> SDK specific object representation -> Runner API proto embedded in
> Dataflow job request -> Runner API proto to internal Dataflow specific
> representation within Dataflow service
>
> I think we should go for a cleaner approach here ((2) or (3)) instead
> of trying to do it in multiple steps (we'll have to keep updating features
> such as cross-language to be in lockstep, which will be hard and result
> in a lot of throwaway work).
>
> Thanks,
> Cham
>
>
>> On Tue, Mar 31, 2020 at 6:01 PM Robert Burke 
>> wrote:
>>
>>> +1 to translation from beam pipeline Protos.
>>>
>>>  The Go SDK does that currently in dataflowlib/translate.go to
>>> handle the current Dataflow situation, so it's certainly doable.
>>>
>>> On Tue, Mar 31, 2020, 5:48 PM Robert Bradshaw 
>>> wrote:
>>>
 On Tue, Mar 31, 2020 at 12:06 PM Sam Rohde 
 wrote:

> Hi All,
>
> I am currently investigating making the Python DataflowRunner use a
> portable pipeline representation so that we can eventually get rid of
> the Pipeline(runner) weirdness.
>
> In that case, I have a lot of questions about the Python
> DataflowRunner:
>
> *PValueCache*
>
>- Why does this exist?
>
> This is historical baggage from the (long gone) first direct
 runner when actual computed PCollections were cached, and the
 DataflowRunner inherited it.


> *DataflowRunner*
>
>- I see that the DataflowRunner defines some PTransforms as
>runner-specific primitives by returning a PCollection.from_(...)
>in apply_ methods. Then in the run_ methods, it references the
>PValueCache to add steps.
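
Two ideas from this thread fit together: translating from the pipeline proto with stateless code (options (2) and (3) in Cham's message), and Robert Burke's suggestion to collect "before and afters" of existing translations and use them for validation. A minimal sketch of that combination follows. Everything in it is an assumption made for illustration: the plain dicts stand in for the Beam pipeline proto and the Dataflow job request, the `example:` URNs are invented, and `translate` and `validate` are hypothetical helpers, not the code in dataflowlib/translate.go.

```python
import json


def translate(pipeline):
    """Stateless translation: the output depends only on the input pipeline.

    `pipeline` is a plain dict standing in for a Beam pipeline proto; the
    returned list stands in for the steps of a Dataflow job request.
    Both shapes are hypothetical and chosen only for illustration.
    """
    steps = []
    for name in sorted(pipeline["transforms"]):
        transform = pipeline["transforms"][name]
        steps.append({
            "name": name,
            "kind": transform["urn"],
            "inputs": sorted(transform.get("inputs", [])),
            # A DoFn without outputs simply yields an empty list here.
            "outputs": sorted(transform.get("outputs", [])),
        })
    return steps


def validate(golden_pairs):
    """Return indices of (before, after) pairs the translator fails to reproduce."""
    failures = []
    for index, (before, expected_after) in enumerate(golden_pairs):
        # Compare canonical JSON so key order cannot cause false mismatches.
        actual = json.dumps(translate(before), sort_keys=True)
        if actual != json.dumps(expected_after, sort_keys=True):
            failures.append(index)
    return failures


# One hand-written golden pair, including a DoFn without outputs.
before = {"transforms": {
    "read": {"urn": "example:read", "outputs": ["pc1"]},
    "sink": {"urn": "example:pardo", "inputs": ["pc1"]},
}}
after = [
    {"name": "read", "kind": "example:read", "inputs": [], "outputs": ["pc1"]},
    {"name": "sink", "kind": "example:pardo", "inputs": ["pc1"], "outputs": []},
]
```

Calling `validate([(before, after)])` returns the indices of mismatching pairs, so an empty list means every recorded translation was reproduced. Because the translator reads nothing outside its argument, the same golden pairs could in principle be replayed against a copy of the logic running inside the service.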

Re: BEAM-8751: Code Review Wanted for PR 11208

2020-04-03 Thread Luke Cwik
Still too busy to help with this. Does anyone else have time?

On Thu, Apr 2, 2020 at 7:54 PM Tomo Suzuki  wrote:

> Hi Luke and Beam Committers,
>
> Would you review/merge this google-api-client dependency upgrade
> https://github.com/apache/beam/pull/11208
>
> --
> Regards,
> Tomo
>


Re: [VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-03 Thread Maximilian Michels
+1 (binding)

On 03.04.20 10:33, Jan Lukavský wrote:
> +1 (non-binding).
> 
> On 4/2/20 9:24 PM, Austin Bennett wrote:
>> +1 (nonbinding)
>>
>> On Thu, Apr 2, 2020 at 12:10 PM Luke Cwik wrote:
>>
>> +1 (binding)
>>
>> On Thu, Apr 2, 2020 at 11:54 AM Pablo Estrada wrote:
>>
>> +1! (binding)
>>
>> On Thu, Apr 2, 2020 at 11:19 AM Alex Van Boxel wrote:
>>
>> Thanks for clearing this up Aizhamal.
>>
>> +1 (non binding)
>>
>> _/
>> _/ Alex Van Boxel
>>
>>
>> On Thu, Apr 2, 2020 at 8:14 PM Aizhamal Nurmamat kyzy wrote:
>>
>> Good point, Alex. Actually Julian and I have talked
>> about producing this kind of guide. It will be
>> delivered as an additional contribution in the follow
>> up. We think this will be a derivative of the original
>> design, and be done after the original is officially
>> accepted. 
>>
>> With this vote, we want to accept the Firefly donation
>> as designed [1], and let Julian produce other
>> artifacts using the official Beam mascot later on.
>>
>> [1] 
>> https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing
>>
>>
>> On Thu, Apr 2, 2020 at 10:37 AM Alex Van Boxel wrote:
>>
>> I don't want to be a spoiler... but this vote
>> feels like a final deliverable... but without a
>> style guide as Kenn originally suggested most of
>> us will not be able to adapt the design. This
>> would include:
>>
>>   * frontal view
>>   * side view
>>   * back view
>>
>> actually different poses so we can mix and match.
>> Without this it will never reach the potential of
>> the Go gopher or gRPC Pancakes.
>>
>> Note this is *not* a negative vote but I'm afraid
>> that the use without a guide will be fairly
>> limited as most of us are not designers. Just a
>> concern.
>>
>>  _/
>> _/ Alex Van Boxel
>>
>>
>> On Thu, Apr 2, 2020 at 7:27 PM Andrew Pilloud wrote:
>>
>> +1, Accept the donation of the Firefly design
>> as Beam Mascot
>>
>> On Thu, Apr 2, 2020 at 10:19 AM Julian Bruno wrote:
>>
>> Hello Apache Beam Community, 
>>
>> Please vote on the acceptance of the final
>> design of the Firefly as Beam's mascot
>> [1]. Please share your input no later than
>> Monday, April 6, at noon Pacific Time. 
>>
>>
>> [ ] +1, Accept the donation of the Firefly
>> design as Beam Mascot
>>
>> [ ] -1, Decline the donation of the
>> Firefly design as Beam Mascot
>>
>>
>> Vote is adopted by at least 3 PMC +1
>> approval votes, with no PMC -1 disapproval
>>
>> votes. Non-PMC votes are still encouraged.
>>
>> PMC voters, please help by indicating your
>> vote as "(binding)"
>>
>>
>> The vote and input phase will be open
>> until Monday, April 6, at 12 pm Pacific Time.
>>
>>
>> Thank you very much for your feedback and
>> ideas,
>>
>> Julian
>>
>>
>> [1]
>> 
>> https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing
>>   
>>
>> -- 
>> Julian Bruno // Visual Artist & Graphic
>> Designer
>>  (510) 367-0551  /
>> SF Bay Area, CA
>> www.instagram.com/julbro.art
>> 
>>


Re: [VOTE] Accept the Firefly design donation as Beam Mascot - Deadline Mon April 6

2020-04-03 Thread Jan Lukavský

+1 (non-binding).

On 4/2/20 9:24 PM, Austin Bennett wrote:

+1 (nonbinding)

On Thu, Apr 2, 2020 at 12:10 PM Luke Cwik wrote:


+1 (binding)

On Thu, Apr 2, 2020 at 11:54 AM Pablo Estrada wrote:

+1! (binding)

On Thu, Apr 2, 2020 at 11:19 AM Alex Van Boxel wrote:

Thanks for clearing this up Aizhamal.

+1 (non binding)

_/
_/ Alex Van Boxel


On Thu, Apr 2, 2020 at 8:14 PM Aizhamal Nurmamat kyzy wrote:

Good point, Alex. Actually Julian and I have talked
about producing this kind of guide. It will be
delivered as an additional contribution in the follow
up. We think this will be a derivative of the original
design, and be done after the original is officially
accepted.

With this vote, we want to accept the Firefly donation
as designed [1], and let Julian produce other
artifacts using the official Beam mascot later on.

[1]

https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing


On Thu, Apr 2, 2020 at 10:37 AM Alex Van Boxel wrote:

I don't want to be a spoiler... but this vote
feels like a final deliverable... but without a
style guide as Kenn originally suggested most of
us will not be able to adapt the design. This
would include:

  * frontal view
  * side view
  * back view

actually different poses so we can mix and match.
Without this it will never reach the potential of
the Go gopher or gRPC Pancakes.

Note this is *not* a negative vote but I'm afraid
that the use without a guide will be fairly
limited as most of us are not designers. Just a
concern.

 _/
_/ Alex Van Boxel


On Thu, Apr 2, 2020 at 7:27 PM Andrew Pilloud wrote:

+1, Accept the donation of the Firefly design
as Beam Mascot

On Thu, Apr 2, 2020 at 10:19 AM Julian Bruno wrote:

Hello Apache Beam Community,

Please vote on the acceptance of the final
design of the Firefly as Beam's mascot
[1]. Please share your input no later than
Monday, April 6, at noon Pacific Time.


[ ] +1, Accept the donation of the Firefly
design as Beam Mascot

[ ] -1, Decline the donation of the
Firefly design as Beam Mascot


Vote is adopted by at least 3 PMC +1
approval votes, with no PMC -1 disapproval

votes. Non-PMC votes are still encouraged.

PMC voters, please help by indicating your
vote as "(binding)"


The vote and input phase will be open
until Monday, April 6, at 12 pm Pacific Time.


Thank you very much for your feedback and
ideas,

Julian


[1]

https://docs.google.com/document/d/1zK8Cm8lwZ3ALVFpD1aY7TLCVNwlyTS3PXxTV2qQCAbk/edit?usp=sharing


-- 
Julian Bruno // Visual Artist & Graphic Designer
(510) 367-0551 / SF Bay Area, CA
www.instagram.com/julbro.art