[VOTE] Release 2.54.0, release candidate #2

2024-02-06 Thread Robert Burke via dev
Hi everyone,
Please review and vote on the release candidate #2 for the version 2.54.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found. Only PMC member votes will
count towards the final vote, but votes from all community members are
encouraged and helpful for finding regressions; you can either test your
own use cases [13] or use cases from the validation sheet [10].

The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint D20316F712213422 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.54.0-RC2" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and PyPI [8].
* Go artifacts and documentation are available at pkg.go.dev [9]
* Validation sheet with a tab for 2.54.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].
* PR to run tests against release branch [12].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our RC
testing guide [13].

Thanks,
Robert Burke
Beam 2.54.0 Release Manager

[1] https://github.com/apache/beam/milestone/18?closed=1
[2] https://dist.apache.org/repos/dist/dev/beam/2.54.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1368/
[5] https://github.com/apache/beam/tree/v2.54.0-RC2
[6] https://github.com/apache/beam/pull/30201
[7] https://github.com/apache/beam-site/pull/659
[8] https://pypi.org/project/apache-beam/2.54.0rc2/
[9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.54.0-RC2/go/pkg/beam
[10] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=28763708
[11] https://hub.docker.com/search?q=apache%2Fbeam=image
[12] https://github.com/apache/beam/pull/30104
[13] https://github.com/apache/beam/blob/master/contributor-docs/rc-testing-guide.md


Re: [Go SDK] Direct Runner Replacement: Prism

2023-02-14 Thread Robert Burke via dev
Here are the next two chunks!

https://github.com/apache/beam/pull/25476 - Coder / element / bytes
handling internally for prism.
https://github.com/apache/beam/pull/25478 - Worker fnAPI handling.

Took a bit to get a baseline of unit testing in for these, since they were
covered by whole pipeline runs.
Coders in particular, since they currently live in the package with the
pipeline tests, so it was harder to ensure
coverage in a vacuum.

But they did force some documentation improvements, and a fix for a
neglected inefficiency I had in the original coder structure.

So small pain now, but will make sure future development is a bit easier,
as convenient as "just write a pipeline" is for testing.
Sometimes you just want to ensure the protocol works.

On Thu, Feb 9, 2023 at 2:50 PM Kenneth Knowles  wrote:

> Just a +100 to the idea of this runner. Having an easy-to-read,
> portable-execution, batch & streaming, parallel, local runner, that
> exercises plenty of advanced model features... solid gold!
>
> On Thu, Feb 9, 2023 at 12:01 PM Robert Burke via dev 
> wrote:
>
>> Here are the first of the smaller PRs:
>>
>> https://github.com/apache/beam/pull/25404 -> Adds READMEs and updates
>> go.mod so later changes don't collide there.
>> https://github.com/apache/beam/pull/25405 -> Adds internal/urns package
>> for extracting URNs from the protos.
>> https://github.com/apache/beam/pull/25406 -> Adds internal/config
>> package for parsing and accessing the configuration of variants and
>> handlers in the runner.
>>
>> These are independent changes, and small enough for quicker review. The
>> remaining larger packages can be submitted more piecemeal once these are in.
>>
>>
>>
>> On Wed, Feb 8, 2023 at 3:23 PM Robert Burke  wrote:
>>
>>> Hello Beam!
>>>
>>> == tl;dr; ==
>>>
>>> I wrote a local, portable Beam runner in Go to replace the Go direct
>>> runner.  I'd like to contribute it to the Beam Repo. The Big PR with
>>> everything is here: https://github.com/apache/beam/pull/25391
>>>
>>> I'll be sending smaller PRs out for review to get it into the repo. Take
>>> a look at the big one, don't mind the mess, but do ask questions, or offer
>>> constructive suggestions to make it clearer. There are ample TODOs that
>>> could be added. This thread will be kept up to date with the progress.
>>>
>>> Highlights:
>>> Avoids false positive issues the Go Direct runner has, especially around
>>> serialization issues.
>>> Single transform at a time execution.
>>> Watermark propagation through Graph for GBKs and Side Input windowing.
>>> Will be capable of testing the whole Go SDK, in time.
>>> Will be capable of being a stand alone single binary runner, in time.
>>> ++Many opportunities for contribution after getting into the repo!++
>>>
>>> Lowlights:
>>> Only for Go SDK, for now.
>>> ~~Many unimplemented features~~
>>>
>>> Where to start reading?
>>>
>>> Vision README:
>>> https://github.com/apache/beam/blob/9044f2d4ae151f4222a2f3e0a3264c1198040181/sdks/go/pkg/beam/runners/prism/README.md
>>>
>>>
>>> Code Structure README:
>>> https://github.com/apache/beam/blob/9044f2d4ae151f4222a2f3e0a3264c1198040181/sdks/go/pkg/beam/runners/prism/internal/README.md
>>>
>>>
>>> executePipeline entrypoint:
>>> https://github.com/apache/beam/blob/9044f2d4ae151f4222a2f3e0a3264c1198040181/sdks/go/pkg/beam/runners/prism/internal/execute.go#L41
>>>
>>>
>>>
>>> == The long version ==
>>>
>>> Since last year, I was puttering away at making a Portable Beam Runner
>>> authored in Go. Partly because I wanted to learn the "runner" half of beam,
>>> and partly because the Go Direct Runner (and most other direct runners),
>>> are not good at testing.
>>>
>>> I managed to get it roughly ready for basic batch execution by end of
>>> February 2022, and then 2022 got away from me. And I couldn't pick it up
>>> until the end of the year.
>>>
>>> I gave a talk about this at Beam Summit 2022
>>> https://2022.beamsummit.org/sessions/portable-go-beam-runner/ that
>>> covers my motivation for it. Loosely, Beam has a Testing Problem. There are
>>> large parts of Beam execution that matter for real world performance and
>>> correctness, but the facilities to test these don't exist.  For example,
>>> take Combiner Lifting, if a combiner is unlifted, but implements
>>> AddInput... then Merge is 

Re: [Go SDK] Direct Runner Replacement: Prism

2023-02-09 Thread Robert Burke via dev
Here are the first of the smaller PRs:

https://github.com/apache/beam/pull/25404 -> Adds READMEs and updates
go.mod so later changes don't collide there.
https://github.com/apache/beam/pull/25405 -> Adds internal/urns package for
extracting URNs from the protos.
https://github.com/apache/beam/pull/25406 -> Adds internal/config package
for parsing and accessing the configuration of variants and handlers in the
runner.

These are independent changes, and small enough for quicker review. The
remaining larger packages can be submitted more piecemeal once these are in.



On Wed, Feb 8, 2023 at 3:23 PM Robert Burke  wrote:

> Hello Beam!
>
> == tl;dr; ==
>
> I wrote a local, portable Beam runner in Go to replace the Go direct
> runner.  I'd like to contribute it to the Beam Repo. The Big PR with
> everything is here: https://github.com/apache/beam/pull/25391
>
> I'll be sending smaller PRs out for review to get it into the repo. Take a
> look at the big one, don't mind the mess, but do ask questions, or offer
> constructive suggestions to make it clearer. There are ample TODOs that
> could be added. This thread will be kept up to date with the progress.
>
> Highlights:
> Avoids false positive issues the Go Direct runner has, especially around
> serialization issues.
> Single transform at a time execution.
> Watermark propagation through Graph for GBKs and Side Input windowing.
> Will be capable of testing the whole Go SDK, in time.
> Will be capable of being a stand alone single binary runner, in time.
> ++Many opportunities for contribution after getting into the repo!++
>
> Lowlights:
> Only for Go SDK, for now.
> ~~Many unimplemented features~~
>
> Where to start reading?
>
> Vision README:
> https://github.com/apache/beam/blob/9044f2d4ae151f4222a2f3e0a3264c1198040181/sdks/go/pkg/beam/runners/prism/README.md
>
>
> Code Structure README:
> https://github.com/apache/beam/blob/9044f2d4ae151f4222a2f3e0a3264c1198040181/sdks/go/pkg/beam/runners/prism/internal/README.md
>
>
> executePipeline entrypoint:
> https://github.com/apache/beam/blob/9044f2d4ae151f4222a2f3e0a3264c1198040181/sdks/go/pkg/beam/runners/prism/internal/execute.go#L41
>
>
>
> == The long version ==
>
> Since last year, I was puttering away at making a Portable Beam Runner
> authored in Go. Partly because I wanted to learn the "runner" half of beam,
> and partly because the Go Direct Runner (and most other direct runners),
> are not good at testing.
>
> I managed to get it roughly ready for basic batch execution by end of
> February 2022, and then 2022 got away from me. And I couldn't pick it up
> until the end of the year.
>
> I gave a talk about this at Beam Summit 2022
> https://2022.beamsummit.org/sessions/portable-go-beam-runner/ that covers
> my motivation for it. Loosely, Beam has a Testing Problem. There are large
> parts of Beam execution that matter for real world performance and
> correctness, but the facilities to test these don't exist.  For example,
> take Combiner Lifting, if a combiner is unlifted, but implements
> AddInput... then Merge is never called, leaving it untested. And the user
> has no control over this, or may not even be aware of it. How a DoFn is
> executed matters for coverage, and user confidence.  In particular for
> Streaming jobs, users will tend to try things out on their Prod runner, but
> that doesn't help if one is testing on local Flink, but executing on Google
> Cloud Dataflow, which behave very differently.
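
The Combiner Lifting point above can be sketched in plain Python, without
any Beam imports. This is an illustration under assumptions, not Beam code:
MeanFn, run_unlifted, run_lifted, and the merge_calls counter are
hypothetical names, and the two helpers only mimic how an unlifted and a
lifted execution might call the combiner methods.

```python
class MeanFn:
    """Hypothetical combiner with the four usual combiner methods."""

    def __init__(self):
        self.merge_calls = 0  # counts how often merge_accumulators runs

    def create_accumulator(self):
        return (0.0, 0)  # (running sum, element count)

    def add_input(self, acc, value):
        total, count = acc
        return (total + value, count + 1)

    def merge_accumulators(self, accs):
        self.merge_calls += 1
        total = sum(a[0] for a in accs)
        count = sum(a[1] for a in accs)
        return (total, count)

    def extract_output(self, acc):
        total, count = acc
        return total / count if count else float("nan")


def run_unlifted(fn, values):
    """All values for a key feed one accumulator, so merge is never needed."""
    acc = fn.create_accumulator()
    for v in values:
        acc = fn.add_input(acc, v)
    return fn.extract_output(acc)


def run_lifted(fn, bundles):
    """Each bundle pre-combines locally, then the partial results are merged."""
    partials = []
    for bundle in bundles:
        acc = fn.create_accumulator()
        for v in bundle:
            acc = fn.add_input(acc, v)
        partials.append(acc)
    return fn.extract_output(fn.merge_accumulators(partials))
```

Both helpers give the same mean for the same values, but only run_lifted
ever touches merge_accumulators. A test suite that only runs the unlifted
path would leave a bug in that method undetected.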
>
> Regardless of whether you agree with that thesis...  I wanted to fill that
> gap. I wanted a runner that could be configured to test those situations,
> and in particular, make it easier to develop SDKs and all the features of
> Beam that don't get their own blog posts.
>
> Especially for the Go SDK. Java, being the oldest, has arguably the only
> "correct" beam runner, in the form of the Java Direct Runner. But one can't
> execute Go pipelines on that. Python has a portable execution of its
> runner, but the current state of python is Parallelism hostile at best. It
> supports a great many things, like Cross Language, but can't support
> streaming execution (ProcessContinuations etc) at present. Also, being a
> large Python program, it's harder to follow.  The Java Direct runner, while
> being slightly easier to follow, doesn't have a clear execution flow.
> Neither of them are particularly easy for Non Language Experts to stand up
> and use, especially outside of the Beam repo.
>
> The Go SDK's Direct Runner has many flaws, most of which are due to Direct
> execution, rather than Portable Execution.  Implementing features largely
> meant hacking certain things in, so they would be able to be executed. This
> also made supporting and testing Cross Language Transforms, State and
> Timers in Go pipelines a non-starter for users. And that's just the tip.
>
> So I wanted something better. I mentioned it a few times to others, but I
> kept hearing the same refrain: "I want something that does that". Or 

[RESULT] [VOTE] Release 2.42.0, release candidate #2

2022-10-17 Thread Robert Burke via dev
I'm happy to announce that we have unanimously approved this release.

There are 9 approving votes, 6 of which are binding:
* Pablo Estrada
* Robert Bradshaw
* Ahmet Altay
* Alexey Romanenko
* Chammikara Jayalath
* Jean-Baptiste Onofré

There are no disapproving votes.

Thanks everyone!
Robert Burke


[VOTE] Release 2.42.0, release candidate #1

2022-09-29 Thread Robert Burke via dev
Hi everyone,
Please review and vote on the release candidate #1 for the version 2.42.0,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

Reviewers are encouraged to test their own use cases with the release
candidate, and vote +1 if no issues are found.

The complete staging area is available for your review, which includes:
* GitHub Release notes [1],
* the official Apache source release to be deployed to dist.apache.org [2],
which is signed with the key with fingerprint
A52F5C83BAE26160120EC25F3D56ACFBFB2975E1 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "v2.42.0-RC1" [5],
* website pull request listing the release [6], the blog post [6], and
publishing the API reference manual [7].
* Java artifacts were built with Gradle GRADLE_VERSION and OpenJDK/Oracle
JDK JDK_VERSION.
* Python artifacts are deployed along with the source release to the
dist.apache.org [2] and PyPI [8]
* Go Package information and SDK RC  [9]
* Validation sheet with a tab for 2.42.0 release to help with validation
[10].
* Docker images published to Docker Hub [11].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

For guidelines on how to try the release in your projects, check out our
blog post at https://beam.apache.org/blog/validate-beam-release/.

Thanks,
Robert Burke
2.42.0 Release Manager

[1] https://github.com/apache/beam/milestone/4
[2] https://dist.apache.org/repos/dist/dev/beam/2.42.0/
[3] https://dist.apache.org/repos/dist/release/beam/KEYS
[4] https://repository.apache.org/content/repositories/orgapachebeam-1285/
[5] https://github.com/apache/beam/tree/v2.42.0-RC1
[6] https://github.com/apache/beam/pull/23406
[7] https://github.com/apache/beam-site/pull/634
[8] https://pypi.org/project/apache-beam/2.42.0rc1/
[9] https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0-RC1/go/pkg/beam
[10] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=265602293
[11] https://hub.docker.com/search?q=apache%2Fbeam=image


Re: Upcoming potentially breaking change to CoGroupByKey

2022-09-06 Thread Robert Burke via dev
"new release" is an ambiguous descriptor in this email. I'm going to
continue to take it as "right after the 2.42.0 cut, to make it into the
2.43.0 as planned".

On Tue, Sep 6, 2022 at 2:39 PM Ryan Thompson 
wrote:

> There was discussion in the python meeting to try to get this into the new
> release. The consensus was that putting it in right after the release had
> the highest chance of catching problems with the least amount of pain.
>
> I can send it out to users next week if nothing halts the change.
>
> On Tue, Sep 6, 2022 at 5:03 PM Luke Cwik  wrote:
>
>> We should send this out to us...@beam.apache.org so that they are aware
>> of this change once commenting in the doc has settled.
>>
>> On Tue, Sep 6, 2022 at 1:59 PM Robert Burke  wrote:
>>
>>> Thank you for already planning to *NOT* have this merged until after
>>> this week's 2.42.0 cut. This Release Manager is pleased that the doc says
>>> it's intended for 2.43.0.
>>>
>>>
>>> On Tue, Sep 6, 2022, 1:44 PM Ryan Thompson via dev 
>>> wrote:
>>>
>>>> CoGroupByKey returns a dictionary of {KeyType, List[ValueType]} but, as
>>>> GroupByKey, should return an Iterable.
>>>>
>>>> Change:
>>>> https://github.com/apache/beam/pull/22984
>>>>
>>>> Please look at this doc
>>>> if you need more details. Feel free to comment.
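
To make the potential breakage concrete, here is a minimal sketch in plain
Python of consumer code that does not care whether the grouped values arrive
as lists or as general iterables. The function names and sample data are
hypothetical, and the thread does not state the exact iterable type returned
after the change, so the iterators in the usage note are only a stand-in.

```python
from typing import Dict, Iterable, Optional


def count_per_tag(grouped: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Count the values under each tag without assuming they are lists."""
    # len(values) and values[0] need a list; plain iteration works on both.
    return {tag: sum(1 for _ in values) for tag, values in grouped.items()}


def first_or_none(values: Iterable[str]) -> Optional[str]:
    """Return the first value, or None when empty, without indexing."""
    return next(iter(values), None)
```

Calling count_per_tag({"emails": ["a@x", "b@x"], "phones": []}) and calling
it with iter(...) wrappers around the same values should give the same
counts. Code that relies on len() or indexing is the kind that would need
updating.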