Re: PCollection#applyWindowingStrategyInternal

2024-04-22 Thread Kenneth Knowles
notion of Sinks as a special object, so to allow for output >>> specification it has to be on the ParDo), and the triggers should propagate >>> up the graph back to the source. This is in contrast to today where we >>> attach triggering to the windowing information.

Re: PCollection#applyWindowingStrategyInternal

2024-04-09 Thread Kenneth Knowles
At a top level `setWindowingStrategyInternal` exists to set up the metadata without actually assigning windows. If we were more clever we might have found a way for it to not be public... it is something that can easily lead to an invalid pipeline. I think "compatible windows" today in Beam

Re: [VOTE] Patch Release 2.55.1, release candidate #2

2024-04-03 Thread Kenneth Knowles
+1 (binding) Kenn On Wed, Apr 3, 2024 at 12:58 PM Danny McCormick via dev wrote: > > Also noting that there is no PR postsubmit test suite running against > the release branch in the vote email. Given the diff, that's also fine > since previous tests runs didn't detect the breakage, but in

Re: Supporting Dynamic Destinations in a portable context

2024-04-03 Thread Kenneth Knowles
xpress, and will be asked to extend this little templating in more > directions. To head that off - could we easily just reuse an existing > language (SQL, LUA, something of the form?) instead of creating something > new? > > On Tue, Apr 2, 2024 at 8:55 AM Kenneth Knowles wrote: > >> I real

Re: Supporting Dynamic Destinations in a portable context

2024-04-02 Thread Kenneth Knowles
I really like this proposal. I think it has narrowed down and solved the essential problem of not shuffling excess redundant data, and also provides the vast majority of the functionality that a lambda would, with significantly better debugability and usability too, since the dynamic destination
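The excerpt does not show the proposed templating syntax. As a hedged illustration of the general idea, here is a Python sketch that computes a destination from a small template over record fields instead of from a user lambda. It assumes a `{field}`-style template; `template_fields`, `route`, and the example paths are hypothetical and are not the syntax proposed in the thread.

```python
from collections import defaultdict
from string import Formatter
from typing import Dict, Iterable, List


def template_fields(template: str) -> List[str]:
    """Return the record fields referenced by a '{field}'-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def route(records: Iterable[dict], template: str) -> Dict[str, List[dict]]:
    """Group records by the destination string rendered from the template.

    Only the fields named in the template are read to compute the key,
    so the destination is derived from the data rather than from opaque code.
    """
    fields = template_fields(template)
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        key = template.format(**{f: record[f] for f in fields})
        grouped[key].append(record)
    return dict(grouped)


rows = [
    {"country": "de", "day": "2024-04-02", "v": 1},
    {"country": "us", "day": "2024-04-02", "v": 2},
    {"country": "de", "day": "2024-04-02", "v": 3},
]
by_dest = route(rows, "out/{country}/{day}.jsonl")
print({k: [r["v"] for r in v] for k, v in by_dest.items()})
# {'out/de/2024-04-02.jsonl': [1, 3], 'out/us/2024-04-02.jsonl': [2]}
```

A template like this is also easy to inspect and log, which is one reason a restricted expression form can be easier to debug than arbitrary code.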

Re: [ACTION REQUESTED] Help me draft the Beam Board Report for March 2024

2024-03-13 Thread Kenneth Knowles
Thanks! I've submitted the report earlier today. Kenn On Mon, Mar 11, 2024 at 6:08 PM XQ Hu wrote: > Thanks for the ping! I added several notes and feel free to make more > changes. > > On Mon, Mar 11, 2024 at 2:49 PM Kenneth Knowles wrote: > >> Ping! >> >>

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-03-13 Thread Kenneth Knowles
Closing the loop, I went with two URNs and an associated payload in https://github.com/apache/beam/pull/30545 Kenn On Wed, Mar 6, 2024 at 10:54 AM Kenneth Knowles wrote: > OK of course hacking this up there's already combinatorial 2x2 that > perhaps people were alluding to but I

Re: [ACTION REQUESTED] Help me draft the Beam Board Report for March 2024

2024-03-11 Thread Kenneth Knowles
Ping! Would really love help from folks building stuff to report out on what they've built, especially! Kenn On Tue, Mar 5, 2024 at 12:15 PM Kenneth Knowles wrote: > The next Beam board report is due next Wednesday, March 13. Please draft > it together at https://s.apache.org/beam

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-03-06 Thread Kenneth Knowles
e_allowing_duplicates" instead of building from the unspecified > Reshuffle semantics. > > Transforms getting updated to use the new transform can have their > @RequiresStableInputs annotation added accordingly if they need that > property per previous discussions. > > > >

[ACTION REQUESTED] Help me draft the Beam Board Report for March 2024

2024-03-05 Thread Kenneth Knowles
The next Beam board report is due next Wednesday, March 13. Please draft it together at https://s.apache.org/beam-draft-report-2024-03. The doc is open for anyone to edit. Ideas: - highlights from CHANGES.md - interesting technical discussions - integrations with other projects - community

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Kenneth Knowles
treaming case). > > Is there anything missing in such definition that would still require > splitting the timers into two distinct features? > > Jan > On 2/26/24 21:22, Kenneth Knowles wrote: > > Yea I like DelayTimer, or SleepTimer, or WaitTimer or some such. > >

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-26 Thread Kenneth Knowles
lem. We'd want to break down exactly what different and the same for > the 3 kinds of timers... > > > > > On Mon, Feb 26, 2024, 11:45 AM Kenneth Knowles wrote: > >> Pulling out focus points: >> >> On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev < >>

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-26 Thread Kenneth Knowles
ingle-threaded, thus any timer has to > > > > wait before any element processing finishes. This is only > consequence of > > > > a technical solution, not something fundamental. > > > > > > > > Having said that, my point is that according to the abov

[DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-22 Thread Kenneth Knowles
Forking this thread. The state of processing time timers in this mode of processing is not satisfactory and is discussed a lot but we should make everything explicit. Currently, a state and timer DoFn has a number of logical watermarks: (apologies for fixed width not coming through in email

Re: Throttle PTransform

2024-02-22 Thread Kenneth Knowles
Wow I love your input Reuven. Of course "the source" that you are applying backpressure to is often a runner's shuffle so it may be state anyhow, but it is good to give the runner the choice of how to figure that out and maybe chain backpressure further. The goal is basically to make a sink that
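The thread does not include an implementation. As a point of reference only, the sketch below shows a generic token bucket, one common way to cap a processing rate. How such a limiter would be wired into a Beam transform (blocking, or instead signalling backpressure so the runner can decide what to do) is left open here and is not taken from the discussion; `TokenBucket` is a hypothetical name.

```python
import time
from typing import Callable


class TokenBucket:
    """Generic token-bucket rate limiter; not a Beam API."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self, n: float = 1.0) -> bool:
        """Consume n tokens if available; False means the caller should back off."""
        self._refill()
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n: float = 1.0) -> float:
        """Seconds until n tokens would be available at the configured rate."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)


# Usage with a controllable clock so the behavior is deterministic.
now = [0.0]
bucket = TokenBucket(rate_per_sec=2.0, capacity=2.0, clock=lambda: now[0])
print([bucket.try_acquire() for _ in range(3)])  # [True, True, False]
print(bucket.wait_time())                        # 0.5
```

Returning `False` rather than sleeping keeps the decision about how to wait with the caller, which loosely mirrors the point about letting the runner choose how to apply backpressure.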

Re: [PROPOSAL] Preparing for 2.55.0 Release

2024-02-22 Thread Kenneth Knowles
Hooray! Thank you! On Thu, Feb 22, 2024 at 10:24 AM Yi Hu via dev wrote: > Hey Beam community, > > The next release (2.55.0) branch cut is scheduled on Mar 6th, 2024, > according to > the release calendar [1]. > > I volunteer to perform this release. My plan is to cut the branch on that > date,

Re: Pipeline upgrade to 2.55.0-SNAPSHOT broken for FlinkRunner

2024-02-22 Thread Kenneth Knowles
ons). > > I created [2] and marked it as blocker for 2.55.0 release, because > otherwise we would break the upgrade. > > Thanks for the discussion, it helped a lot. > > Jan > > [1] https://github.com/apache/beam/pull/30197 > > [2] https://github.com/apache/beam/issues/3

Re: Pipeline upgrade to 2.55.0-SNAPSHOT broken for FlinkRunner

2024-02-21 Thread Kenneth Knowles
Yea I think we should restore the necessary classes but also fix the FlinkRunner. Java serialization is inherently self-update-incompatible. On Wed, Feb 21, 2024 at 1:35 PM Reuven Lax via dev wrote: > Is there a fundamental reason we serialize java classes into Flink > savepoints. > > On Wed,

Re: [API PROPOSAL] PTransform.getURN, toProto, etc, for Java

2024-02-16 Thread Kenneth Knowles
ation for it. More so in Java where such handler > registrations can be done via class annotations! > > Robert Burke > Beam Go Busybody > > On Thu, Feb 15, 2024, 10:37 AM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > >> On Wed, Feb 14, 2024 at 10:

[API PROPOSAL] PTransform.getURN, toProto, etc, for Java

2024-02-14 Thread Kenneth Knowles
Hi all, TL;DR I want to add some API like PTransform.getURN, toProto and fromProto, etc. to the Java SDK. I want to do this so that making a PTransform support portability is a natural part of writing the transform and not a totally separate thing with tons of boilerplate. What do you think? I
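The proposed Java signatures are not shown in this excerpt. To illustrate the idea of a transform carrying its URN and serialization alongside its definition, here is a Python analogue. `register`, `Transform`, `from_proto`, the URN string, and the use of a plain dict in place of a protobuf message are all hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict

# URN -> decoder; stands in for whatever registry a real SDK would use.
_FROM_PROTO: Dict[str, Callable[[dict], "Transform"]] = {}


def register(urn: str):
    """Class decorator: attach a URN and register a decoder for it (hypothetical)."""
    def wrap(cls):
        cls.URN = urn
        _FROM_PROTO[urn] = cls.from_payload
        return cls
    return wrap


class Transform:
    URN = ""

    def get_urn(self) -> str:
        return self.URN

    def to_proto(self) -> dict:
        # A dict with "urn" and "payload" stands in for a protobuf spec message.
        return {"urn": self.get_urn(), "payload": asdict(self)}

    @classmethod
    def from_payload(cls, payload: dict) -> "Transform":
        return cls(**payload)


def from_proto(spec: dict) -> Transform:
    """Rebuild a transform from its serialized form via the registry."""
    return _FROM_PROTO[spec["urn"]](spec["payload"])


@register("example:transform:sample:v1")
@dataclass
class Sample(Transform):
    fraction: float
    seed: int = 0


print(Sample(fraction=0.1, seed=7).to_proto())
# {'urn': 'example:transform:sample:v1', 'payload': {'fraction': 0.1, 'seed': 7}}
```

The design point being illustrated is only that the author of `Sample` writes its portable form in the same place as the transform, rather than in a separate translator.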

Re: [VOTE] Vendored Dependencies Release

2024-02-14 Thread Kenneth Knowles
+1 (binding) On Wed, Feb 14, 2024 at 10:48 AM Robert Burke wrote: > +1 (binding) > > On Wed, Feb 14, 2024, 7:35 AM Yi Hu via dev wrote: > >> +1 (non-binding) >> >> checked artifact packages not leaking namespace (or under >> org.apache.beam.vendor.grpc.v1p60p1) and the tests in >>

[ANNOUNCE] New Committer: Svetak Sundhar

2024-02-12 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Svetak Sundhar (sve...@apache.org). Svetak has been with Beam since 2021. Svetak has contributed code to many areas of Beam, including notebooks, Beam Quest, dataframes, and IOs. We also want to especially

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-02-08 Thread Kenneth Knowles
about what that means for > checkpointing/durability behavior, but that's largely been runner dependent > anyway. I admit the above definition is biased by the uses of Reshuffle I'm > aware of, which largely are to incur a fusion break in the execution graph. > > Robert Burke

Re: [PROPOSAL] Re-release vendor grpc

2024-02-06 Thread Kenneth Knowles
SGTM. Thanks for doing this! On Tue, Feb 6, 2024 at 5:20 PM Sam Whittle wrote: > Hi everyone, > > I would like to volunteer to rerelease the Beam vendored grpc 1.60.1. > The grpc version will be unchanged but additional jars > 'io.grpc:grpc-services' and 'io.grpc:grpc-util' will be added due to

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-31 Thread Kenneth Knowles
ify a real deviation from it. > > I'm all for more specific behaviors if means we actually clarify what the > original version is in the protos, since its news to me ( just now, because > I looked) that the Java reshuffle promises GBK-like side effects. But > that's a long deprecated

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-31 Thread Kenneth Knowles
> I looked) that the Java reshuffle promises GBK-like side effects. But > that's a long deprecated transform without a satisfying replacement for > it's usage, so it may be moot. > > Robert Burke > > > > On Tue, Jan 30, 2024, 1:34 PM Kenneth Knowles wrote: > >> Hi a

[DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-30 Thread Kenneth Knowles
Hi all, Just when you thought I had squeezed all the possible interest out of this most boring-seeming of transforms :-) I wrote up a very quick proposal as a doc [1]. It is short enough that I will also put the main idea and main question in this email so you can quickly read. Best to put

Re: [VOTE] Vendored Dependencies Release

2024-01-22 Thread Kenneth Knowles
github.com/apache/beam/tree/master/contributor-docs >> >> [1] https://s.apache.org/beam-release-vendored-artifacts >> >> On Thu, Jan 18, 2024 at 2:56 PM Robert Bradshaw via dev < >> dev@beam.apache.org> wrote: >> >>> Could you explain the process yo

Re: @RequiresTimeSortedInput adoption by runners

2024-01-19 Thread Kenneth Knowles
In this design space, what we have done in the past is: 1) ensure that runners all reject pipelines they cannot run correctly 2) if there is a default/workaround/slower implementation, provide it as an override This is largely ignoring portability but I think/hope it will still work. At one time
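A hedged sketch of the two-step pattern described above: (1) reject what the runner cannot run correctly, and (2) substitute a fallback implementation where one exists. Transforms are modeled as plain names; `plan`, `UnsupportedPipelineError`, and the label `TimeSortedParDo` (standing in for a ParDo that requires time-sorted input) are hypothetical, and the placeholder expansion is not how Beam actually implements that fallback.

```python
from typing import Callable, Dict, Iterable, List, Set


class UnsupportedPipelineError(Exception):
    """Raised when a runner cannot execute some transform correctly."""


def plan(transforms: Iterable[str], native: Set[str],
         overrides: Dict[str, Callable[[], List[str]]]) -> List[str]:
    """Return the transforms to execute, expanding overrides; reject otherwise."""
    planned: List[str] = []
    for t in transforms:
        if t in native:
            planned.append(t)  # the runner supports this directly
        elif t in overrides:
            # Replace with a (possibly slower) expansion, then check that too.
            planned.extend(plan(overrides[t](), native, overrides))
        else:
            # Fail at submission time rather than silently running incorrectly.
            raise UnsupportedPipelineError(f"cannot run {t!r} correctly")
    return planned


native = {"ParDo", "GroupByKey"}
overrides = {"TimeSortedParDo": lambda: ["GroupByKey", "ParDo"]}
print(plan(["ParDo", "TimeSortedParDo"], native, overrides))
# ['ParDo', 'GroupByKey', 'ParDo']
```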

Re: [VOTE] Vendored Dependencies Release

2024-01-18 Thread Kenneth Knowles
+1 On Wed, Jan 17, 2024 at 6:03 PM Yi Hu via dev wrote: > Hi everyone, > > > Please review the release of the following artifacts that we vendor: > > * beam-vendor-grpc-1_60_1 > > > Please review and vote on the release candidate #1 for the version 0.1, as > follows: > > [ ] +1, Approve the

Re: [PROPOSAL] Upgrade vendor grpc

2024-01-12 Thread Kenneth Knowles
Yes, thank you! On Thu, Jan 11, 2024 at 8:21 PM Chamikara Jayalath via dev < dev@beam.apache.org> wrote: > Sounds good and thanks for doing this :) > > - Cham > > On Thu, Jan 11, 2024 at 8:06 AM Yi Hu via dev wrote: > >> Hi everyone, >> >> I would like to volunteer to upgrade the Beam vendored

Re: ByteBuddy DoFnInvokers Write Up

2024-01-12 Thread Kenneth Knowles
This is really great, and a very good idea to document. Going from "what does a DoFnSignature and DoFnInvoker look like for a particular DoFn" is super useful to even explain why these constructions exist. And from there, you can talk about what the bytecode looks like and what the ByteBuddy to

[ACTION REQUESTED] Help me draft the Beam Board Report for January 2024

2024-01-05 Thread Kenneth Knowles
Hi all, The next Beam board report is due next Wednesday, January 10. Please help me to draft it at https://s.apache.org/beam-draft-report-2024-01. The doc is open for anyone to edit. Ideas: - highlights from CHANGES.md - interesting technical discussions - integrations with other projects

Re: Credentials Rotation Failure on Metrics cluster (2023-11-01)

2023-11-01 Thread Kenneth Knowles via dev
+Danny McCormick is this the converse of the other failure? (I didn't click through I just read the other thread) On Tue, Oct 31, 2023 at 10:10 PM gacti...@beam.apache.org < beamacti...@gmail.com> wrote: > Something went wrong during the automatic credentials rotation for Metrics > Cluster,

Re: [YAML] Aggregations

2023-10-30 Thread Kenneth Knowles
on of demos. > > > On Mon, Oct 23, 2023, 7:00 AM XQ Hu via dev wrote: > >> +1 on your proposal. >> >> On Fri, Oct 20, 2023 at 4:59 PM Robert Bradshaw via dev < >> dev@beam.apache.org> wrote: >> >>> On Fri, Oct 20, 2023 at 11:35 AM Kenneth Know

Re: Streaming update compatibility

2023-10-30 Thread Kenneth Knowles
+1 million to this. I think this could be a real game-changer. I would say even more forcefully that update compatibility has pushed our development style into "never make significant changes" or "every significant change is wildly more complex than it should be". It forces our

Re: [Discuss] Idea to increase RC voting participation

2023-10-25 Thread Kenneth Knowles
> automation to be more tolerant to failures), but it doesn't seem super > urgent to me (feel free to disagree). I don't think this piece needs to be > perfect. > > On Tue, Oct 24, 2023 at 2:40 PM Kenneth Knowles wrote: > >> Just grabbing one at random for an example, >> https://g

Re: [Discuss] Idea to increase RC voting participation

2023-10-24 Thread Kenneth Knowles
us much value beyond what we have today. > > On Tue, Oct 24, 2023 at 1:54 PM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > >> On Tue, Oct 24, 2023 at 10:35 AM Kenneth Knowles wrote: >> >>> Tangentially related: >>> >>> Long ago, a

Re: [Discuss] Idea to increase RC voting participation

2023-10-24 Thread Kenneth Knowles
Tangentially related: Long ago, attaching an issue to a release was a mandatory step as part of closing. Now I think it is not. Is it automatically happening? It looks like we have 820 with no milestone https://github.com/apache/beam/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed Kenn On Tue,

Re: [YAML] Aggregations

2023-10-20 Thread Kenneth Knowles
A couple other bits on having an expression language: - You already have Python lambdas in places, right? So that's quite a lot more complex than SQL project/aggregate expressions - It really does save a lot of pain for users (at the cost of implementation complexity) when you need to

Re: Reshuffle PTransform Design Doc

2023-10-20 Thread Kenneth Knowles
a Friday discussion :-) Kenn Best, > > Jan > > On 10/19/23 20:26, Kenneth Knowles wrote: > > Well I accidentally conflated "stateful" and "persisting", but anyhow > > yea we aren't targeting to have one Beam primitive for each thing that

Re: [NOTICE] Deprecation Avro classes in "core" and use "extensions/avro" instead for Java SDK

2023-10-19 Thread Kenneth Knowles
W On Wed, Oct 18, 2023 at 4:19 PM Byron Ellis via dev wrote: > Awesome! > > On Wed, Oct 18, 2023 at 1:14 PM Alexey Romanenko > wrote: > >> Heads up! >> >> Finally, all Avro-related code and Avro dependency, that was deprecated >> before (see a message above), has been removed from Beam

Re: [Discuss] Idea to increase RC voting participation

2023-10-19 Thread Kenneth Knowles
+1 to more helpful guide on "how to usefully participate in RC validation" but also big +1 to Robert, Jack, Johanna. TL;DR the RC validation is an opportunity for downstream testing. Robert alluded to the origin of the spreadsheet: I created it long ago to validate that the human language on our

Re: [DISCUSS] Drop Euphoria extension

2023-10-19 Thread Kenneth Knowles
Makes sense to me. Let's deprecate for the 2.52.0 release unless there is some objection. You can also look at the maven central downloads (I believe all PMC and maybe all committers can view this) compared to other Beam jars. Kenn On Mon, Oct 16, 2023 at 9:28 AM Jan Lukavský wrote: > Sure,

Re: Reshuffle PTransform Design Doc

2023-10-19 Thread Kenneth Knowles
Well I accidentally conflated "stateful" and "persisting", but anyhow yea we aren't targeting to have one Beam primitive for each thing that is probably a runner primitive. On Thu, Oct 19, 2023 at 2:25 PM Kenneth Knowles wrote: > > On Fri, Oct 13, 2023 at 12:51 PM Jan L

Re: Reshuffle PTransform Design Doc

2023-10-19 Thread Kenneth Knowles
It is more like SQL in that it is a library for building composites that eventually are constructed from fundamental operations on data, that every engine (like every RDBMS) will be able to implement in its own way. Kenn > > Therefore here goes the question - should Redistribute be a primitiv

Re: [YAML] Aggregations

2023-10-19 Thread Kenneth Knowles
Using SQL expressions in strings is maybe OK given we are all relational all the time. Either way you have to define what the universe of `fn` is. Here's a compact possibility:
    type: Combine
    config:
      group_by: [field1, field2]
      aggregates:
        max_cost: "MAX(cost)"
        total_cost: "SUM(cost)"
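To make the `Combine` config in the message above concrete, here is a hedged Python sketch of what evaluating its `group_by` and `aggregates` entries could mean. It assumes the universe of `fn` is just `MAX`, `MIN`, `SUM`, and `COUNT` applied to a single field; `combine` and the regex are hypothetical and are not Beam YAML's implementation.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Accepts only FN(field) with a small, fixed set of aggregate functions.
_AGG = re.compile(r"^\s*(MAX|MIN|SUM|COUNT)\((\w+)\)\s*$", re.IGNORECASE)
_FUNCS = {"MAX": max, "MIN": min, "SUM": sum, "COUNT": len}


def combine(rows: List[dict], group_by: List[str],
            aggregates: Dict[str, str]) -> List[dict]:
    """Group rows by key fields and apply simple SQL-style aggregate strings."""
    parsed = {}
    for out_name, expr in aggregates.items():
        m = _AGG.match(expr)
        if m is None:
            raise ValueError(f"unsupported aggregate expression: {expr!r}")
        parsed[out_name] = (_FUNCS[m.group(1).upper()], m.group(2))

    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in group_by)].append(row)

    results = []
    for key, members in groups.items():
        out = dict(zip(group_by, key))  # carry the grouping fields through
        for out_name, (fn, field) in parsed.items():
            out[out_name] = fn([r[field] for r in members])
        results.append(out)
    return results


rows = [
    {"field1": "a", "field2": 1, "cost": 3},
    {"field1": "a", "field2": 1, "cost": 5},
    {"field1": "b", "field2": 2, "cost": 4},
]
print(combine(rows, ["field1", "field2"],
              {"max_cost": "MAX(cost)", "total_cost": "SUM(cost)"}))
```

Rejecting anything outside the small grammar is the point: it forces the config to state exactly which aggregations are allowed.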

[ANNOUNCE] Apache Beam 2.51.0 Released

2023-10-18 Thread Kenneth Knowles
The Apache Beam Team is pleased to announce the release of version 2.51.0. You can download the release here: https://beam.apache.org/get-started/downloads/ This release includes bug fixes, features, and improvements detailed on the Beam Blog: https://beam.apache.org/blog/beam-2.51.0/ and the

[ANNOUNCE] New Committer: Byron Ellis

2023-10-16 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Byron Ellis (b...@apache.org). Byron has been with Beam for over a year now. You may all know him as the guy who just decided to write a Swift SDK :-). In addition to that big contribution Byron has also fixed

[ANNOUNCE] New Committer: Sam Whittle

2023-10-16 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming a new committer: Sam Whittle (scwhit...@apache.org). Sam has been contributing to Beam since 2016! In particular, he specializes in streaming and the Dataflow Java worker but his contributions expand naturally from there to the

Re: [GitHub Actions] Requiring a test but not running it until requested

2023-10-11 Thread Kenneth Knowles
). > > Outside of the feasibility question, I'm at least theoretically > interested. This could allow us to turn some of our postcommits into > precommits without burning too much CI compute. I'm also generally +1 on > requiring more checks to pass before merging, especially if we c

[RESULT] [VOTE] Release 2.51.0, release candidate #1

2023-10-11 Thread Kenneth Knowles
The vote has passed. There are 5 +1 binding votes: - Robert Bradshaw - Jan Lukavský - Ahmet Altay - Jean-Baptiste Onofré - Alexey Romanenko Additionally there are 5 non-binding +1 votes: - Danny McCormick - Svetak Sundhar - XQ Hu - Bruno Volpato - Yi Hu There are no disapproving

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-11 Thread Kenneth Knowles
OK I'm ready. +1 (binding) On Tue, Oct 10, 2023 at 4:30 PM Ahmet Altay via dev wrote: > Thank you for the information. > > I agree with Kenn in that case. This could wait for the next release. > Unless there is another reason to do the RC2. > > On Tue, Oct 10, 2023 at 12:30 PM Yi Hu wrote: >

[GitHub Actions] Requiring a test but not running it until requested

2023-10-11 Thread Kenneth Knowles
From our other thread I had a thought about our "only on request" tests. Today, in theory: - The lightweight tests run automatically based on path matching. This is an approximate implementation of the ideal of running based on whether they could impact a test signal. - Heavyweight (and more
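The path-matching rule described above can be sketched as: run a test job when any changed file matches one of its patterns. In GitHub Actions this is typically declared with `paths` filters in the workflow file; the Python below only approximates that, using `fnmatchcase`, whose `*` also matches across `/`. The job names and patterns are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def jobs_to_run(changed_files: Iterable[str],
                triggers: Dict[str, List[str]]) -> List[str]:
    """Return the jobs whose path patterns match at least one changed file."""
    files = list(changed_files)  # iterate once per job, so materialize first
    return sorted(
        job for job, patterns in triggers.items()
        if any(fnmatchcase(path, pat) for path in files for pat in patterns)
    )


triggers = {
    "Java_PreCommit": ["sdks/java/*", "runners/*"],
    "Python_PreCommit": ["sdks/python/*"],
}
print(jobs_to_run(["sdks/java/core/Foo.java", "CHANGES.md"], triggers))
# ['Java_PreCommit']
```

This is the "approximate" part: a path match is a cheap proxy for whether a change could actually affect a test's signal.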

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-11 Thread Kenneth Knowles
So, top-posting because the threading got to be a lot for me and I think it forked a bit too... I may even be restating something someone said, so apologies for that. Very very good point about *required* parameters where if you don't use them then you will end up with two writers writing to the
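A hedged sketch of why some naming components effectively need to be required: if parallel writers in the same window do not include a distinguishing component such as a shard index, they produce the same file name. `file_name`, `collisions`, and the zero-padded `-of-` layout are assumptions chosen for illustration, not Beam's FileIO naming API.

```python
from collections import Counter
from typing import List, Optional


def file_name(prefix: str, suffix: str, window: Optional[str] = None,
              shard: Optional[int] = None, num_shards: Optional[int] = None) -> str:
    """Build an output name from optional window and shard components."""
    parts = [prefix]
    if window is not None:
        parts.append(window)
    if shard is not None:
        if num_shards is None:
            raise ValueError("num_shards is required when shard is set")
        parts.append(f"{shard:05d}-of-{num_shards:05d}")
    return "-".join(parts) + suffix


def collisions(names: List[str]) -> List[str]:
    """Names that more than one writer would produce."""
    return sorted(n for n, c in Counter(names).items() if c > 1)


# Two parallel writers in the same window:
no_shard = [file_name("out", ".txt", window="w1") for _ in range(2)]
with_shard = [file_name("out", ".txt", window="w1", shard=i, num_shards=2)
              for i in range(2)]
print(collisions(no_shard))    # ['out-w1.txt']
print(collisions(with_shard))  # []
```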

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-10 Thread Kenneth Knowles
if they are all benign before closing the vote. If someone wants to actually -1 the RC they can do that, but I won't (yet). Kenn On Mon, Oct 9, 2023 at 4:22 PM Kenneth Knowles wrote: > OK I can cherrypick it so they have an upgrade fix. But also we should > instruct users to pin their fastavro v

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
h `go test` for example) or doing this in situations where it is >>>> unnatural. >>>> >>>> My example probably confused this because I left off the `./gradlew` >>>> just to save space. I'm proposing naming them after their obvious repro

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
his pattern, but there >> are also many outliers. A good start could be to clean up all jobs >> to follow the same pattern. >> >> >> On Tue, Oct 10, 2023 at 9:57 AM Kenneth Knowles wrote: >> >>> FWIW I aware of the README in >>> https://g

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Kenneth Knowles
ated files they have to work with as-is, where they probably didn't plan for this way of thinking when they were gathering the files. We need to give good options for everyone, but the golden path should be the simple and good case. Kenn On Tue, Oct 10, 2023 at 10:09 AM Kenneth Knowles wrote:

Re: [YAML] Fileio sink parameterization (streaming, sharding, and naming)

2023-10-10 Thread Kenneth Knowles
Since I've been in GHA files lately... I think they have a very useful pattern which we could borrow from or learn from, where setting up the variables happens separately, like

Re: [PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
FWIW I am aware of the README in https://github.com/apache/beam/tree/master/.test-infra/jenkins that lists the phrases alongside the jobs. This is just wasted work to maintain IMO. Kenn On Tue, Oct 10, 2023 at 9:46 AM Kenneth Knowles wrote: > *Proposal:* make all the job names exactly ma

[PROPOSAL] [Nice-to-have] CI job names and commands that match

2023-10-10 Thread Kenneth Knowles
*Proposal:* make all the job names exactly match the GH comment to run them and also make them as close as possible to how to reproduce locally *Example problems*: - We have really silly redundant job results like 'Chicago Taxi Example on Dataflow ("Run Chicago Taxi on Dataflow")' and

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Kenneth Knowles
> On Mon, Oct 9, 2023 at 4:08 PM Kenneth Knowles wrote: > >> If we had closed the release today, this would still have broken all our >> users, correct? >> >> Kenn >> >> On Mon, Oct 9, 2023 at 3:37 PM Anand Inguva via dev >> wrote: >>

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Kenneth Knowles
a12597fec5b4a2c/sdks/python/setup.py#L245 > > > On Mon, Oct 9, 2023 at 3:15 PM Kenneth Knowles wrote: > >> Ran a couple of Java pipelines "as a newb user" to make sure our >> instructions weren't out of date. There are some errors in the instructions >> but th

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-09 Thread Kenneth Knowles
owTemplates/tree/56d18a31c1c95e58543d7a1656bd83d7e859b482/it) > BigQueryIO, TextIO, BigtableIO, SpannerIO on Dataflow legacy runner and > runner v2 > > > On Fri, Oct 6, 2023 at 3:23 PM Kenneth Knowles wrote: > >> Additionally we need https://github.com/apache/beam/pull/28665/files in

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-06 Thread Kenneth Knowles
Additionally we need https://github.com/apache/beam/pull/28665/files in order to run GHA tests. On Fri, Oct 6, 2023 at 3:19 PM Kenneth Knowles wrote: > That PR was prior to many cherry-picks so it is not the signal we need. I > have updated it to the tip of the release-2.51.0

Re: [VOTE] Release 2.51.0, release candidate #1

2023-10-06 Thread Kenneth Knowles
Romanenko > wrote: > >> +1 (binding) >> >> — >> Alexey >> >> > On 5 Oct 2023, at 18:38, Jean-Baptiste Onofré wrote: >> > >> > +1 (binding) >> > >> > Thanks ! >> > Regards >> > JB >> > >> >

Re: Reshuffle PTransform Design Doc

2023-10-06 Thread Kenneth Knowles
On Fri, Oct 6, 2023 at 3:07 PM Jan Lukavský wrote: > > On 10/6/23 15:11, Kenneth Knowles wrote: > > > > On Fri, Oct 6, 2023 at 3:20 AM Jan Lukavský wrote: > >> Hi, >> >> there is also one other thing to mention with relation to >> Reshuffle/Req

Re: Reshuffle PTransform Design Doc

2023-10-06 Thread Kenneth Knowles
which may be more expensive than what the > runner is able to do with that awareness. > > Aka: it gives purpose to the fallback implementations. > > On Thu, Oct 5, 2023, 9:03 AM Kenneth Knowles wrote: > >> Another perspective, ignoring runners custom implementations and non-Ja

Re: Reshuffle PTransform Design Doc

2023-10-05 Thread Kenneth Knowles
not use it (and also we shouldn't use "whatever the implementation does" as a spec for anything we care about). On Thu, Oct 5, 2023 at 11:56 AM Kenneth Knowles wrote: > I totally agree. I am motivated right now by the fact that it is already > used all over the place but with no cons

Re: Reshuffle PTransform Design Doc

2023-10-05 Thread Kenneth Knowles
not do anything with is odd. > > On Thu, Oct 5, 2023 at 11:30 AM Kenneth Knowles wrote: > >> So a high level suggestion from Robert that I want to highlight as a >> top-post: >> >> Instead of focusing on just fixing the SDKs and runners Reshuffle, this >> c

Re: Reshuffle PTransform Design Doc

2023-10-05 Thread Kenneth Knowles
of Reshuffle from redistribution-only uses of Reshuffle. Any other thoughts on this one high level bit? Kenn On Thu, Oct 5, 2023 at 11:15 AM Kenneth Knowles wrote: > > On Wed, Oct 4, 2023 at 7:45 PM Robert Burke wrote: > >> LGTM. >> >> It looks the Go SDK already adheres

Re: Reshuffle PTransform Design Doc

2023-10-05 Thread Kenneth Knowles
> https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/handlerunner.go#L82 > > > > On 2023/09/26 15:43:53 Kenneth Knowles wrote: > > Hi everyone, > > > > Recently there was a bug [1] caused by discrepancies between two of

[ANNOUNCE] New PMC Member: Valentyn Tymofieiev

2023-10-03 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming Valentyn Tymofieiev as our newest PMC member. Valentyn has been contributing to Beam since 2017. Notable highlights include his work on the Python SDK and also in our container management. Valentyn also is involved in many

[ANNOUNCE] New PMC Member: Robert Burke

2023-10-03 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming Robert Burke < lostl...@apache.org> as our newest PMC member. Robert has been a part of the Beam community since 2017. He is our resident Gopher, producing the Go SDK and most recently the local, portable, Prism runner. Robert has

[ANNOUNCE] New PMC Member: Alex Van Boxel

2023-10-03 Thread Kenneth Knowles
Hi all, Please join me and the rest of the Beam PMC in welcoming Alex Van Boxel < alexvanbo...@apache.org> as our newest PMC member. Alex has been with Beam since 2016, very early in the life of the project. Alex has contributed code, design ideas, and perhaps most importantly been a huge part

[VOTE] Release 2.51.0, release candidate #1

2023-10-03 Thread Kenneth Knowles
Hi everyone, Please review and vote on the release candidate #1 for the version 2.51.0, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) Reviewers are encouraged to test their own use cases with the release candidate, and vote +1 if

Re: [LAZY CONSENSUS] Create separate repository for Swift SDK

2023-09-29 Thread Kenneth Knowles
Hi all, Thanks for your "approval" :-) I have created https://github.com/apache/beam-swift Kenn On Mon, Sep 25, 2023 at 1:01 PM Valentyn Tymofieiev via dev < dev@beam.apache.org> wrote: > On Mon, Sep 25, 2023 at 9:03 AM Kenneth Knowles wrote: > >> Hi all,

Re: Runner Bundling Strategies

2023-09-27 Thread Kenneth Knowles
ization. Kenn > > On the other hand, if there's a Start but no Finish we could safely > truncate (and retry) the outputs at any point and still get a > valid-under-the-model result, which could play well with the checkpointing > model of persistence. This could possibly allow for opti

Re: Runner Bundling Strategies

2023-09-26 Thread Kenneth Knowles
wever it wants if there's no @FinishBundle. I think that's what Jan is getting at - adding a @FinishBundle is the user placing a new restriction on the runner. Technically we probably have to include @StartBundle in that consideration. Kenn > > On Tue, Sep 26, 2023 at 8:54 AM Kenneth Knowles wrote:

Re: Runner Bundling Strategies

2023-09-26 Thread Kenneth Knowles
o put this in >>> explicitly managed runner state to allow for cross-bundle amortization and >>> there's more value in distinguishing between @Setup and @StartBundle. >>> >>> (Were I do to things over I'd probably encourage an API that discouraged

Reshuffle PTransform Design Doc

2023-09-26 Thread Kenneth Knowles
Hi everyone, Recently there was a bug [1] caused by discrepancies between two of Dataflow's reshuffle implementations. I think the reference implementation in the Java SDK [2] also does not match. This all led to discussion on the bug and the pull request [3] about what the actual semantics

[LAZY CONSENSUS] Create separate repository for Swift SDK

2023-09-25 Thread Kenneth Knowles
Hi all, I propose to unblock Byron's work by creating a new repository for the Beam Swift SDK. This will be the first of its kind, and break from the tradition of having Beam be kind of a mini-mono-repo. Discussion of the Swift SDK and request for a separate repo is at

Re: Runner Bundling Strategies

2023-09-25 Thread Kenneth Knowles
ne would have something like >> >> ParDo(X) >> >> which would logically (though not necessarily physically) lead to an >> execution like >> >> with X.bundle_processor() as bundle_processor: >> for bundle in bundles: >> with bundle_pro

Re: Runner Bundling Strategies

2023-09-22 Thread Kenneth Knowles
(I notice that you replied only to yourself, but there has been a whole thread of discussion on this - are you subscribed to dev@beam? https://lists.apache.org/thread/k81fq301ypwmjowknzyqq2qc63844rbd) It sounds like you want what everyone wants: to have the biggest bundles possible. So for

Re: Runner Bundling Strategies

2023-09-22 Thread Kenneth Knowles
What is the best way to amortize heavy operations across elements in Flink? (that is what bundles are for, basically) On Fri, Sep 22, 2023 at 5:09 AM Jan Lukavský wrote: > Flink defines bundles in terms of number of elements and processing time, > by default 1000 elements or 1000 milliseconds,
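According to the quoted reply, Flink closes a bundle after a number of elements or an amount of processing time (by default 1000 elements or 1000 ms). Below is a minimal Python sketch of that count-or-time policy, assuming a single-threaded caller. `Bundler` is hypothetical, and unlike a real runner it only checks the time limit when an element arrives; a runner would also need a timer to close idle bundles.

```python
import time
from typing import Callable, List, Optional


class Bundler:
    """Close a bundle after max_elements or max_millis, whichever comes first."""

    def __init__(self, max_elements: int = 1000, max_millis: float = 1000,
                 clock: Callable[[], float] = time.monotonic):
        self.max_elements = max_elements
        self.max_seconds = max_millis / 1000.0
        self.clock = clock
        self.buffer: List[object] = []
        self.opened_at = 0.0

    def add(self, element) -> Optional[List[object]]:
        """Add an element; return the finished bundle if a limit was reached."""
        if not self.buffer:
            self.opened_at = self.clock()  # a new bundle starts with this element
        self.buffer.append(element)
        if (len(self.buffer) >= self.max_elements
                or self.clock() - self.opened_at >= self.max_seconds):
            return self.flush()
        return None

    def flush(self) -> List[object]:
        """Emit whatever is buffered (possibly empty) and start fresh."""
        bundle, self.buffer = self.buffer, []
        return bundle


# Usage with a frozen clock, so only the element-count limit applies.
now = [0.0]
bundler = Bundler(max_elements=1000, max_millis=1000, clock=lambda: now[0])
sizes = []
for i in range(2500):
    done = bundler.add(i)
    if done is not None:
        sizes.append(len(done))
sizes.append(len(bundler.flush()))  # end of input: emit the partial bundle
print(sizes)  # [1000, 1000, 500]
```

Per-bundle work (for example, a batched write issued when the bundle closes) is then amortized over up to `max_elements` elements.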

Re: User-facing website vs. contributor-facing website

2023-09-21 Thread Kenneth Knowles
, lightly rendered like GH does, or fully rendered with navs. Yes, I am describing g3doc (which is talked about publicly so I can name it, but I don't know what the publicly-available equivalent is). None of the website-building not-human-readable stuff from jekyll and hugo. Kenn > > On Thu,

Re: User-facing website vs. contributor-facing website

2023-09-21 Thread Kenneth Knowles
main page ( >>> https://beam.apache.org/contribute/) and the link to CONTRIBUTING.md >>> <https://github.com/apache/beam/blob/master/CONTRIBUTING.md> makes more >>> sense on the wiki (we can keep the section with the sidebar links just >>> redirec

User-facing website vs. contributor-facing website

2023-09-21 Thread Kenneth Knowles
Hello! I am reviving a discussion that began at https://lists.apache.org/thread/w4g8xpg4215nlq86hxbd6n3q7jfnylny when we started our Confluence wiki and has even been revived once before. The conclusion of that thread was basically "yes, let us separate the contributor-facing stuff to a

Re: [PROPOSAL] Preparing for 2.51.0 Release

2023-09-20 Thread Kenneth Knowles
Update: the release branch has been cut On Thu, Sep 14, 2023 at 4:49 AM Jean-Baptiste Onofré wrote: > Awesome ! Thanks Kenn ! > > Regards > JB > > On Thu, Sep 14, 2023 at 3:20 AM Kenneth Knowles wrote: > > > > Hello Beam community! > > > > The nex

Re: Stateful Beam Job with Flink Runner - Checkpoint Size Increasing Over Time

2023-09-19 Thread Kenneth Knowles
Caveat: it has been a long time and I don't really know the details of the FlinkRunner. But I can answer a couple questions. On Fri, Sep 15, 2023 at 7:07 PM Hemant Kumar via dev wrote: > Hi Team, > > I am facing an issue of running a beam stateful job on flink, > > *Problem Statement:* >

[PROPOSAL] Preparing for 2.51.0 Release

2023-09-13 Thread Kenneth Knowles
Hello Beam community! The next release (2.51.0) branch cut is scheduled for September 20, 2023, one week from today, according to the release calendar [1]. I'd like to volunteer to perform this release. My plan is to cut the branch on that date, and cherrypick release-blocking fixes afterwards,

Re: Different Beam project launched

2023-09-13 Thread Kenneth Knowles
Thanks for bringing it up. We did the standard ASF process around name collisions a few months ago. Kenn On Wed, Sep 13, 2023 at 2:46 PM Kerry Donny-Clark via dev < dev@beam.apache.org> wrote: > https://github.com/slai-labs/get-beam > > This seems to overlap with our branding/messaging on ML. >

Re: DRAFT - Apache Beam Board Report - September 2023

2023-09-12 Thread Kenneth Knowles
As you can probably tell, I copy/pasted. There will be no Beam Summit next week :-) On Tue, Sep 12, 2023 at 11:11 AM Kenneth Knowles wrote: > Hi all, > > The next Beam board report is due tomorrow, Wednesday, September 13. > Please help me to draft it at > https://s.apache.org/be

DRAFT - Apache Beam Board Report - September 2023

2023-09-12 Thread Kenneth Knowles
Hi all, The next Beam board report is due tomorrow, Wednesday, September 13. Please help me to draft it at https://s.apache.org/beam-draft-report-2023-09. The doc is open for anyone to edit. Ideas: - highlights from CHANGES.md - interesting technical discussions - integrations with other

Re: [Proposal] Enable EnricoMi/publish-unit-test-result-action

2023-09-11 Thread Kenneth Knowles
at > setting. Opened [2] for clean up and improvements. > > Best, > Yi > > [1] https://github.com/apache/beam/pull/28212 > [2] https://github.com/apache/beam/issues/28378 > > On Tue, Sep 5, 2023 at 12:26 PM Kenneth Knowles wrote: > >> +1 this seems useful. &

Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-08 Thread Kenneth Knowles
to write per-transform one >>> pagers for isolated things like the most useful pieces (just basically >>> copying the documentation and justifying the API) instead of doing a >>> one-shot import or having it live forever in an external project. >>> >>> -Da

Re: Contribution of Asgarde: Error Handling for Beam?

2023-09-08 Thread Kenneth Knowles
I agree with everyone about "not everything has to be in the Beam repo". I really like the idea of having a clearer "ecosystem" section of the website, which is sort of started at https://beam.apache.org/community/integrations/ but that is not very prominent. Agree with John though. The

Re: [Proposal] Enable EnricoMi/publish-unit-test-result-action

2023-09-05 Thread Kenneth Knowles
+1 this seems useful. Some of the same functionality is also done pretty well or even more in depth via gradle scan. If I recall, some GHA jobs do not upload those. Is that also on the roadmap or is it blocked for some reason? Kenn On Tue, Sep 5, 2023 at 11:54 AM Bruno Volpato via dev wrote:

[PROPOSAL] Design Doc template for PTransforms

2023-08-24 Thread Kenneth Knowles
Hi all, Based on some work I've been doing internally, I put together a public version of a design doc template for PTransforms. https://s.apache.org/ptransform-design-doc A major goal is to be explicit about important questions that make a transform robust: - what are "all" the parameters to
