Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-29 Thread Jan Lukavský
Going a little further, instead of CombneFn in (b), we might try to solve the problem of incorporating iterations into the model. Iterations (backloops) working without event-timers (i.e. processing time tmers only or no timers at all) should not interfere with watermarks and therefore would

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-29 Thread Jan Lukavský
From my understanding Flink rate limits based on local information only. On the other hand - in case of Flink - this should easily extend to global information, because the parallelism for both batch and streaming is set before job is launched and remains unchanged (until possible manual

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-28 Thread Robert Burke
Sounds like a different variation is either new timer types with those distinctions in mind, or additional configuration for ProcessingTime timers (defaulting to current behavior) to sort out those cases. Could potentially be extended to EventTime timers too for explicitly handling looping

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-28 Thread Jan Lukavský
On 2/27/24 19:49, Robert Bradshaw via dev wrote: On Tue, Feb 27, 2024 at 10:39 AM Jan Lukavský wrote: On 2/27/24 19:22, Robert Bradshaw via dev wrote: On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles wrote: Pulling out focus points: On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Reuven Lax via dev
On Tue, Feb 27, 2024 at 10:22 AM Robert Bradshaw via dev < dev@beam.apache.org> wrote: > On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles wrote: > > > > Pulling out focus points: > > > > On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev < > dev@beam.apache.org> wrote: > > > I can't act on

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Robert Bradshaw via dev
On Tue, Feb 27, 2024 at 10:39 AM Jan Lukavský wrote: > > On 2/27/24 19:22, Robert Bradshaw via dev wrote: > > On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles wrote: > >> Pulling out focus points: > >> > >> On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev > >> wrote: > >>> I can't act

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Jan Lukavský
On 2/27/24 19:30, Robert Bradshaw via dev wrote: On Tue, Feb 27, 2024 at 7:44 AM Robert Burke wrote: An "as fast as it can runner" with dynamic splits, would ultimately split to the systems maximum available parallelism (for stateful DoFns, this is the number of keys; for SplittableDoFns,

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Jan Lukavský
On 2/27/24 19:22, Robert Bradshaw via dev wrote: On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles wrote: Pulling out focus points: On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev wrote: I can't act on something yet [...] but I expect to be able to [...] at some time in the

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Robert Bradshaw via dev
On Tue, Feb 27, 2024 at 7:44 AM Robert Burke wrote: > > An "as fast as it can runner" with dynamic splits, would ultimately split to > the systems maximum available parallelism (for stateful DoFns, this is the > number of keys; for SplittableDoFns, this is the maximum sharding of each > input

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Robert Bradshaw via dev
On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles wrote: > > Pulling out focus points: > > On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev > wrote: > > I can't act on something yet [...] but I expect to be able to [...] at some > > time in the processing-time future. > > I like this as

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Jan Lukavský
On 2/27/24 16:36, Robert Burke wrote: An "as fast as it can runner" with dynamic splits, would ultimately split to the systems maximum available parallelism (for stateful DoFns, this is the number of keys; for SplittableDoFns, this is the maximum sharding of each input element's restriction.

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Robert Burke
An "as fast as it can runner" with dynamic splits, would ultimately split to the systems maximum available parallelism (for stateful DoFns, this is the number of keys; for SplittableDoFns, this is the maximum sharding of each input element's restriction. That's what would happen with a "normal"

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Jan Lukavský
On 2/27/24 14:51, Kenneth Knowles wrote: I very much like the idea of processing time clock as a parameter to @ProcessElement. That will be obviously useful and remove a source of inconsistency, in addition to letting the runner/SDK harness control it. I also like the idea of passing a Sleeper

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-27 Thread Kenneth Knowles
I very much like the idea of processing time clock as a parameter to @ProcessElement. That will be obviously useful and remove a source of inconsistency, in addition to letting the runner/SDK harness control it. I also like the idea of passing a Sleeper or to @ProcessElement. These are both good

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-26 Thread Jan Lukavský
I think that before we introduce a possibly somewhat duplicate new feature we should be certain that it is really semantically different. I'll rephrase the two cases:  a) need to wait and block data (delay) - the use case is the motivating example of Throttle transform  b) act without data,

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-26 Thread Kenneth Knowles
Yea I like DelayTimer, or SleepTimer, or WaitTimer or some such. OutputTime is always an event time timestamp so it isn't even allowed to be set outside the window (or you'd end up with an element assigned to a window that it isn't within, since OutputTime essentially represents reserving the

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-26 Thread Robert Burke
Agreed that a retroactive behavior change would be bad, even if tied to a beam version change. I agree that it meshes well with the general theme of State & Timers exposing underlying primitives for implementing Windowing and similar. I'd say the distinction between the two might be additional

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-26 Thread Kenneth Knowles
Pulling out focus points: On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev wrote: > I can't act on something yet [...] but I expect to be able to [...] at some time in the processing-time future. I like this as a clear and internally-consistent feature description. It describes

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-23 Thread Robert Bradshaw via dev
On Fri, Feb 23, 2024 at 3:54 PM Robert Burke wrote: > > While I'm currently on the other side of the fence, I would not be against > changing/requiring the semantics of ProcessingTime constructs to be "must > wait and execute" as such a solution, and enables the Proposed "batch" > process

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-23 Thread Robert Burke
While I'm currently on the other side of the fence, I would not be against changing/requiring the semantics of ProcessingTime constructs to be "must wait and execute" as such a solution, and enables the Proposed "batch" process continuation throttling mechanism to work as hypothesized for both

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-23 Thread Robert Bradshaw via dev
Thanks for bringing this up. My position is that both batch and streaming should wait for processing time timers, according to local time (with the exception of tests that can accelerate this via faked clocks). Both ProcessContinuations delays and ProcessingTimeTimers are IMHO isomorphic, and

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-23 Thread Jan Lukavský
For me it always helps to seek analogy in our physical reality. Stream processing actually has quite a good analogy for both event-time and processing-time - the simplest model for this being relativity theory. Event-time is the time at which events occur _at distant locations_. Due to finite

Re: [DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-22 Thread Robert Burke
This is a "timely" discussion because my next step for Prism is to address ProcessingTime. The description of the watermarks matches my understanding and how it's implemented so far in Prism [0], where the "stage" contains one or more transforms to be executed by a worker. My current thinking

[DISCUSS] Processing time timers in "batch" (faster-than-wall-time [re]processing)

2024-02-22 Thread Kenneth Knowles
Forking this thread. The state of processing time timers in this mode of processing is not satisfactory and is discussed a lot but we should make everything explicit. Currently, a state and timer DoFn has a number of logical watermarks: (apologies for fixed width not coming through in email