Going a little further, instead of CombineFn in (b), we might try to
solve the problem of incorporating iterations into the model. Iterations
(backloops) working without event-time timers (i.e. processing-time
timers only, or no timers at all) should not interfere with watermarks
and therefore would
From my understanding, Flink rate-limits based on local information
only. On the other hand - in the case of Flink - this should easily extend
to global information, because the parallelism for both batch and
streaming is set before the job is launched and remains unchanged (until
possible manual
Sounds like a different variation is either new timer types with those
distinctions in mind, or additional configuration for ProcessingTime timers
(defaulting to the current behavior) to sort out those cases. This could
potentially be extended to EventTime timers too, for explicitly handling looping
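As a rough sketch of what the "additional configuration" variant could look like (all names here are hypothetical illustrations, not part of the Beam API), the distinction boils down to whether a processing-time timer may fire early when input is exhausted, or must always wait for real wall-clock time:

```python
from enum import Enum, auto


class ProcessingTimeFirePolicy(Enum):
    """Hypothetical firing policies for processing-time timers."""
    FIRE_AT_DRAIN = auto()    # current behavior: may fire once input is exhausted
    WAIT_WALL_CLOCK = auto()  # proposed: always wait for the wall-clock deadline


def should_fire(policy, now, timer_deadline, input_exhausted):
    """Decide whether a processing-time timer fires under a given policy."""
    if now >= timer_deadline:
        return True
    if policy is ProcessingTimeFirePolicy.FIRE_AT_DRAIN:
        # Early firing: once there is no more input, waiting serves no purpose.
        return input_exhausted
    # WAIT_WALL_CLOCK: hold the timer until the deadline actually passes.
    return False
```

Defaulting to `FIRE_AT_DRAIN` would preserve today's semantics, while opting into `WAIT_WALL_CLOCK` would give the "must wait and execute" behavior discussed elsewhere in the thread.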
On 2/27/24 19:49, Robert Bradshaw via dev wrote:
> On Tue, Feb 27, 2024 at 10:39 AM Jan Lukavský wrote:
> > On 2/27/24 19:22, Robert Bradshaw via dev wrote:
> > > On Mon, Feb 26, 2024 at 11:45 AM Kenneth Knowles wrote:
> > > > Pulling out focus points:
> > > > On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev <dev@beam.apache.org> wrote:
> > > > > I can't act on
On 2/27/24 19:30, Robert Bradshaw via dev wrote:
On Tue, Feb 27, 2024 at 7:44 AM Robert Burke wrote:
An "as fast as it can" runner with dynamic splits would ultimately split to the
system's maximum available parallelism (for stateful DoFns, this is the number of
keys; for SplittableDoFns, this is the maximum sharding of each input element's restriction).
On 2/27/24 16:36, Robert Burke wrote:
An "as fast as it can" runner with dynamic splits would ultimately
split to the system's maximum available parallelism (for stateful
DoFns, this is the number of keys; for SplittableDoFns, this is the
maximum sharding of each input element's restriction). That's what
would happen with a "normal"
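The parallelism bound described above can be sketched as a small helper (a hypothetical illustration, not a Beam API; combining the two bounds with `min` when both apply is my own guess):

```python
def max_parallelism(stateful_keys=None, sdf_restriction_shards=None):
    """Upper bound on useful parallelism for a stage, per the discussion:
    stateful DoFns are bounded by the number of keys, and SplittableDoFns
    by the total sharding across input elements' restrictions.

    stateful_keys: number of distinct keys, or None if not stateful.
    sdf_restriction_shards: per-element maximum shard counts, or None.
    Returns None when neither bound applies (i.e. parallelism is unbounded
    from this stage's point of view).
    """
    bounds = []
    if stateful_keys is not None:
        bounds.append(stateful_keys)
    if sdf_restriction_shards is not None:
        bounds.append(sum(sdf_restriction_shards))
    return min(bounds) if bounds else None
```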
On 2/27/24 14:51, Kenneth Knowles wrote:
I very much like the idea of the processing-time clock as a parameter
to @ProcessElement. That will be obviously useful and remove a source of
inconsistency, in addition to letting the runner/SDK harness control it. I
also like the idea of passing a Sleeper or [...] to @ProcessElement. These are
both good
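The clock-as-parameter idea could look roughly like the following in Python terms (all names hypothetical, not the Beam API): the element-processing code reads time and sleeps only through an injected clock, so a runner-supplied fake clock makes tests both deterministic and instant.

```python
import time


class SystemClock:
    """Real clock/sleeper, as a production runner might supply."""
    def now(self):
        return time.time()

    def sleep(self, seconds):
        time.sleep(seconds)


class FakeClock:
    """Test clock: sleeping merely advances a counter, so tests run instantly."""
    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def sleep(self, seconds):
        self._now += seconds


def process(element, clock):
    """DoFn-style process method receiving the clock instead of reading wall
    time directly (hypothetical signature)."""
    start = clock.now()
    clock.sleep(0.5)  # e.g. a deliberate throttling delay
    return element, clock.now() - start
```

With `FakeClock`, a test observes exactly 0.5 "seconds" elapsing without ever blocking the test process.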
I think that before we introduce a possibly somewhat duplicate new
feature we should be certain that it is really semantically different.
I'll rephrase the two cases:
a) need to wait and block data (delay) - the use case is the
motivating example of Throttle transform
b) act without data,
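Case (a) - wait and block data - can be sketched as a minimal rate limiter that delays each element until its emission slot, using an injected clock/sleeper so the same code is testable with a fake clock. This is a hypothetical illustration, not Beam's Throttle transform:

```python
class FakeClock:
    """Test clock: sleeping advances a counter instead of blocking."""
    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def sleep(self, seconds):
        self._now += seconds


class Throttle:
    """Delay elements so output never exceeds `rate` elements per second."""
    def __init__(self, rate, clock):
        self._interval = 1.0 / rate
        self._clock = clock
        self._next_emit = None

    def process(self, element):
        now = self._clock.now()
        if self._next_emit is None:
            self._next_emit = now
        if now < self._next_emit:
            # Case (a): block (delay) until this element's emission slot.
            self._clock.sleep(self._next_emit - now)
        self._next_emit = max(self._next_emit, self._clock.now()) + self._interval
        return element
```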
Yea I like DelayTimer, or SleepTimer, or WaitTimer or some such.
OutputTime is always an event time timestamp so it isn't even allowed to be
set outside the window (or you'd end up with an element assigned to a
window that it isn't within, since OutputTime essentially represents
reserving the
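The rule that OutputTime may not fall outside the window can be illustrated by a small validation sketch (a hypothetical helper, not Beam code; the half-open interval is a simplification of Beam's window max-timestamp semantics):

```python
def validate_output_timestamp(ts, window_start, window_end):
    """Reject an event-time output timestamp outside the element's window,
    mirroring the rule that OutputTime may not be set outside the window
    (otherwise the element would land in a window it isn't within)."""
    if not (window_start <= ts < window_end):
        raise ValueError(
            f"output timestamp {ts} outside window [{window_start}, {window_end})")
    return ts
```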
Agreed that a retroactive behavior change would be bad, even if tied to a
beam version change. I agree that it meshes well with the general theme of
State & Timers exposing underlying primitives for implementing Windowing
and similar. I'd say the distinction between the two might be additional
Pulling out focus points:
On Fri, Feb 23, 2024 at 7:21 PM Robert Bradshaw via dev
wrote:
> I can't act on something yet [...] but I expect to be able to [...] at
some time in the processing-time future.
I like this as a clear and internally-consistent feature description. It
describes
On Fri, Feb 23, 2024 at 3:54 PM Robert Burke wrote:
While I'm currently on the other side of the fence, I would not be against
changing/requiring the semantics of ProcessingTime constructs to be "must wait
and execute" as such a solution, and it enables the proposed "batch" process
continuation throttling mechanism to work as hypothesized for both
Thanks for bringing this up.
My position is that both batch and streaming should wait for
processing time timers, according to local time (with the exception of
tests that can accelerate this via faked clocks).
Both ProcessContinuation delays and ProcessingTimeTimers are IMHO
isomorphic, and
For me it always helps to seek an analogy in our physical reality. Stream
processing actually has quite a good analogy for both event-time and
processing-time - the simplest model for this being relativity theory.
Event-time is the time at which events occur _at distant locations_. Due
to finite
This is a "timely" discussion because my next step for Prism is to address
ProcessingTime.
The description of the watermarks matches my understanding and how it's
implemented so far in Prism [0], where the "stage" contains one or more
transforms to be executed by a worker.
My current thinking
Forking this thread.
The state of processing time timers in this mode of processing is not
satisfactory and is discussed a lot but we should make everything explicit.
Currently, a state and timer DoFn has a number of logical watermarks:
(apologies for fixed width not coming through in email