Re: Stateful Stream Processing

David Desberg Fri, 15 Jul 2016 19:00:57 -0700

Kenn,

Gotcha. When do you think that revision will be done, out of curiosity?
Also, the EWMA example is a bit simplified; there are other forecasting
algorithms we plan to implement which require more complex state
management.


As of now, our plan is to create a sort of "scan" function which can be
applied to a pipeline, with semantics as described in my previous email,
and implement it for the Flink pipeline translator/runner. Any thoughts on
this? Is a similar construct part of the Beam roadmap at all? Trying to
create something that would be useful for the community at-large, not just
us.
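
To make the intended semantics concrete, here is a minimal sketch in plain
Python (not Beam or Flink API). The names scan_by_key and ewma_step are
hypothetical, chosen only for illustration. It assumes the per-window counts
for each key arrive in order, which a runner may not guarantee.

```python
def ewma_step(alpha):
    """Build a step function: (previous average, new count) -> new average."""
    def step(prev, count):
        if prev is None:
            # First window seen for this key: seed the average with the count.
            return float(count)
        return alpha * count + (1 - alpha) * prev
    return step


def scan_by_key(elements, step, initial=None):
    """Per-key scan: the result of the previous invocation for a key is
    passed as the initial value to the next invocation for that key.

    elements: iterable of (key, value) pairs, assumed ordered per key.
    Returns a list of (key, accumulated value) pairs, one per input.
    """
    state = {}  # hypothetical stand-in for a per-key state store
    output = []
    for key, value in elements:
        updated = step(state.get(key, initial), value)
        state[key] = updated
        output.append((key, updated))
    return output


if __name__ == "__main__":
    counts = [("a", 4), ("b", 10), ("a", 8), ("a", 0)]
    for key, average in scan_by_key(counts, ewma_step(alpha=0.5)):
        print(key, average)
```

The dict here only stands in for whatever key-value state the runner would
provide; the point is the shape of the step function.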

David

On Fri, Jul 15, 2016 at 3:51 PM, Kenneth Knowles <[email protected]> wrote:

> Hi David,
>
> I'm responsible for that feature; it was under design review at the advent
> of Beam (hence the low issue number). I'm working on prepping a revision
> that adds context and generalizes to Beam, rather than just Dataflow.
>
> For your use case, it isn't as simple as stateful decay if you care about
> event time, since windowing does not actually reorder your inputs. With a
> watermark-based trigger you are most likely close enough, though even then
> if the watermark passes the end of multiple windows in one update they are
> all permitted to be output, in any order. And transport between transforms
> is not required to be order preserving, though obviously in many backends
> it is, especially to support stateful pipelines. We want to talk about
> ordering explicitly in Beam, or else presume a lack of order.
>
> The good news is you can calculate the EWMA directly using sliding windows
> that are large enough so their tail has negligible contribution. This
> should work well and as a bonus you'll get the full time series.
>
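
For reference, a rough sketch of the sliding-window idea above in plain
Python (not Beam API). The function names and the tolerance parameter are
assumptions for illustration. It treats the EWMA as a weighted sum whose
weights decay by (1 - alpha) per step, so dropping everything older than n
steps discards a total weight of (1 - alpha)**n. Because the window hands
over all of its elements at once, they can be sorted by timestamp first.

```python
import math


def window_length(alpha, tolerance):
    """Smallest n such that the discarded tail weight (1 - alpha)**n
    is at most `tolerance`."""
    return math.ceil(math.log(tolerance) / math.log(1 - alpha))


def windowed_ewma(timestamped_counts, alpha):
    """EWMA over the contents of one sliding window.

    timestamped_counts: iterable of (timestamp, count) pairs in any order.
    The newest element gets weight alpha, the one before it
    alpha * (1 - alpha), and so on.
    """
    ordered = sorted(timestamped_counts, key=lambda pair: pair[0])
    newest = len(ordered) - 1
    return sum(
        alpha * (1 - alpha) ** (newest - i) * count
        for i, (_, count) in enumerate(ordered)
    )


if __name__ == "__main__":
    print(window_length(alpha=0.5, tolerance=0.01))
    print(windowed_ewma([(3, 0), (1, 4), (2, 8)], alpha=0.5))
```

Evaluating windowed_ewma once per sliding window gives one point of the
time series per window.
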
> Apologies if I've misunderstood what you are hoping to accomplish.
>
> Kenn
>
> On Jul 15, 2016 3:06 PM, "David Desberg" <[email protected]> wrote:
> >
> > Hi all,
> >
> > I'm working on a simple Beam app which should compute an exponential
> > weighted moving average on a stream of data, by key. The data is windowed
> > at a fixed interval, the count of the elements is taken per-key, and then
> > this count should be utilized to update the moving average. This requires
> > maintaining state (local to each JVM instance/per-key) in the form of the
> > result of the previous computation. Either a key-value based state store
> > (as is available in Flink) or an implementation of scan semantics (where
> > the result of the previous computation is passed as the initial value to
> > the next invocation) would work; however, there does not seem to be a way
> > to achieve either of these with the Beam API as it currently stands.
> >
> > I noticed a related JIRA issue
> > (https://issues.apache.org/jira/browse/BEAM-25) but it seems no progress
> > has been made. Is there vision/roadmap for this API? I would be happy to
> > contribute to the project by beginning an implementation and would love to
> > collaborate with anyone already working toward this goal.
> >
> > Thanks!
> > David Desberg
>
