Kenn, Gotcha. When do you think that revision will be done, out of curiosity? Also, the EWMA example is a bit simplified; there are other forecasting algorithms we plan to implement which require more complex state management.
As of now, our plan is to create a sort of "scan" function which can be applied to a pipeline, with semantics as described in my previous email, and implement it for the Flink pipeline translator/runner. Any thoughts on this/is a similar construct at all part of the Beam roadmap? Trying to create something that would be useful for the community at-large, not just us. David On Fri, Jul 15, 2016 at 3:51 PM, Kenneth Knowles <[email protected]> wrote: > Hi David, > > I'm responsible for that feature; it was under design review at the advent > of Beam (hence the low issue number). I'm working on prepping a revision > that adds context and generalizes to Beam, rather than just Dataflow. > > For your use case, it isn't as simple as stateful decay if you care about > event time, since windowing does not actually reorder your inputs. With a > watermark-based trigger you are most likely close enough, though even then > if the watermark passes the end of multiple windows in one update they are > all permitted to be output, in any order. And transport between transforms > is not required to be order preserving, though obviously in many backends > ends it is, especially to support stateful pipelines. We want to talk about > ordering explicitly in Beam, or else presume a lack of order. > > The good news is you can calculate the EWMA directly using sliding windows > that are large enough so their tail has negligible contribution. This > should work well and as a bonus you'll get the full time series. > > Apologies if I've misunderstood what you are hoping to accomplish. > > Kenn > > On Jul 15, 2016 3:06 PM, "David Desberg" <[email protected]> wrote: > > > > Hi all, > > > > I'm working on a simple Beam app which should compute an exponential > weighted moving average on a stream of data, by key. The data is windowed > at a fixed interval, the count of the elements is taken per-key, and then > this count should be utilized to update the moving average. This requires > maintaining state (local to each JVM instance/per-key) in the form of the > result of the previous computation. Either a key-value based state store > (as is available in Flink) or an implementation of scan semantics (where > the result of the previous computation is passed as the initial value to > the next invocation) would work; however, there does not seem to be a way > to achieve either of these with the Beam API as it currently stands. > > > > I noticed a related JIRA issue ( > https://issues.apache.org/jira/browse/BEAM-25) but it seems no progress > has been made. Is there vision/roadmap for this API? I would be happy to > contribute to the project by beginning an implementation and would love to > collaborate with anyone already working toward this goal. > > > > Thanks! > > David Desberg >
