Google's streaming implementation has the same property: counters are not committed with work and so updates may sometimes be lost (ie undercounted), or may be replayed (ie overcounted). It's a tradeoff between having low-latency and cheep monitoring against coherence with the underlying processing.
On Tue, Mar 22, 2016 at 1:57 AM, Aljoscha Krettek <[email protected]> wrote: > Hi, > in Flink the accumulators/aggregators are not faul-tolerant. In case of a > failure the job will be restarted but the accumulators will start from > scratch. Initially they were only meant as a rough way to gauge the > progress that a job is making. People should not rely on them for accurate > numbers right now. > > Cheers, > Aljoscha > > On 21 Mar 2016, at 20:37, William McCarthy <[email protected]> > wrote: > > > > Hi, > > > > I just had a look at the capability matrix here: > http://beam.incubator.apache.org/capability-matrix/ . I really like it, > as it gives a nice summary of the current state of implementation > completeness for the different runners. > > > > I had one follow-up question, regarding the cell at the intersection of > the Aggregators row and the Apache Flink column, with this content: "In > streaming mode, Aggregators may undercount”. Can you give me some ideas > about what this means? In what circumstances might this happen? Are there > some mitigation strategies that are appropriate? > > > > Thanks, > > > > Bill > >
