Thanks, Billie, that clears things up. -Russ
On Tue, Jul 15, 2014 at 11:44 AM, Billie Rinaldi <[email protected]> wrote:

> Yes, any individual scan should be able to calculate an accurate average
> based on the entries present at the time of the scan. You just can't
> pre-compute an average, but you can pre-compute the sum and count and do
> the division on the fly. For averaging, finishing up the calculation is
> trivial, but it is a simple example of a reducer that loses information
> when calculating its result: there is no function f(avg(v_0, ..., v_N),
> v_new) that equals avg(v_0, ..., v_N, v_new) when you don't know N. You
> would not want a combiner that loses information to run during major or
> minor compaction scopes.
>
> On Fri, Jul 11, 2014 at 12:38 AM, Russ Weeks <[email protected]> wrote:
>
>> Hi,
>>
>> I'd like to understand this paragraph in the Accumulo manual a little
>> better:
>>
>> "The only restriction on a combining iterator is that the combiner
>> developer should not assume that all values for a given key have been
>> seen, since new mutations can be inserted at any time. This precludes
>> using the total number of values in the aggregation, such as when
>> calculating an average."
>>
>> By "using the total number of values in the aggregation", I presume it
>> means inside the combiner's reduce method? It seems like if I'm using
>> the example StatsCombiner registered on all 3 scopes, then after the
>> scan completes the count and sum fields should be consistent with each
>> other (of course, new mutations could have been added since the scan
>> started), and if I divide the two I'll get an accurate average, right?
>>
>> Thanks,
>> -Russ
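For reference, a minimal sketch in Java of the sum-and-count approach Billie
describes, written against Accumulo's Combiner API. The SumCountCombiner name
and the "sum,count" string encoding are illustrative assumptions here, not the
StatsCombiner shipped with the examples:

import java.util.Iterator;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Combiner;

/**
 * Keeps a running (sum, count) pair encoded as "sum,count" in the Value.
 * Because reduce() loses no information, it is safe to register at the
 * scan, minc, and majc scopes; the average itself is finished client-side.
 */
public class SumCountCombiner extends Combiner {
  @Override
  public Value reduce(Key key, Iterator<Value> iter) {
    long sum = 0;
    long count = 0;
    while (iter.hasNext()) {
      // A value is either a raw long ("v") from a fresh mutation or an
      // already-combined pair ("sum,count"); treat a lone number as count 1.
      String[] parts = new String(iter.next().get()).split(",");
      sum += Long.parseLong(parts[0]);
      count += (parts.length > 1) ? Long.parseLong(parts[1]) : 1;
    }
    return new Value((sum + "," + count).getBytes());
  }
}

// Client side, after a scan returns an entry: do the division on the fly.
String[] parts = new String(entry.getValue().get()).split(",");
double avg = Double.parseDouble(parts[0]) / Long.parseLong(parts[1]);

As with any Combiner, when attaching it via an IteratorSetting you still need
Combiner.setColumns(...) or Combiner.setCombineAllColumns(..., true). Since
sum and count are each associative, nothing is lost during minor or major
compaction; only the final division has to wait until scan time.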
