>> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line
>> such as the following:
>>
>> PCollection<KV<String, Double>> meanByName =
>>     dataPoints.apply(Mean.<String, Double>perKey());
>>
>> …would be considered an Aggregator, since it applies a mean aggregation
>> over a window. Is that correct, with respect to the Beam terminology? If
>> not, what would an example of an Aggregator be?
>>

Ah, we may have some slightly confusing terminology here.
In that code snippet you are using a PTransform (Mean.perKey) to combine a PCollection using the Mean CombineFn <https://github.com/apache/incubator-beam/blob/c199f085473cfcd79014d0a022b5ce3fdd4863ec/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Combine.java#L359>.

An Aggregator <https://github.com/apache/incubator-beam/blob/211e76abf9ba34c35ef13cca279cbeefdad7c406/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Aggregator.java#L54>, by contrast, takes a CombineFn and applies it continuously within a DoFn as elements are processed. Its result is not an output PCollection, so it is more analogous to a "counter". You can see an example of aggregators in DebuggingWordCount <https://github.com/apache/incubator-beam/blob/master/examples/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java#L129>.

We never really used the term *aggregation* to refer to a general set of PTransforms until we started describing things to the community. But it is a useful word, so we've ended up in a bit of a confusing state. Maybe we should consider renaming Aggregator? Something like "metric" might be clearer.
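
To make the distinction concrete, here is a minimal plain-Java sketch of the "counter" idea. It deliberately does not use the SDK: SimpleCombineFn, SimpleAggregator and filterWords are hypothetical stand-ins I made up for illustration, not the real CombineFn, Aggregator or DoFn classes. The word list and the regex are also just assumptions, loosely modeled on the filter in DebuggingWordCount.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.regex.Pattern;

public class AggregatorSketch {

    /** Hypothetical stand-in for a CombineFn: an identity value plus an associative merge. */
    public static final class SimpleCombineFn<T> {
        final T identity;
        final BinaryOperator<T> merge;

        public SimpleCombineFn(T identity, BinaryOperator<T> merge) {
            this.identity = identity;
            this.merge = merge;
        }
    }

    /** Hypothetical stand-in for an Aggregator: folds values in as a side effect of processing. */
    public static final class SimpleAggregator<T> {
        private final SimpleCombineFn<T> fn;
        private T current;

        public SimpleAggregator(SimpleCombineFn<T> fn) {
            this.fn = fn;
            this.current = fn.identity;
        }

        /** Merges one more value into the running result. */
        public void addValue(T value) {
            current = fn.merge.apply(current, value);
        }

        public T value() {
            return current;
        }
    }

    /**
     * Plays the role of a DoFn's per-element logic. The returned list is the
     * "output collection"; the aggregators only observe what happened and
     * never feed into that output.
     */
    public static List<String> filterWords(
            List<String> words,
            String regex,
            SimpleAggregator<Long> matched,
            SimpleAggregator<Long> unmatched) {
        Pattern pattern = Pattern.compile(regex);
        List<String> output = new ArrayList<>();
        for (String word : words) {
            if (pattern.matcher(word).matches()) {
                matched.addValue(1L);   // side-channel count
                output.add(word);       // actual pipeline output
            } else {
                unmatched.addValue(1L); // side-channel count only
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // The same summing "CombineFn" backs both counters.
        SimpleCombineFn<Long> sum = new SimpleCombineFn<>(0L, Long::sum);
        SimpleAggregator<Long> matched = new SimpleAggregator<>(sum);
        SimpleAggregator<Long> unmatched = new SimpleAggregator<>(sum);

        List<String> words = Arrays.asList("Flourish", "the", "stomach", "king");
        List<String> kept = filterWords(words, "Flourish|stomach", matched, unmatched);

        System.out.println("output: " + kept);
        System.out.println("matched=" + matched.value() + " unmatched=" + unmatched.value());
    }
}
```

The point of the sketch is only the shape: Mean.perKey produces a new collection you keep processing, whereas the two counters here are updated on the side while elements flow through, which is why "metric" may be the better name.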
