>> Perhaps I'm unclear on what an “Aggregator” is. I assumed that a line
>> such as the following:
>>
>> PCollection<KV<String, Double>> meanByName =
>> dataPoints.apply(Mean.<String, Double>perKey());
>>
>> …would be considered an Aggregator, since it applies a mean aggregation
>> over a window. Is that correct, with respect to the Beam terminology? If
>> not, what would an example of an Aggregator be?
>>
>
Ah, we may have some slightly confusing terminology here.

In that code snippet you are applying a PTransform (Mean.perKey) that combines
the values for each key of a PCollection using the Mean CombineFn
<https://github.com/apache/incubator-beam/blob/c199f085473cfcd79014d0a022b5ce3fdd4863ec/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Combine.java#L359>.
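As a rough illustration of the combining side, here is a minimal plain-Java sketch that does not use the Beam SDK at all. `MeanFn`, `SumCount`, and `perKey` are hypothetical names; the methods only loosely follow the CombineFn lifecycle (create an accumulator, add inputs, merge accumulators, extract the output). The point is that the per-key mean becomes new data that flows on through the pipeline.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Beam-free sketch of a mean CombineFn and a per-key combine.
public class MeanPerKeySketch {

    // Accumulator for a mean: running sum and count.
    static final class SumCount {
        double sum;
        long count;
    }

    // Loosely modeled on the CombineFn lifecycle; NOT the Beam API.
    static final class MeanFn {
        SumCount createAccumulator() {
            return new SumCount();
        }

        SumCount addInput(SumCount acc, double input) {
            acc.sum += input;
            acc.count++;
            return acc;
        }

        // Combines partial accumulators, e.g. ones built on different workers.
        SumCount mergeAccumulators(List<SumCount> accs) {
            SumCount merged = new SumCount();
            for (SumCount a : accs) {
                merged.sum += a.sum;
                merged.count += a.count;
            }
            return merged;
        }

        double extractOutput(SumCount acc) {
            return acc.count == 0 ? Double.NaN : acc.sum / acc.count;
        }
    }

    // Hypothetical stand-in for Mean.perKey(): the result is new data
    // (one mean per key), sorted by key only to make printing deterministic.
    static Map<String, Double> perKey(List<Map.Entry<String, Double>> input) {
        MeanFn fn = new MeanFn();
        Map<String, SumCount> accs = new HashMap<>();
        for (Map.Entry<String, Double> kv : input) {
            SumCount acc = accs.computeIfAbsent(kv.getKey(), k -> fn.createAccumulator());
            fn.addInput(acc, kv.getValue());
        }
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, SumCount> e : accs.entrySet()) {
            out.put(e.getKey(), fn.extractOutput(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Double>> dataPoints = List.of(
                Map.entry("a", 1.0), Map.entry("a", 3.0), Map.entry("b", 10.0));
        System.out.println(perKey(dataPoints)); // {a=2.0, b=10.0}
    }
}
```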
An Aggregator
<https://github.com/apache/incubator-beam/blob/211e76abf9ba34c35ef13cca279cbeefdad7c406/sdk/src/main/java/com/google/cloud/dataflow/sdk/transforms/Aggregator.java#L54>
takes a CombineFn and applies it continuously within a DoFn, so it's more
analogous to a 'counter'. You can see an example of aggregators in
DebuggingWordCount
<https://github.com/apache/incubator-beam/blob/master/examples/src/main/java/com/google/cloud/dataflow/examples/DebuggingWordCount.java#L129>.
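To illustrate the Aggregator side, here is a plain-Java sketch with hypothetical `Aggregator` and `FilterFn` classes (simplified stand-ins, not the SDK classes). The per-element function emits its normal output and, as a side effect, folds values into named running sums, loosely in the spirit of the DebuggingWordCount example linked above. The sums describe the data rather than being part of it. The sketch is single-threaded for simplicity; a real runner would have to merge partial values from many workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.regex.Pattern;

// Hypothetical, Beam-free sketch: an "aggregator" is a named running combine
// that a per-element function updates as a side effect.
public class AggregatorSketch {

    static final class Aggregator<T> {
        private final String name;
        private final BinaryOperator<T> combine;
        private T value;

        Aggregator(String name, T identity, BinaryOperator<T> combine) {
            this.name = name;
            this.value = identity;
            this.combine = combine;
        }

        // Fold one more input into the running value.
        void addValue(T input) {
            value = combine.apply(value, input);
        }

        T getValue() { return value; }

        String getName() { return name; }
    }

    // Stand-in for a DoFn: filters words and counts matches on the side.
    static final class FilterFn {
        private final Pattern filter;
        final Aggregator<Long> matchedWords = new Aggregator<>("matchedWords", 0L, Long::sum);
        final Aggregator<Long> unmatchedWords = new Aggregator<>("unmatchedWords", 0L, Long::sum);

        FilterFn(String regex) {
            this.filter = Pattern.compile(regex);
        }

        // Called once per element; only matching words become output.
        void processElement(String word, List<String> output) {
            if (filter.matcher(word).matches()) {
                matchedWords.addValue(1L);
                output.add(word);
            } else {
                unmatchedWords.addValue(1L);
            }
        }
    }

    public static void main(String[] args) {
        FilterFn fn = new FilterFn("apple|pear");
        List<String> output = new ArrayList<>();
        for (String w : List.of("apple", "the", "pear", "and")) {
            fn.processElement(w, output);
        }
        // The output list holds data; the aggregators hold metrics about it.
        System.out.println(output); // [apple, pear]
        System.out.println(fn.matchedWords.getName() + "=" + fn.matchedWords.getValue()); // matchedWords=2
        System.out.println(fn.unmatchedWords.getName() + "=" + fn.unmatchedWords.getValue()); // unmatchedWords=2
    }
}
```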

We never really used the term *aggregation* to refer to a general set of
PTransforms until we started describing things to the community. But it is
a useful word, so we've ended up in a bit of a confusing state. Maybe we
should consider renaming Aggregator? Something like "metric" might be
clearer.
