Hi,

I have recently been looking into writing a custom UDAF, and I noticed that
the function interface supports only sequential aggregation of all results
into a single final value. While COUNT is internally planned as a
composition of two aggregation stages, other aggregation functions
apparently can be used only with centralized aggregation, in which all of
the data to be aggregated lands on a single drillbit (I'm looking at total
aggregates without filters or grouping).

Are there any plans to introduce a two-stage aggregation function interface
similar to, for example, the one implemented in Impala? The scenario I'm
evaluating involves approximate unique value counting with HyperLogLog,
which would benefit from the ability to perform the counting locally on
each drillbit, followed by a merge of the HyperLogLog states from the
individual drillbits.
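
To make the scenario concrete, here is a minimal Java sketch of the
two-stage pattern: each node builds a partial HyperLogLog state, the states
are merged, and the estimate is computed once at the end. This is not
Drill's UDAF interface; the class name HllSketch, its methods, the register
count and the hash function are all assumptions chosen for illustration.

```java
// Hypothetical illustration only; not part of Drill or any HLL library.
final class HllSketch {
    private static final int P = 12;        // assumed precision: 2^12 registers
    private static final int M = 1 << P;
    private final byte[] registers = new byte[M];

    // 64-bit mixing function (SplitMix64 finalizer) used as the hash.
    private static long mix64(long z) {
        z += 0x9e3779b97f4a7c15L;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Stage 1: called locally for every value seen by one node.
    void add(long value) {
        long h = mix64(value);
        int idx = (int) (h >>> (64 - P));   // top P bits select the register
        long rest = h << P;                 // remaining bits determine the rank
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // Stage 2: combine a partial state from another node (register-wise max).
    void merge(HllSketch other) {
        for (int i = 0; i < M; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    // Final step: turn the merged state into a cardinality estimate.
    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);     // 2^(-r)
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / M);
        double raw = alpha * M * (double) M / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * M && zeros > 0) {
            return M * Math.log((double) M / zeros);
        }
        return raw;
    }
}
```

Because merge() is a register-wise maximum, merging the partial states
gives exactly the same registers as feeding all values into one sketch, so
the only data that has to be moved between nodes is the fixed-size register
array (4096 bytes with the precision assumed here).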

Cheers,
Marcin
