Hi, I've been looking lately at a possibility of writing a custom UDAF and I noticed that the function interface supports only sequential aggregation of all results into a single final value. While the COUNT operator is internally planned as a composition of two aggregation stages, other aggregation functions seem to be able to be used only in the context of a central data aggregation in which all aggregate data lands on a single drillbit (I'm looking at total aggregates without filters and grouping).
Are there any plans to introduce a two-stage aggregation function interface similar to, for example, that implemented in Impala? The scenario I'm evaluating involves approximate unique value counting with hyperloglog, which would benefit from the ability to perform the counting locally by each drillbit folowed by a hyperloglog state merge from individual drillbits. Cheers, Marcin
