This is something that has been bugging me for a while. I definitely want to work on this, but I don't know when I will get the chance. But I would definitely consider it a planned feature.
On Thu, Apr 16, 2015 at 3:15 AM, Marcin Karpinski <[email protected]> wrote:

> Hi,
>
> I've been looking lately at a possibility of writing a custom UDAF and I
> noticed that the function interface supports only sequential aggregation of
> all results into a single final value. While the COUNT operator is
> internally planned as a composition of two aggregation stages, other
> aggregation functions seem to be able to be used only in the context of a
> central data aggregation in which all aggregate data lands on a single
> drillbit (I'm looking at total aggregates without filters and grouping).
>
> Are there any plans to introduce a two-stage aggregation function interface
> similar to, for example, that implemented in Impala? The scenario I'm
> evaluating involves approximate unique value counting with hyperloglog,
> which would benefit from the ability to perform the counting locally by
> each drillbit followed by a hyperloglog state merge from individual
> drillbits.
>
> Cheers,
> Marcin
>

--
Steven Phillips
Software Engineer
mapr.com
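
For anyone following the thread, the two-stage scheme asked about in the quoted message (each drillbit builds a local hyperloglog state, then the states are merged into one) can be sketched roughly as below. This is a standalone Python illustration only, not Drill's UDAF interface and not Impala's. The function names (new_state, add, merge, estimate), the register count, and the SHA-1-based hash are all assumptions picked for the sketch.

```python
import hashlib
import math

P = 12          # assumed precision: 2**12 = 4096 registers
M = 1 << P


def _hash64(value):
    """Hypothetical hash choice: first 64 bits of SHA-1 over str(value)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def new_state():
    """Empty per-drillbit state: one small register per bucket."""
    return [0] * M


def add(state, value):
    """Stage 1 (local): fold one input value into a drillbit's own state."""
    h = _hash64(value)
    idx = h >> (64 - P)                      # top P bits select the register
    rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1  # position of the leftmost 1-bit
    if rank > state[idx]:
        state[idx] = rank


def merge(a, b):
    """Stage 2 (final): combine two partial states register by register."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(state):
    """Turn a (merged) state into an approximate distinct count."""
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in state)
    zeros = state.count(0)
    if raw <= 2.5 * M and zeros:
        # Small-range correction (linear counting on empty registers).
        return M * math.log(M / zeros)
    return raw


if __name__ == "__main__":
    values = list(range(30000))
    # Pretend three drillbits each see a disjoint slice of the input.
    partials = []
    for part in (values[i::3] for i in range(3)):
        state = new_state()
        for v in part:
            add(state, v)
        partials.append(state)
    merged = partials[0]
    for state in partials[1:]:
        merged = merge(merged, state)
    print(round(estimate(merged)))
```

The point of the sketch is that merge is a per-register maximum, so the merged state is identical to the state a single node would have built from all rows. Only the fixed-size register arrays need to travel between nodes, not the raw values.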
