I started work on this and ran straight into the brick wall that UDAFs can't have arbitrary structures as workspaces.
A secondary roadblock was that UDAFs can't be made into multi-level aggregators. I can't fix these problems because there isn't enough documentation. I moved on to the rest of the gazillion things I need to do, but I would love to come back to this.

On Mon, Oct 12, 2015 at 2:59 PM, Jacques Nadeau <[email protected]> wrote:

> This is something that has been talked about multiple times but no one has
> started work on it yet (as far as I know).
>
> Do you want to open a JIRA and maybe we can collaborate on getting
> something put together. There are probably a couple of dependent jiras that
> will need to be resolved but having a concrete and useful UDAF driving the
> requirements may be just the motivation to get help on those dependent
> JIRAs.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Oct 10, 2015 at 9:07 PM, Mike Beddo <[email protected]>
> wrote:
>
> > We are evaluating Drill for making interactive SQL queries against
> > customer sales transaction data. Many of our queries involve computing
> > "penetration" numbers: count of unique customers, count of unique baskets,
> > count of unique stores, etc. So far, using Drill to do aggregations
> > involving COUNT, SUM, ... give acceptable query execution times. When
> > including COUNT(DISTINCT <column>) in our queries, the execution times go
> > from about 1 second to many minutes!
> >
> > Has someone written a user-defined aggregate function to do approximate
> > counting? We think a Bloom filter will serve our needs best.
> >
> >
> > - Mike
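
For anyone picking this thread up later, here is a minimal sketch of why both limitations matter for a Bloom-filter-style approximate distinct count. It is plain Python, not Drill's UDAF API, and the class and method names (`BloomCounter`, `add`, `merge`, `estimate`) are hypothetical. The bit array is the kind of "arbitrary structure" the workspace would need to hold, and `merge` is the step a multi-level aggregator would need in order to combine partial results. The estimate uses the standard fill-ratio formula n ≈ -(m/k) ln(1 - X/m), where m is the number of bits, k the number of hashes, and X the number of set bits.

```python
import hashlib
import math


class BloomCounter:
    """Bloom-filter bitmap used only to estimate a distinct count (hypothetical sketch)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        # Assumption: num_bits is a power of two and a multiple of 8.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)  # the aggregation "workspace"

    def _positions(self, value):
        # Derive num_hashes bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        # Per-row step: set the bits for this value.
        for pos in self._positions(value):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def merge(self, other):
        # Multi-level step: partial bitmaps combine with a bitwise OR.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("cannot merge counters with different parameters")
        for i, byte in enumerate(other.bits):
            self.bits[i] |= byte

    def estimate(self):
        # Final step: invert the expected fill ratio to estimate cardinality.
        set_bits = sum(bin(byte).count("1") for byte in self.bits)
        if set_bits >= self.num_bits:
            return float("inf")  # filter saturated; estimate is unbounded
        ratio = set_bits / self.num_bits
        return -(self.num_bits / self.num_hashes) * math.log(1.0 - ratio)


# Usage: two partial aggregates over overlapping customer ids, then a merge.
left, right = BloomCounter(), BloomCounter()
for customer_id in range(0, 3000):
    left.add(customer_id)
for customer_id in range(2000, 5000):
    right.add(customer_id)
left.merge(right)
approx_distinct = left.estimate()  # should land near the true count of 5000
```

The bitmap size and hash count here are arbitrary illustration values; a real aggregate would size them from the expected cardinality and the error it can tolerate.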
