This is something that has been talked about multiple times but no one has started work on it yet (as far as I know).
Do you want to open a JIRA so we can collaborate on getting something put together? There are probably a couple of dependent JIRAs that will need to be resolved, but having a concrete and useful UDAF driving the requirements may be just the motivation needed to get help on those dependent JIRAs.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sat, Oct 10, 2015 at 9:07 PM, Mike Beddo <[email protected]> wrote:

> We are evaluating Drill for making interactive SQL queries against
> customer sales transaction data. Many of our queries involve computing
> "penetration" numbers: count of unique customers, count of unique baskets,
> count of unique stores, etc. So far, using Drill to do aggregations
> involving COUNT, SUM, ... gives acceptable query execution times. When
> including COUNT(DISTINCT <column>) in our queries, the execution times go
> from about 1 second to many minutes!
>
> Has someone written a user-defined aggregate function to do approximate
> counting? We think a Bloom filter will serve our needs best.
>
> - Mike
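
To make the request concrete, here is a rough sketch of the kind of Bloom-filter-based approximate distinct count being asked about. It is plain Java and does not use any Drill UDAF interfaces; the class name BloomDistinctCounter, its methods, and the hash choice (FNV-1a plus a 64-bit mixing step) are all hypothetical, picked only for illustration. The estimate uses the standard fill-ratio formula n ≈ -(m/k) * ln(1 - X/m), where m is the number of bits, k the number of hash functions, and X the number of bits set.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical helper, NOT part of Drill: a Bloom filter used only to
// estimate how many distinct values have been added.
class BloomDistinctCounter {
    private final BitSet bits;
    private final int numBits;   // m: size of the bit array
    private final int numHashes; // k: bit positions set per value

    BloomDistinctCounter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // 64-bit FNV-1a over the UTF-8 bytes, followed by a mixing step so that
    // both halves of the result are well scrambled (an illustrative choice).
    private static long hash64(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Sets k bit positions derived from two 32-bit halves of one hash
    // (double hashing). Adding the same value again changes nothing.
    void add(String value) {
        long h = hash64(value);
        int h1 = (int) h;
        int h2 = (int) (h >>> 32) | 1; // force odd so the stride is never zero
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, numBits));
        }
    }

    // Combines two partial filters built with the same parameters.
    void merge(BloomDistinctCounter other) {
        if (other.numBits != numBits || other.numHashes != numHashes) {
            throw new IllegalArgumentException("filters must share numBits and numHashes");
        }
        bits.or(other.bits);
    }

    // Estimated number of distinct values: -(m/k) * ln(1 - X/m).
    double estimate() {
        int setBits = bits.cardinality();
        if (setBits >= numBits) {
            return Double.POSITIVE_INFINITY; // filter is saturated
        }
        return -((double) numBits / numHashes) * Math.log(1.0 - (double) setBits / numBits);
    }

    public static void main(String[] args) {
        BloomDistinctCounter counter = new BloomDistinctCounter(1 << 17, 4);
        for (int i = 0; i < 10_000; i++) {
            counter.add("customer-" + i);
            counter.add("customer-" + i); // duplicates do not change the filter
        }
        System.out.println("estimated distinct customers: " + counter.estimate());
    }
}
```

The merge step is what would presumably matter most for a distributed aggregate, since partial filters built on separate fragments can be OR-ed together before estimating. Accuracy degrades as the filter fills up, so the bit count has to be sized for the largest expected cardinality.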
