Re: approximate count distinct?

Ted Dunning Mon, 12 Oct 2015 21:17:52 -0700

I started work on this and ran straight into the brick wall that UDAFs
can't have arbitrary structures as workspaces.

A secondary roadblock was that UDAFs can't be made into multi-level
aggregators.

I can't fix these problems because there isn't enough documentation.

I moved on to the rest of the gazillion things I need to do but I would
love to come back to this.
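A minimal sketch of why a Bloom-filter counter fits both constraints above, assuming the aggregation state can be held as a fixed-size array of longs. The class `BloomDistinctCounter` is hypothetical plain Java, not Drill's UDAF API. Its state never grows with the input, so it could in principle live in a fixed-size workspace. Two partial results combine with a bitwise OR, which is the property a multi-level (partial, then final) aggregation needs. The hashing scheme (a SplitMix64 mixer plus double hashing) is an arbitrary choice for illustration.

```java
// Hypothetical illustration, not Drill's UDAF API: an approximate distinct
// counter whose entire state is a fixed-size bit array packed into longs.
public final class BloomDistinctCounter {
    private final int numBits;   // m: number of bits in the filter
    private final int numHashes; // k: bit positions set per value
    private final long[] words;  // fixed-size state; never grows with input

    public BloomDistinctCounter(int numBits, int numHashes) {
        if (numBits <= 0 || numBits % 64 != 0) {
            throw new IllegalArgumentException("numBits must be a positive multiple of 64");
        }
        if (numHashes <= 0) {
            throw new IllegalArgumentException("numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.words = new long[numBits / 64];
    }

    // SplitMix64 finalizer: scrambles a 64-bit input into a well-mixed hash.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // Set k bits for one value using double hashing: h1 + i * h2 (mod m).
    public void add(long value) {
        long h1 = mix(value);
        long h2 = mix(h1) | 1L; // force an odd step
        for (int i = 0; i < numHashes; i++) {
            int bit = (int) Long.remainderUnsigned(h1 + i * h2, numBits);
            words[bit >>> 6] |= 1L << (bit & 63);
        }
    }

    // Combine a partial result from another fragment with a bitwise OR.
    // OR is associative and commutative, so partials can be merged in any
    // order and across any number of aggregation levels.
    public void merge(BloomDistinctCounter other) {
        if (other.numBits != numBits || other.numHashes != numHashes) {
            throw new IllegalArgumentException("incompatible filters");
        }
        for (int i = 0; i < words.length; i++) {
            words[i] |= other.words[i];
        }
    }

    // Estimate n from the number of set bits X: n ~ -(m / k) * ln(1 - X / m).
    public double estimate() {
        long setBits = 0;
        for (long w : words) {
            setBits += Long.bitCount(w);
        }
        if (setBits == numBits) {
            return Double.POSITIVE_INFINITY; // saturated: m was too small
        }
        double m = numBits;
        return -(m / numHashes) * Math.log1p(-setBits / m);
    }

    public static void main(String[] args) {
        // Two "fragments" each see part of the ids 0..999; they overlap on 400..599.
        BloomDistinctCounter left = new BloomDistinctCounter(1 << 14, 3);
        BloomDistinctCounter right = new BloomDistinctCounter(1 << 14, 3);
        for (long id = 0; id < 600; id++) left.add(id);
        for (long id = 400; id < 1000; id++) right.add(id);
        left.merge(right); // final aggregation step
        System.out.printf("approximate distinct count: %.1f%n", left.estimate());
    }
}
```

Duplicates are free: re-adding a value sets bits that are already set, so overlapping partials (ids 400..599 above) are not double counted after the merge.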



On Mon, Oct 12, 2015 at 2:59 PM, Jacques Nadeau <[email protected]> wrote:

> This is something that has been talked about multiple times but no one has
> started work on it yet (as far as I know).
>
> Do you want to open a JIRA? Maybe we can collaborate on getting
> something put together. There are probably a couple of dependent JIRAs that
> will need to be resolved, but having a concrete and useful UDAF driving the
> requirements may be just the motivation to get help on those dependent
> JIRAs.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Sat, Oct 10, 2015 at 9:07 PM, Mike Beddo <[email protected]>
> wrote:
>
> > We are evaluating Drill for making interactive SQL queries against
> > customer sales transaction data. Many of our queries involve computing
> > "penetration" numbers: count of unique customers, count of unique baskets,
> > count of unique stores, etc. So far, using Drill to do aggregations
> > involving COUNT, SUM, ... gives acceptable query execution times. When we
> > include COUNT(DISTINCT <column>) in our queries, the execution times go
> > from about 1 second to many minutes!
> >
> > Has someone written a user-defined aggregate function to do approximate
> > counting? We think a Bloom filter will serve our needs best.
> >
> >
> > - Mike
> >
>
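For reference, the standard way to turn a Bloom filter's fill level into a distinct-count estimate (a textbook result, not something proposed in this thread) inverts the expected fraction of zero bits. It assumes k independent, uniform hash functions over m bits after n distinct values have been inserted, with X the observed number of set bits:

```latex
% Probability that a given bit is still 0 after n distinct inserts
\Pr[\text{bit is } 0] = \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m}
% Equate the observed zero fraction 1 - X/m to e^{-kn/m} and solve for n
\hat{n} = -\frac{m}{k}\,\ln\!\left(1 - \frac{X}{m}\right)
% Worked example: m = 1024, k = 3, X = 300
\hat{n} = -\frac{1024}{3}\,\ln\!\left(\frac{724}{1024}\right) \approx 341.3 \times 0.3467 \approx 118
```

Because the estimate diverges as X approaches m, m has to be sized for the largest cardinality expected, such as the number of unique customers.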
