Re: approximate count distinct?

Jacques Nadeau Mon, 12 Oct 2015 15:00:07 -0700

This is something that has been talked about multiple times but no one has
started work on it yet (as far as I know).

Do you want to open a JIRA? Maybe we can collaborate on getting
something put together. There are probably a couple of dependent JIRAs that
will need to be resolved, but having a concrete and useful UDAF driving the
requirements may be just the motivation to get help on those dependent
JIRAs.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Sat, Oct 10, 2015 at 9:07 PM, Mike Beddo <[email protected]>
wrote:

> We are evaluating Drill for making interactive SQL queries against
> customer sales transaction data. Many of our queries involve computing
> "penetration" numbers: count of unique customers, count of unique baskets,
> count of unique stores, etc. So far, using Drill to do aggregations
> involving COUNT, SUM, ... gives acceptable query execution times. When
> including COUNT(DISTINCT <column>) in our queries, the execution times go
> from about 1 second to many minutes!
>
> Has someone written a user-defined aggregate function to do approximate
> counting? We think a Bloom filter will serve our needs best.
>
>
> - Mike
>
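
The Bloom-filter idea in the quoted message can be illustrated outside of
Drill. Below is a minimal sketch of linear counting, a single-hash bitmap
estimator that derives a cardinality from the fraction of bits still unset,
much as a Bloom filter's fill ratio can be used. It is not Drill's UDAF API
and not something proposed in this thread. The class name LinearCounter, the
bitmap size, and the hash function are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration: linear counting over a fixed-size bitmap.
final class LinearCounter {
    private final int m;       // number of bits in the sketch (assumed tuning knob)
    private final BitSet bits; // one bit per hash bucket

    LinearCounter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    // 64-bit FNV-1a followed by the MurmurHash3 finalizer to spread the bits.
    private static long hash(String value) {
        long h = 0xcbf29ce484222325L;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Mark the bucket for this value; duplicates hit the same bit.
    void add(String value) {
        bits.set((int) Long.remainderUnsigned(hash(value), m));
    }

    // Combine two partial sketches; both must have been built with the same m.
    void merge(LinearCounter other) {
        bits.or(other.bits);
    }

    // Estimate n as -m * ln(zeros / m); a full bitmap cannot be estimated.
    double estimate() {
        int zeros = m - bits.cardinality();
        if (zeros == 0) {
            return Double.POSITIVE_INFINITY;
        }
        return -m * Math.log((double) zeros / m);
    }
}
```

The merge step matters for a distributed aggregate, because each fragment can
build its own bitmap and the results can be OR-ed together before estimating.
Accuracy degrades as the bitmap fills up, so m has to be sized well above the
expected number of distinct values.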
