On Tue, Jan 19, 2016 at 2:03 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
> FWIW I've been considering adding APPROX_COUNT_DISTINCT() aggregate,
> similarly to what other databases (e.g. Vertica) have built-in. Now, that
> would not require the merge too, but we're currently baking support for
> 'combine' functions, and that's exactly what merge does.
> So why not just fix the bug?

You can go from the sparse representation to the dense representation,
so in principle you can merge two of our HLL states, if you are then
satisfied with having a new representation for the merged state. I
don't have time right now to do a full analysis of whether or not it's
possible to just fix the bug without doing all that, but I think it
might not be.

I think we benefit from the simplicity of the sparse representation.
So, in the absence of a clear justification for retaining
mergeHyperLogLog(), ripping it out seems best.

I also think that an expanded set of functionality would be required
for your APPROX_COUNT_DISTINCT() patch anyway, including support for
multiple representations (perhaps this isn't documented in your
APPROX_COUNT_DISTINCT(), but complete implementations seem to switch
from sparse to full at a certain point). So, ripping out
mergeHyperLogLog() doesn't really make that effort any more difficult.

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to