On Tue, Jan 19, 2016 at 2:03 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > > FWIW I've been considering adding APPROX_COUNT_DISTINCT() aggregate, > similarly to what other databases (e.g. Vertica) have built-in. Now, that > would not require the merge too, but we're currently baking support for > 'combine' functions, and that's exactly what merge does. > > So why not just fix the bug?
You can go from the sparse representation to the dense representation, so in principle you can merge two of our HLL states, if you are then satisfied with having a new representation for the merged state. I don't have time right now to do a full analysis of whether or not it's possible to just fix the bug without doing all that, but I think it might not be. I think we benefit from the simplicity of the sparse representation. So, in the absence of a clear justification for retaining mergeHyperLogLog(), ripping it out seems best. I also think that an expanded set of functionality would be required for your APPROX_COUNT_DISTINCT() patch anyway, including support for multiple representations (perhaps this isn't documented in your APPROX_COUNT_DISTINCT(), but complete implementations seem to switch from sparse to full at a certain point). So, ripping out mergeHyperLogLog() doesn't really make that effort any more difficult. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers