On Wed, Jul 29, 2020 at 10:08 AM Jeff Davis <pg...@j-davis.com> wrote: > Is there a reason that HyperLogLog doesn't use pg_leftmost_one_pos32()?
Yes: HyperLogLog predates pg_leftmost_one_pos32(). > I tried the following patch and some brief performance tests seem to > show an improvement. Makes sense. How did you test this? What kind of difference are we talking about? I ported this code from the upstream C++ as part of the original abbreviated keys commit. Note that the cardinality of abbreviated keys are displayed when you set "trace_sort = on". > This came up because my recent commit 9878b643 uses HLL for estimating > the cardinality of spill files, which solves a few annoyances with > overpartitioning[1]. I think that you should change back the rhs() variable names to match HyperLogLog upstream (as well as the existing rhs() comments). > I think it's overall an improvement, but > addHyperLogLog() itself seemed to show up as a cost, so it can hurt > spilled-but-still-in-memory cases. I'd also like to backpatch this to > 13 (as I already did for 9878b643), if that's acceptable. I still wonder if it was ever necessary to add HLL to abbreviated keys. It only served to avoid some pretty narrow worse cases, at the expense of typical cases. Given that the existing users of HLL are pretty narrow, and given the importance of preserving the favorable performance characteristics of hash aggregate, I'm inclined to agree that it's worth backpatching to 13 now. Assuming it is a really measurable cost in practice. -- Peter Geoghegan