Is there a reason that HyperLogLog doesn't use pg_leftmost_one_pos32()? I tried the following patch and some brief performance tests seem to show an improvement.
This came up because my recent commit 9878b643 uses HLL for estimating the cardinality of spill files, which solves a few annoyances with overpartitioning[1]. I think it's overall an improvement, but addHyperLogLog() itself seemed to show up as a cost, so it can hurt spilled-but-still-in-memory cases. I'd also like to backpatch this to 13 (as I already did for 9878b643), if that's acceptable. Regards, Jeff Davis [1] https://www.postgresql.org/message-id/cah2-wznidojad-zbobnfozda5rtcs0jlsqczkdnu+ou1ngy...@mail.gmail.com
diff --git a/src/backend/lib/hyperloglog.c b/src/backend/lib/hyperloglog.c index a5cc1f8b83b..5ef0cc9c449 100644 --- a/src/backend/lib/hyperloglog.c +++ b/src/backend/lib/hyperloglog.c @@ -49,6 +49,7 @@ #include <math.h> #include "lib/hyperloglog.h" +#include "port/pg_bitutils.h" #define POW_2_32 (4294967296.0) #define NEG_POW_2_32 (-4294967296.0) @@ -240,13 +241,15 @@ estimateHyperLogLog(hyperLogLogState *cState) static inline uint8 rho(uint32 x, uint8 b) { - uint8 j = 1; + int p; - while (j <= b && !(x & 0x80000000)) - { - j++; - x <<= 1; - } + if (x == 0) + return b + 1; + + p = 32 - pg_leftmost_one_pos32(x); + + if (p > b) + return b + 1; - return j; + return p; }