Is there a reason that HyperLogLog doesn't use pg_leftmost_one_pos32()?

I tried the following patch and some brief performance tests seem to
show an improvement.

This came up because my recent commit 9878b643 uses HLL for estimating
the cardinality of spill files, which solves a few annoyances with
overpartitioning[1]. I think it's overall an improvement, but
addHyperLogLog() itself seemed to show up as a cost, so it can hurt
spilled-but-still-in-memory cases. I'd also like to backpatch this to
13 (as I already did for 9878b643), if that's acceptable.

Regards,
        Jeff Davis

[1] 
https://www.postgresql.org/message-id/cah2-wznidojad-zbobnfozda5rtcs0jlsqczkdnu+ou1ngy...@mail.gmail.com
diff --git a/src/backend/lib/hyperloglog.c b/src/backend/lib/hyperloglog.c
index a5cc1f8b83b..5ef0cc9c449 100644
--- a/src/backend/lib/hyperloglog.c
+++ b/src/backend/lib/hyperloglog.c
@@ -49,6 +49,7 @@
 #include <math.h>
 
 #include "lib/hyperloglog.h"
+#include "port/pg_bitutils.h"
 
 #define POW_2_32			(4294967296.0)
 #define NEG_POW_2_32		(-4294967296.0)
@@ -240,13 +241,15 @@ estimateHyperLogLog(hyperLogLogState *cState)
 static inline uint8
 rho(uint32 x, uint8 b)
 {
-	uint8		j = 1;
+	int p;
 
-	while (j <= b && !(x & 0x80000000))
-	{
-		j++;
-		x <<= 1;
-	}
+	if (x == 0)
+		return b + 1;
+
+	p = 32 - pg_leftmost_one_pos32(x);
+
+	if (p > b)
+		return b + 1;
 
-	return j;
+	return p;
 }

Reply via email to