Okay, here is an updated patch set with first drafts of the commit
messages. I'm reasonably happy with these patches, but I'll admit my
justification for ripping out the 32-bit optimizations feels a bit flimsy.
I don't get the impression that we are all that concerned about things like
micro-regressions for popcount on 32-bit builds, but OTOH it isn't hard to
imagine someone objecting to these changes.
I ran the bms_num_members() benchmark on a couple of machines I had nearby:
          apple-m3 (neon)            intel-i5-13500T (sse4.2)
   words   HEAD     v8        words   HEAD     v8
       1     40     25            1     26     10
       2     57     51            2     37     29
       4     75     57            4     55     45
       8    105     56            8     88     51
      16    154     59           16    158     68
      32    265     73           32    296    102
      64    545    103           64    577    209
     128   1027    178          128   1212    423
I was going to run it on machines with SVE/AVX-512, but John already tested
the AVX-512 case [0], and I have no reason to believe that we'll see
regressions on machines with SVE.
[0]
https://postgr.es/m/CANWCAZbWLX%3DEDd1Bq-8oGK2ZLVNR4m4BkGe%3D288t2V5oLcqeZA%40mail.gmail.com
--
nathan
>From 19404ae038c6fa678c41a2b4db62c9b885896c18 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Thu, 22 Jan 2026 11:33:56 -0600
Subject: [PATCH v8 1/3] Remove some unnecessary optimizations in popcount
code.
Over the past few releases, we've added a huge amount of complexity
to our popcount implementations. Commits fbe327e5b4, 79e232ca01,
8c6653516c, and 25dc485074 did some preliminary refactoring, but
many opportunities remain. In particular, if we disclaim interest
in micro-optimizing this code for 32-bit builds and in unproven
alignment checks, we can remove a decent chunk of code.
This commit does the following:
* Removes the code in pg_popcount() and pg_popcount_masked() that
sets the function pointer threshold based on SIZEOF_VOID_P.
Consequently, 32-bit builds should follow the inline path for
inputs between 4-8 bytes instead of calling pg_popcount_optimized()
(which is probably just calling pg_popcount_portable(), anyway).
While it is possible that this results in a small regression for
those inputs on 32-bit builds, it seems unlikely to produce
noticeable performance differences on those machines. Furthermore,
I found no evidence of benchmarks for this area of code for 32-bit
builds.
* Removes the 32-bit optimizations in pg_popcount_portable() and
pg_popcount_masked_portable(). This means that 32-bit builds
instead use a simple while loop. As above, we are not too
concerned about regressions on 32-bit machines.
* Removes 32-bit optimizations in pg_popcount_x86.c. This is dead
code because everything in this file is only compiled when
HAVE_X86_64_POPCNTQ is defined, and that macro is only defined for
x86-64.
* Removes alignment checks in pg_popcount_sse42() and
pg_popcount_masked_sse42(). These are unnecessary for x86, and
it's unclear whether they make any meaningful performance
difference. Since we allow misaligned accesses now, this commit
also adds pg_attribute_no_sanitize_alignment() to these functions.
Suggested-by: John Naylor <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Discussion:
https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
---
src/include/port/pg_bitutils.h | 24 +-----------
src/port/pg_bitutils.c | 30 ---------------
src/port/pg_popcount_x86.c | 67 ++++++----------------------------
3 files changed, 14 insertions(+), 107 deletions(-)
diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index 35761f509ec..c3049d71894 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -329,17 +329,7 @@ extern uint64 pg_popcount_masked_optimized(const char *buf, int bytes, bits8 mas
static inline uint64
pg_popcount(const char *buf, int bytes)
{
- /*
- * We set the threshold to the point at which we'll first use special
- * instructions in the optimized version.
- */
-#if SIZEOF_VOID_P >= 8
- int threshold = 8;
-#else
- int threshold = 4;
-#endif
-
- if (bytes < threshold)
+ if (bytes < 8)
{
uint64 popcnt = 0;
@@ -360,17 +350,7 @@ pg_popcount(const char *buf, int bytes)
static inline uint64
pg_popcount_masked(const char *buf, int bytes, bits8 mask)
{
- /*
- * We set the threshold to the point at which we'll first use special
- * instructions in the optimized version.
- */
-#if SIZEOF_VOID_P >= 8
- int threshold = 8;
-#else
- int threshold = 4;
-#endif
-
- if (bytes < threshold)
+ if (bytes < 8)
{
uint64 popcnt = 0;
diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
index ffda75825e5..bec06c06fc3 100644
--- a/src/port/pg_bitutils.c
+++ b/src/port/pg_bitutils.c
@@ -167,20 +167,6 @@ pg_popcount_portable(const char *buf, int bytes)
bytes -= 8;
}
- buf = (const char *) words;
- }
-#else
- /* Process in 32-bit chunks if the buffer is aligned. */
- if (buf == (const char *) TYPEALIGN(4, buf))
- {
- const uint32 *words = (const uint32 *) buf;
-
- while (bytes >= 4)
- {
- popcnt += pg_popcount32_portable(*words++);
- bytes -= 4;
- }
-
buf = (const char *) words;
}
#endif
@@ -215,22 +201,6 @@ pg_popcount_masked_portable(const char *buf, int bytes, bits8 mask)
bytes -= 8;
}
- buf = (const char *) words;
- }
-#else
- /* Process in 32-bit chunks if the buffer is aligned. */
- uint32 maskv = ~((uint32) 0) / 0xFF * mask;
-
- if (buf == (const char *) TYPEALIGN(4, buf))
- {
- const uint32 *words = (const uint32 *) buf;
-
- while (bytes >= 4)
- {
- popcnt += pg_popcount32_portable(*words++ & maskv);
- bytes -= 4;
- }
-
buf = (const char *) words;
}
#endif
diff --git a/src/port/pg_popcount_x86.c b/src/port/pg_popcount_x86.c
index 245f0167d00..7aebf69898b 100644
--- a/src/port/pg_popcount_x86.c
+++ b/src/port/pg_popcount_x86.c
@@ -376,40 +376,20 @@ __asm__ __volatile__(" popcntq %1,%0\n":"=q"(res):"rm"(word):"cc");
* pg_popcount_sse42
* Returns the number of 1-bits in buf
*/
+pg_attribute_no_sanitize_alignment()
static uint64
pg_popcount_sse42(const char *buf, int bytes)
{
uint64 popcnt = 0;
+ const uint64 *words = (const uint64 *) buf;
-#if SIZEOF_VOID_P >= 8
- /* Process in 64-bit chunks if the buffer is aligned. */
- if (buf == (const char *) TYPEALIGN(8, buf))
+ while (bytes >= 8)
{
- const uint64 *words = (const uint64 *) buf;
-
- while (bytes >= 8)
- {
- popcnt += pg_popcount64_sse42(*words++);
- bytes -= 8;
- }
-
- buf = (const char *) words;
+ popcnt += pg_popcount64_sse42(*words++);
+ bytes -= 8;
}
-#else
- /* Process in 32-bit chunks if the buffer is aligned. */
- if (buf == (const char *) TYPEALIGN(4, buf))
- {
- const uint32 *words = (const uint32 *) buf;
- while (bytes >= 4)
- {
- popcnt += pg_popcount32_sse42(*words++);
- bytes -= 4;
- }
-
- buf = (const char *) words;
- }
-#endif
+ buf = (const char *) words;
/* Process any remaining bytes */
while (bytes--)
@@ -422,44 +402,21 @@ pg_popcount_sse42(const char *buf, int bytes)
* pg_popcount_masked_sse42
* Returns the number of 1-bits in buf after applying the mask to each byte
*/
+pg_attribute_no_sanitize_alignment()
static uint64
pg_popcount_masked_sse42(const char *buf, int bytes, bits8 mask)
{
uint64 popcnt = 0;
-
-#if SIZEOF_VOID_P >= 8
- /* Process in 64-bit chunks if the buffer is aligned */
uint64 maskv = ~UINT64CONST(0) / 0xFF * mask;
+ const uint64 *words = (const uint64 *) buf;
- if (buf == (const char *) TYPEALIGN(8, buf))
+ while (bytes >= 8)
{
- const uint64 *words = (const uint64 *) buf;
-
- while (bytes >= 8)
- {
- popcnt += pg_popcount64_sse42(*words++ & maskv);
- bytes -= 8;
- }
-
- buf = (const char *) words;
+ popcnt += pg_popcount64_sse42(*words++ & maskv);
+ bytes -= 8;
}
-#else
- /* Process in 32-bit chunks if the buffer is aligned. */
- uint32 maskv = ~((uint32) 0) / 0xFF * mask;
-
- if (buf == (const char *) TYPEALIGN(4, buf))
- {
- const uint32 *words = (const uint32 *) buf;
-
- while (bytes >= 4)
- {
- popcnt += pg_popcount32_sse42(*words++ & maskv);
- bytes -= 4;
- }
- buf = (const char *) words;
- }
-#endif
+ buf = (const char *) words;
/* Process any remaining bytes */
while (bytes--)
--
2.50.1 (Apple Git-155)
>From 258be25552ecce2a4fca86c071500d9596f861fe Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Fri, 23 Jan 2026 17:31:20 -0600
Subject: [PATCH v8 2/3] Remove specialized word-length popcount
implementations.
The uses of these functions do not justify the level of
micro-optimization we've done and may even hurt performance in some
cases (e.g., due to using function pointers). This commit removes
all architecture-specific implementations of pg_popcount{32,64}()
and converts the portable ones to inlined functions in
pg_bitutils.h. These inlined versions should produce the same code
as before (but inlined), so in theory this is a net gain for many
machines. As an exception, for x86-64/gcc without sse4.2/popcnt,
we use a plain C version to ensure inlining because
__builtin_popcount() and __builtin_popcountl() generate function
calls for that configuration. Our tests indicate this is still a
net win.
Suggested-by: John Naylor <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Reviewed-by: Greg Burd <[email protected]>
Discussion:
https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
---
src/include/port/pg_bitutils.h | 83 ++++++++++++++++++++++++----------
src/port/pg_bitutils.c | 65 +-------------------------
src/port/pg_popcount_aarch64.c | 25 ----------
src/port/pg_popcount_x86.c | 43 +-----------------
4 files changed, 63 insertions(+), 153 deletions(-)
diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
index c3049d71894..08b9abf5fe7 100644
--- a/src/include/port/pg_bitutils.h
+++ b/src/include/port/pg_bitutils.h
@@ -276,46 +276,83 @@ pg_ceil_log2_64(uint64 num)
return pg_leftmost_one_pos64(num - 1) + 1;
}
-extern int pg_popcount32_portable(uint32 word);
-extern int pg_popcount64_portable(uint64 word);
extern uint64 pg_popcount_portable(const char *buf, int bytes);
extern uint64 pg_popcount_masked_portable(const char *buf, int bytes, bits8 mask);
-#ifdef HAVE_X86_64_POPCNTQ
+#if defined(HAVE_X86_64_POPCNTQ) || defined(USE_SVE_POPCNT_WITH_RUNTIME_CHECK)
/*
- * Attempt to use SSE4.2 or AVX-512 instructions, but perform a runtime check
+ * Attempt to use specialized CPU instructions, but perform a runtime check
* first.
*/
-extern PGDLLIMPORT int (*pg_popcount32) (uint32 word);
-extern PGDLLIMPORT int (*pg_popcount64) (uint64 word);
extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (const char *buf, int bytes);
extern PGDLLIMPORT uint64 (*pg_popcount_masked_optimized) (const char *buf, int bytes, bits8 mask);
-#elif defined(USE_NEON)
-/* Use the Neon version of pg_popcount{32,64} without function pointer. */
-extern int pg_popcount32(uint32 word);
-extern int pg_popcount64(uint64 word);
-
-/*
- * We can try to use an SVE-optimized pg_popcount() on some systems For that,
- * we do use a function pointer.
- */
-#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK
-extern PGDLLIMPORT uint64 (*pg_popcount_optimized) (const char *buf, int bytes);
-extern PGDLLIMPORT uint64 (*pg_popcount_masked_optimized) (const char *buf, int bytes, bits8 mask);
#else
+/* Use a portable implementation -- no need for a function pointer. */
extern uint64 pg_popcount_optimized(const char *buf, int bytes);
extern uint64 pg_popcount_masked_optimized(const char *buf, int bytes, bits8 mask);
+
#endif
+/*
+ * pg_popcount32
+ * Return the number of 1 bits set in word
+ *
+ * Plain C version adapted from
+ * https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel.
+ */
+static inline int
+pg_popcount32(uint32 word)
+{
+ /*
+ * On x86, gcc generates a function call for this built-in unless the
+ * popcnt instruction is available, so we use the plain C version in that
+ * case to ensure inlining.
+ */
+#if defined(HAVE__BUILTIN_POPCOUNT) && (defined(__POPCNT__) || !defined(__x86_64__))
+ return __builtin_popcount(word);
+#elif defined(_MSC_VER)
+ return __popcnt(word);
#else
-/* Use a portable implementation -- no need for a function pointer. */
-extern int pg_popcount32(uint32 word);
-extern int pg_popcount64(uint64 word);
-extern uint64 pg_popcount_optimized(const char *buf, int bytes);
-extern uint64 pg_popcount_masked_optimized(const char *buf, int bytes, bits8 mask);
+ word -= (word >> 1) & 0x55555555;
+ word = (word & 0x33333333) + ((word >> 2) & 0x33333333);
+ return ((word + (word >> 4) & 0xf0f0f0f) * 0x1010101) >> 24;
+#endif
+}
+/*
+ * pg_popcount64
+ * Return the number of 1 bits set in word
+ *
+ * Plain C version adapted from
+ * https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel.
+ */
+static inline int
+pg_popcount64(uint64 word)
+{
+ /*
+ * On x86, gcc generates a function call for this built-in unless the
+ * popcnt instruction is available, so we use the plain C version in that
+ * case to ensure inlining.
+ */
+#if defined(HAVE__BUILTIN_POPCOUNT) && (defined(__POPCNT__) || !defined(__x86_64__))
+#if SIZEOF_LONG == 8
+ return __builtin_popcountl(word);
+#elif SIZEOF_LONG_LONG == 8
+ return __builtin_popcountll(word);
+#else
+#error "cannot find integer of the same size as uint64_t"
#endif
+#elif defined(_MSC_VER)
+ return __popcnt64(word);
+#else
+ word -= (word >> 1) & UINT64CONST(0x5555555555555555);
+ word = (word & UINT64CONST(0x3333333333333333)) +
+ ((word >> 2) & UINT64CONST(0x3333333333333333));
+ word = (word + (word >> 4)) & UINT64CONST(0xf0f0f0f0f0f0f0f);
+ return (word * UINT64CONST(0x101010101010101)) >> 56;
+#endif
+}
/*
* Returns the number of 1-bits in buf.
diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
index bec06c06fc3..49b130f1306 100644
--- a/src/port/pg_bitutils.c
+++ b/src/port/pg_bitutils.c
@@ -96,56 +96,6 @@ const uint8 pg_number_of_ones[256] = {
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
};
-/*
- * pg_popcount32_portable
- * Return the number of 1 bits set in word
- */
-int
-pg_popcount32_portable(uint32 word)
-{
-#ifdef HAVE__BUILTIN_POPCOUNT
- return __builtin_popcount(word);
-#else /* !HAVE__BUILTIN_POPCOUNT */
- int result = 0;
-
- while (word != 0)
- {
- result += pg_number_of_ones[word & 255];
- word >>= 8;
- }
-
- return result;
-#endif /* HAVE__BUILTIN_POPCOUNT */
-}
-
-/*
- * pg_popcount64_portable
- * Return the number of 1 bits set in word
- */
-int
-pg_popcount64_portable(uint64 word)
-{
-#ifdef HAVE__BUILTIN_POPCOUNT
-#if SIZEOF_LONG == 8
- return __builtin_popcountl(word);
-#elif SIZEOF_LONG_LONG == 8
- return __builtin_popcountll(word);
-#else
-#error "cannot find integer of the same size as uint64_t"
-#endif
-#else /* !HAVE__BUILTIN_POPCOUNT */
- int result = 0;
-
- while (word != 0)
- {
- result += pg_number_of_ones[word & 255];
- word >>= 8;
- }
-
- return result;
-#endif /* HAVE__BUILTIN_POPCOUNT */
-}
-
/*
* pg_popcount_portable
* Returns the number of 1-bits in buf
@@ -163,7 +113,7 @@ pg_popcount_portable(const char *buf, int bytes)
while (bytes >= 8)
{
- popcnt += pg_popcount64_portable(*words++);
+ popcnt += pg_popcount64(*words++);
bytes -= 8;
}
@@ -197,7 +147,7 @@ pg_popcount_masked_portable(const char *buf, int bytes, bits8 mask)
while (bytes >= 8)
{
- popcnt += pg_popcount64_portable(*words++ & maskv);
+ popcnt += pg_popcount64(*words++ & maskv);
bytes -= 8;
}
@@ -220,17 +170,6 @@ pg_popcount_masked_portable(const char *buf, int bytes, bits8 mask)
* actual external functions. The compiler should be able to inline the
* portable versions here.
*/
-int
-pg_popcount32(uint32 word)
-{
- return pg_popcount32_portable(word);
-}
-
-int
-pg_popcount64(uint64 word)
-{
- return pg_popcount64_portable(word);
-}
/*
* pg_popcount_optimized
diff --git a/src/port/pg_popcount_aarch64.c b/src/port/pg_popcount_aarch64.c
index ba57f2cd4bd..74f71593721 100644
--- a/src/port/pg_popcount_aarch64.c
+++ b/src/port/pg_popcount_aarch64.c
@@ -291,31 +291,6 @@ pg_popcount_masked_optimized(const char *buf, int bytes, bits8 mask)
#endif /* !USE_SVE_POPCNT_WITH_RUNTIME_CHECK */
-/*
- * pg_popcount32
- * Return number of 1 bits in word
- */
-int
-pg_popcount32(uint32 word)
-{
- return pg_popcount64((uint64) word);
-}
-
-/*
- * pg_popcount64
- * Return number of 1 bits in word
- */
-int
-pg_popcount64(uint64 word)
-{
- /*
- * For some compilers, __builtin_popcountl() already emits Neon
- * instructions. The line below should compile to the same code on those
- * systems.
- */
- return vaddv_u8(vcnt_u8(vld1_u8((const uint8 *) &word)));
-}
-
/*
* pg_popcount_neon
* Returns number of 1 bits in buf
diff --git a/src/port/pg_popcount_x86.c b/src/port/pg_popcount_x86.c
index 7aebf69898b..6bce089432f 100644
--- a/src/port/pg_popcount_x86.c
+++ b/src/port/pg_popcount_x86.c
@@ -36,8 +36,6 @@
* operation, but in practice this is close enough, and "sse42" seems easier to
* follow than "popcnt" for these names.
*/
-static inline int pg_popcount32_sse42(uint32 word);
-static inline int pg_popcount64_sse42(uint64 word);
static uint64 pg_popcount_sse42(const char *buf, int bytes);
static uint64 pg_popcount_masked_sse42(const char *buf, int bytes, bits8 mask);
@@ -55,12 +53,8 @@ static uint64 pg_popcount_masked_avx512(const char *buf, int bytes, bits8 mask);
* what the current CPU supports) and then will call the pointer to fulfill the
* caller's request.
*/
-static int pg_popcount32_choose(uint32 word);
-static int pg_popcount64_choose(uint64 word);
static uint64 pg_popcount_choose(const char *buf, int bytes);
static uint64 pg_popcount_masked_choose(const char *buf, int bytes, bits8 mask);
-int (*pg_popcount32) (uint32 word) = pg_popcount32_choose;
-int (*pg_popcount64) (uint64 word) = pg_popcount64_choose;
uint64 (*pg_popcount_optimized) (const char *buf, int bytes) =
pg_popcount_choose;
uint64 (*pg_popcount_masked_optimized) (const char *buf, int bytes,
bits8 mask) = pg_popcount_masked_choose;
@@ -157,7 +151,7 @@ pg_popcount_avx512_available(void)
#endif /* USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */
/*
- * These functions get called on the first call to pg_popcount32 etc.
+ * These functions get called on the first call to pg_popcount(), etc.
* They detect whether we can use the asm implementations, and replace
* the function pointers so that subsequent calls are routed directly to
* the chosen implementation.
@@ -167,15 +161,11 @@ choose_popcount_functions(void)
{
if (pg_popcount_sse42_available())
{
- pg_popcount32 = pg_popcount32_sse42;
- pg_popcount64 = pg_popcount64_sse42;
pg_popcount_optimized = pg_popcount_sse42;
pg_popcount_masked_optimized = pg_popcount_masked_sse42;
}
else
{
- pg_popcount32 = pg_popcount32_portable;
- pg_popcount64 = pg_popcount64_portable;
pg_popcount_optimized = pg_popcount_portable;
pg_popcount_masked_optimized = pg_popcount_masked_portable;
}
@@ -189,20 +179,6 @@ choose_popcount_functions(void)
#endif
}
-static int
-pg_popcount32_choose(uint32 word)
-{
- choose_popcount_functions();
- return pg_popcount32(word);
-}
-
-static int
-pg_popcount64_choose(uint64 word)
-{
- choose_popcount_functions();
- return pg_popcount64(word);
-}
-
static uint64
pg_popcount_choose(const char *buf, int bytes)
{
@@ -338,23 +314,6 @@ pg_popcount_masked_avx512(const char *buf, int bytes, bits8 mask)
#endif /* USE_AVX512_POPCNT_WITH_RUNTIME_CHECK */
-/*
- * pg_popcount32_sse42
- * Return the number of 1 bits set in word
- */
-static inline int
-pg_popcount32_sse42(uint32 word)
-{
-#ifdef _MSC_VER
- return __popcnt(word);
-#else
- uint32 res;
-
-__asm__ __volatile__(" popcntl %1,%0\n":"=q"(res):"rm"(word):"cc");
- return (int) res;
-#endif
-}
-
/*
* pg_popcount64_sse42
* Return the number of 1 bits set in word
--
2.50.1 (Apple Git-155)
>From af10ece1c785d23183d74a5ddd5dab224f469db7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Thu, 22 Jan 2026 11:16:09 -0600
Subject: [PATCH v8 3/3] Make use of pg_popcount() in more places.
This replaces some loops over word-length popcount functions with
calls to our perhaps-over-optimized pg_popcount() function. Since
pg_popcount() uses a function pointer for inputs with sizes >= a
Bitmapset word, this produces a small regression for the common
one-word case in bms_num_members(). To deal with that, this commit
adds an inlined fast-path for that case. This fast-path could
arguably go in pg_popcount() itself (with an appropriate alignment
check), but that is left as a future exercise.
Suggested-by: John Naylor <[email protected]>
Reviewed-by: John Naylor <[email protected]>
Discussion:
https://postgr.es/m/CANWCAZY7R%2Biy%2Br9YM_sySNydHzNqUirx1xk0tB3ej5HO62GdgQ%40mail.gmail.com
---
src/backend/nodes/bitmapset.c | 29 +++++++----------------------
src/include/lib/radixtree.h | 4 ++--
2 files changed, 9 insertions(+), 24 deletions(-)
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index a4765876c31..786f343b3c9 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -553,14 +553,8 @@ bms_member_index(Bitmapset *a, int x)
bitnum = BITNUM(x);
/* count bits in preceding words */
- for (int i = 0; i < wordnum; i++)
- {
- bitmapword w = a->words[i];
-
- /* No need to count the bits in a zero word */
- if (w != 0)
- result += bmw_popcount(w);
- }
+ result += pg_popcount((const char *) a->words,
+ wordnum * sizeof(bitmapword));
/*
* Now add bits of the last word, but only those before the item. We can
@@ -749,26 +743,17 @@ bms_get_singleton_member(const Bitmapset *a, int *member)
int
bms_num_members(const Bitmapset *a)
{
- int result = 0;
- int nwords;
- int wordnum;
-
Assert(bms_is_valid_set(a));
if (a == NULL)
return 0;
- nwords = a->nwords;
- wordnum = 0;
- do
- {
- bitmapword w = a->words[wordnum];
+ /* fast-path for common case */
+ if (a->nwords == 1)
+ return bmw_popcount(a->words[0]);
- /* No need to count the bits in a zero word */
- if (w != 0)
- result += bmw_popcount(w);
- } while (++wordnum < nwords);
- return result;
+ return pg_popcount((const char *) a->words,
+ a->nwords * sizeof(bitmapword));
}
/*
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index b223ce10a2d..1425654a67c 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -2725,8 +2725,8 @@ RT_VERIFY_NODE(RT_NODE * node)
/* RT_DUMP_NODE(node); */
- for (int i = 0; i < RT_BM_IDX(RT_NODE_MAX_SLOTS); i++)
- cnt += bmw_popcount(n256->isset[i]);
+ cnt += pg_popcount((const char *) n256->isset,
+ RT_NODE_MAX_SLOTS / BITS_PER_BYTE);
/*
* Check if the number of used chunk matches, accounting for
--
2.50.1 (Apple Git-155)