Hi, On Fri, Nov 08, 2024 at 11:18:09PM +1300, David Rowley wrote: > On Fri, 8 Nov 2024 at 18:34, Michael Paquier <mich...@paquier.xyz> wrote: > > I've done a round of comment and term cleanup for the whole patch, > > while on it. > > I don't think "intrinsics" is the correct word to use here: > > + * - 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers > + * to use SIMD intrinsics if available, up to the last aligned location > > and > > + * All comparisons are combined with a single OR operation, making it a > + * good candidate for SIMD intrinsics, if available. > > an intrinsic function is a function built into the compiler that > provides some lower-level functionality. e.g. __builtin_popcount().
Agree, replaced by "instructions" in v10 attached. > I'm slightly worried due to the current rate we're receiving cleanup > suggestions that someone might come along and think they'd be doing us > a favour by submitting a patch to "fixup the inefficient bitwise-ORs > and use boolean-OR". That's a good point, better to be cautious here. > Maybe a comment like the following might prevent > that from happening. > > + * For performance reasons, we manually unroll this loop and purposefully > + * use bitwise-ORs to combine each comparison. This prevents boolean > + * short-circuiting and lets the compiler know that it's safe to access all 8 > + * elements regardless of the result of the other comparisons. This seems > + * to be enough to coax a few compilers into using SIMD instructions. Sounds good to me, used the above in v10. v10 also splits the patch into 2 parts as suggested by Michael up-thread. > > > Btw, gcc seems a bit smarter than clang when it comes to optimizing > > the code depending on the size of the structures. gcc gives up on > > SIMD if it's sure that the structure on which we are going to use the > > allzero check won't need it at all, and clang keeps it even if it does > > not need it. That was interesting to see, while going through the > > review.. > > Can you share your test case for this? +1 Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
>From ff6226aba084830887ea71d6c377b9e94ee79106 Mon Sep 17 00:00:00 2001 From: Michael Paquier <mich...@paquier.xyz> Date: Fri, 8 Nov 2024 14:19:59 +0900 Subject: [PATCH v10 1/2] Optimize pg_memory_is_all_zeros() pg_memory_is_all_zeros() is currently doing byte per byte comparison and so could lead to performance regression or penalties when multi bytes comparison could be done instead. Let's provide an optimized version that divides the checks into four phases for efficiency: - Initial alignment (byte per byte comparison) - Compare 8 size_t chunks at once using bitwise OR (candidate for SIMD optimization) - Compare remaining size_t aligned chunks - Compare remaining bytes (byte per byte comparison) Code mainly suggested by David Rowley. --- src/include/utils/memutils.h | 61 ++++++++++++++++++++++++++++++++++-- 1 file changed, 58 insertions(+), 3 deletions(-) 100.0% src/include/utils/ diff --git a/src/include/utils/memutils.h b/src/include/utils/memutils.h index 3590c8bad9..9637370838 100644 --- a/src/include/utils/memutils.h +++ b/src/include/utils/memutils.h @@ -190,19 +190,74 @@ extern MemoryContext BumpContextCreate(MemoryContext parent, #define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024) /* + * pg_memory_is_all_zeros + * * Test if a memory region starting at "ptr" and of size "len" is full of * zeroes. + * + * The test is divided into multiple phases, to be efficient for various + * length values: + * - Byte by byte comparison, until the pointer is aligned. + * - 8 * sizeof(size_t) comparisons using bitwise OR, to encourage compilers + * to use SIMD instructions if available, up to the last aligned location + * possible. + * - size_t comparisons, with aligned pointers, up to the last location + * possible. + * - Byte by byte comparison, until the end location. + * + * Caller must ensure that "ptr" is not NULL. */ static inline bool pg_memory_is_all_zeros(const void *ptr, size_t len) { - const char *p = (const char *) ptr; + const unsigned char *p = (const unsigned char *) ptr; + const unsigned char *end = &p[len]; + const unsigned char *aligned_end = (const unsigned char *) + ((uintptr_t) end & (~(sizeof(size_t) - 1))); - for (size_t i = 0; i < len; i++) + /* Compare bytes until the pointer "p" is aligned */ + while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0) { - if (p[i] != 0) + if (p == end) + return true; + + if (*p++ != 0) return false; } + + /* + * Compare 8 * sizeof(size_t) chunks at once. + * + * For performance reasons, we manually unroll this loop and purposefully + * use bitwise-ORs to combine each comparison. This prevents boolean + * short-circuiting and lets the compiler know that it's safe to access + * all 8 elements regardless of the result of the other comparisons. This + * seems to be enough to coax a few compilers into using SIMD + * instructions. + */ + for (; p < aligned_end - (sizeof(size_t) * 7); p += sizeof(size_t) * 8) + { + if ((((size_t *) p)[0] != 0) | (((size_t *) p)[1] != 0) | + (((size_t *) p)[2] != 0) | (((size_t *) p)[3] != 0) | + (((size_t *) p)[4] != 0) | (((size_t *) p)[5] != 0) | + (((size_t *) p)[6] != 0) | (((size_t *) p)[7] != 0)) + return false; + } + + /* Compare remaining size_t-aligned chunks */ + for (; p < aligned_end; p += sizeof(size_t)) + { + if (*(size_t *) p != 0) + return false; + } + + /* Compare remaining bytes until the end */ + while (p < end) + { + if (*p++ != 0) + return false; + } + return true; } -- 2.34.1
>From 4b95e9a30efb83f6cb9a0871379cc1db83a9f442 Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot <bertranddrouvot...@gmail.com> Date: Fri, 8 Nov 2024 17:19:43 +0000 Subject: [PATCH v10 2/2] Make use of pg_memory_is_all_zeros() in PageIsVerifiedExtended() Now that pg_memory_is_all_zeros() is optimized to handle multi bytes comparison (instead of byte per byte), let's make use of it in PageIsVerifiedExtended(). --- src/backend/storage/page/bufpage.c | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) 100.0% src/backend/storage/page/ diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c index be6f1f62d2..aa264f61b9 100644 --- a/src/backend/storage/page/bufpage.c +++ b/src/backend/storage/page/bufpage.c @@ -89,10 +89,8 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) { PageHeader p = (PageHeader) page; size_t *pagebytes; - int i; bool checksum_failure = false; bool header_sane = false; - bool all_zeroes = false; uint16 checksum = 0; /* @@ -126,18 +124,9 @@ PageIsVerifiedExtended(Page page, BlockNumber blkno, int flags) } /* Check all-zeroes case */ - all_zeroes = true; pagebytes = (size_t *) page; - for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++) - { - if (pagebytes[i] != 0) - { - all_zeroes = false; - break; - } - } - if (all_zeroes) + if (pg_memory_is_all_zeros(pagebytes, BLCKSZ)) return true; /* -- 2.34.1