I didn't get any closer to radix-tree regression, but I did find some
inefficiencies in tidstore_add_tids() that are worth talking about first,
addressed in a rough fashion in the attached .txt addendums that I can
clean up and incorporate later.

To start, I can reproduce the regression with this test as well:

select * from bench_tidstore_load(0, 10 * 1000 * 1000);

v15 + v26 store + adjustments:
 mem_allocated | load_ms
---------------+---------
      98202152 |    1676

v26 0001-0008
 mem_allocated | load_ms
---------------+---------
      98202032 |    1826

...and reverting to the alternate way to update the parent didn't help:

v26 0001-6, 0008, insert_inner w/ null parent

 mem_allocated | load_ms
---------------+---------
      98202032 |    1825

...and I'm kind of glad that wasn't the problem, because going back to that
would be a pain for the shmem case.

Running perf doesn't show anything much different in the proportions (note
that rt_set must have been inlined when declared locally in v26):

v15 + v26 store + adjustments:
  65.88%  postgres  postgres             [.] tidstore_add_tids
  10.74%  postgres  postgres             [.] rt_set
   9.20%  postgres  postgres             [.] palloc0
   6.49%  postgres  postgres             [.] rt_node_insert_leaf

v26 0001-0008
  78.50%  postgres  postgres             [.] tidstore_add_tids
   8.88%  postgres  postgres             [.] palloc0
   6.24%  postgres  postgres             [.] local_rt_node_insert_leaf

v2699-0001: The first thing I noticed is that palloc0 is taking way more
time than it should, and it's because the compiler doesn't know the
values[] array is small. One reason we need to zero the array is to make
the algorithm agnostic about what order the offsets come in, as I requested
in a previous review. Thinking some more, I was way too paranoid about
that. As long as access methods scan the line pointer array in the usual
way, maybe we can just assert that the keys we create are in order, and
zero any unused array entries as we find them. (I admit I can't actually
think of a reason we would ever encounter offsets out of order.) Also, we
can keep track of the last key we need to consider for insertion into the
radix tree, and ignore the rest. That might shave a few cycles during the
exclusive lock when the max offset of an LP_DEAD item < 64 on a given page,
which I think would be common in the wild. I also got rid of the special
case for non-encoding, since shifting by zero should work the same way.
These together led to a nice speedup on the v26 branch:

 mem_allocated | load_ms
---------------+---------
      98202032 |    1386

v2699-0002: The next thing I noticed is forming a full ItemIdPointer to
pass to tid_to_key_off(). That's bad for tidstore_add_tids() because
ItemPointerSetBlockNumber() must do this in order to allow the struct to be
SHORTALIGN'd:

static inline void
BlockIdSet(BlockIdData *blockId, BlockNumber blockNumber)
{
blockId->bi_hi = blockNumber >> 16;
blockId->bi_lo = blockNumber & 0xffff;
}

Then, tid_to_key_off() calls ItemPointerGetBlockNumber(), which must
reverse the above process:

static inline BlockNumber
BlockIdGetBlockNumber(const BlockIdData *blockId)
{
return (((BlockNumber) blockId->bi_hi) << 16) | ((BlockNumber)
blockId->bi_lo);
}

There is no reason to do any of this if we're not reading/writing directly
to/from an on-disk tid etc. To avoid this, I created a new function
encode_key_off() [name could be better], which deals with the raw block
number that we already have. Then turn tid_to_key_off() into a wrapper
around that, since we still need the full conversion for
tidstore_lookup_tid().

v2699-0003: Get rid of all the remaining special cases for encoding/or not.
I am unaware of the need to optimize that case or treat it in any way
differently. I haven't tested this on an installation with non-default
blocksize and didn't measure this separately, but 0002+0003 gives:

 mem_allocated | load_ms
---------------+---------
      98202032 |    1259

If these are acceptable, I can incorporate them into a later patchset. In
any case, speeding up tidstore_add_tids() will make any regressions in the
backing radix tree more obvious. I will take a look at that next week.

--
John Naylor
EDB: http://www.enterprisedb.com
From 6bdd33fa4f55757b54d16ce00dc60a21b929606e Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Sat, 11 Feb 2023 10:45:21 +0700
Subject: [PATCH v2699 2/3] Do less work when encoding key/value

---
 src/backend/access/common/tidstore.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/common/tidstore.c 
b/src/backend/access/common/tidstore.c
index 5d24680737..3d384cf645 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -159,6 +159,7 @@ typedef struct TidStoreIter
 
 static void tidstore_iter_extract_tids(TidStoreIter *iter, uint64 key, uint64 
val);
 static inline BlockNumber key_get_blkno(TidStore *ts, uint64 key);
+static inline uint64 encode_key_off(TidStore *ts, BlockNumber block, uint32 
offset, uint32 *off);
 static inline uint64 tid_to_key_off(TidStore *ts, ItemPointer tid, uint32 
*off);
 
 /*
@@ -367,7 +368,6 @@ void
 tidstore_add_tids(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,
                                  int num_offsets)
 {
-       ItemPointerData tid;
        uint64  *values;
        uint64  key;
        uint64  prev_key;
@@ -381,16 +381,12 @@ tidstore_add_tids(TidStore *ts, BlockNumber blkno, 
OffsetNumber *offsets,
        values = palloc(sizeof(uint64) * nkeys);
        key = prev_key = key_base;
 
-       ItemPointerSetBlockNumber(&tid, blkno);
-
        for (int i = 0; i < num_offsets; i++)
        {
                uint32  off;
 
-               ItemPointerSetOffsetNumber(&tid, offsets[i]);
-
                /* encode the tid to key and val */
-               key = tid_to_key_off(ts, &tid, &off);
+               key = encode_key_off(ts, blkno, offsets[i], &off);
 
                /* make sure we scanned the line pointer array in order */
                Assert(key >= prev_key);
@@ -681,20 +677,29 @@ key_get_blkno(TidStore *ts, uint64 key)
 /* Encode a tid to key and offset */
 static inline uint64
 tid_to_key_off(TidStore *ts, ItemPointer tid, uint32 *off)
+{
+       uint32 offset = ItemPointerGetOffsetNumber(tid);
+       BlockNumber block = ItemPointerGetBlockNumber(tid);
+
+       return encode_key_off(ts, block, offset, off);
+}
+
+/* encode a block and offset to a key and partial offset */
+static inline uint64
+encode_key_off(TidStore *ts, BlockNumber block, uint32 offset, uint32 *off)
 {
        uint64 key;
        uint64 tid_i;
 
        if (!ts->control->encode_tids)
        {
-               *off = ItemPointerGetOffsetNumber(tid);
+               *off = offset;
 
                /* Use the block number as the key */
-               return (int64) ItemPointerGetBlockNumber(tid);
+               return (int64) block;
        }
 
-       tid_i = ItemPointerGetOffsetNumber(tid);
-       tid_i |= (uint64) ItemPointerGetBlockNumber(tid) << 
ts->control->offset_nbits;
+       tid_i = offset | ((uint64) block << ts->control->offset_nbits);
 
        *off = tid_i & ((UINT64CONST(1) << TIDSTORE_VALUE_NBITS) - 1);
        key = tid_i >> TIDSTORE_VALUE_NBITS;
-- 
2.39.1

From c0bc497f50318c8e31ccdf0c2a9186ffc736abeb Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Fri, 10 Feb 2023 19:56:01 +0700
Subject: [PATCH v2699 1/3] Miscellaneous optimizations for tidstore_add_tids()

- remove palloc0; it's expensive for lengths not known at compile-time
- optimize for case with only one key per heap block
- make some intializations const and branch-free
- when writing to the radix tree, stop at the last non-zero bitmap
---
 src/backend/access/common/tidstore.c | 56 ++++++++++++++++++----------
 1 file changed, 36 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/common/tidstore.c 
b/src/backend/access/common/tidstore.c
index 4c72673ce9..5d24680737 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -368,51 +368,67 @@ tidstore_add_tids(TidStore *ts, BlockNumber blkno, 
OffsetNumber *offsets,
                                  int num_offsets)
 {
        ItemPointerData tid;
-       uint64  key_base;
        uint64  *values;
-       int     nkeys;
+       uint64  key;
+       uint64  prev_key;
+       uint64  off_bitmap = 0;
+       int idx;
+       const uint64 key_base = ((uint64) blkno) << 
ts->control->offset_key_nbits;
+       const int nkeys = UINT64CONST(1) << ts->control->offset_key_nbits;
 
        Assert(!TidStoreIsShared(ts) || ts->control->magic == TIDSTORE_MAGIC);
 
-       if (ts->control->encode_tids)
-       {
-               key_base = ((uint64) blkno) << ts->control->offset_key_nbits;
-               nkeys = UINT64CONST(1) << ts->control->offset_key_nbits;
-       }
-       else
-       {
-               key_base = (uint64) blkno;
-               nkeys = 1;
-       }
-       values = palloc0(sizeof(uint64) * nkeys);
+       values = palloc(sizeof(uint64) * nkeys);
+       key = prev_key = key_base;
 
        ItemPointerSetBlockNumber(&tid, blkno);
+
        for (int i = 0; i < num_offsets; i++)
        {
-               uint64  key;
                uint32  off;
-               int idx;
 
                ItemPointerSetOffsetNumber(&tid, offsets[i]);
 
                /* encode the tid to key and val */
                key = tid_to_key_off(ts, &tid, &off);
 
-               idx = key - key_base;
-               Assert(idx >= 0 && idx < nkeys);
+               /* make sure we scanned the line pointer array in order */
+               Assert(key >= prev_key);
 
-               values[idx] |= UINT64CONST(1) << off;
+               if (key > prev_key)
+               {
+                       idx = prev_key - key_base;
+                       Assert(idx >= 0 && idx < nkeys);
+
+                       /* write out offset bitmap for this key */
+                       values[idx] = off_bitmap;
+
+                       /* zero out any gaps up to the current key */
+                       for (int empty_idx = idx + 1; empty_idx < key - 
key_base; empty_idx++)
+                               values[empty_idx] = 0;
+
+                       /* reset for current key -- the current offset will be 
handled below */
+                       off_bitmap = 0;
+                       prev_key = key;
+               }
+
+               off_bitmap |= UINT64CONST(1) << off;
        }
 
+       /* save the final index for later */
+       idx = key - key_base;
+       /* write out last offset bitmap */
+       values[idx] = off_bitmap;
+
        if (TidStoreIsShared(ts))
                LWLockAcquire(&ts->control->lock, LW_EXCLUSIVE);
 
        /* insert the calculated key-values to the tree */
-       for (int i = 0; i < nkeys; i++)
+       for (int i = 0; i <= idx; i++)
        {
                if (values[i])
                {
-                       uint64 key = key_base + i;
+                       key = key_base + i;
 
                        if (TidStoreIsShared(ts))
                                shared_rt_set(ts->tree.shared, key, &values[i]);
-- 
2.39.1

From 82c1f639aaa64cc943af3b53294a63d5d8f7a9b9 Mon Sep 17 00:00:00 2001
From: John Naylor <john.nay...@postgresql.org>
Date: Sat, 11 Feb 2023 11:51:32 +0700
Subject: [PATCH v2699 3/3] Force all callers to encode, no matter how small
 the expected offset

---
 src/backend/access/common/tidstore.c | 36 +++++-----------------------
 1 file changed, 6 insertions(+), 30 deletions(-)

diff --git a/src/backend/access/common/tidstore.c 
b/src/backend/access/common/tidstore.c
index 3d384cf645..ff8e66936e 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -99,7 +99,6 @@ typedef struct TidStoreControl
        size_t  max_bytes;              /* the maximum bytes a TidStore can use 
*/
        int             max_offset;             /* the maximum offset number */
        int             offset_nbits;   /* the number of bits required for 
max_offset */
-       bool    encode_tids;    /* do we use tid encoding? */
        int             offset_key_nbits;       /* the number of bits of a 
offset number
                                                                 * used for the 
key */
 
@@ -224,21 +223,15 @@ tidstore_create(size_t max_bytes, int max_offset, 
dsa_area *area)
        ts->control->max_offset = max_offset;
        ts->control->offset_nbits = pg_ceil_log2_32(max_offset);
 
+       if (ts->control->offset_nbits < TIDSTORE_VALUE_NBITS)
+               ts->control->offset_nbits = TIDSTORE_VALUE_NBITS;
+
        /*
         * We use tid encoding if the number of bits for the offset number 
doesn't
         * fix in a value, uint64.
         */
-       if (ts->control->offset_nbits > TIDSTORE_VALUE_NBITS)
-       {
-               ts->control->encode_tids = true;
-               ts->control->offset_key_nbits =
-                       ts->control->offset_nbits - TIDSTORE_VALUE_NBITS;
-       }
-       else
-       {
-               ts->control->encode_tids = false;
-               ts->control->offset_key_nbits = 0;
-       }
+       ts->control->offset_key_nbits =
+               ts->control->offset_nbits - TIDSTORE_VALUE_NBITS;
 
        return ts;
 }
@@ -643,12 +636,6 @@ tidstore_iter_extract_tids(TidStoreIter *iter, uint64 key, 
uint64 val)
                uint64  tid_i;
                OffsetNumber    off;
 
-               if (i > iter->ts->control->max_offset)
-               {
-                       Assert(!iter->ts->control->encode_tids);
-                       break;
-               }
-
                if ((val & (UINT64CONST(1) << i)) == 0)
                        continue;
 
@@ -668,10 +655,7 @@ tidstore_iter_extract_tids(TidStoreIter *iter, uint64 key, 
uint64 val)
 static inline BlockNumber
 key_get_blkno(TidStore *ts, uint64 key)
 {
-       if (ts->control->encode_tids)
-               return (BlockNumber) (key >> ts->control->offset_key_nbits);
-
-       return (BlockNumber) key;
+       return (BlockNumber) (key >> ts->control->offset_key_nbits);
 }
 
 /* Encode a tid to key and offset */
@@ -691,14 +675,6 @@ encode_key_off(TidStore *ts, BlockNumber block, uint32 
offset, uint32 *off)
        uint64 key;
        uint64 tid_i;
 
-       if (!ts->control->encode_tids)
-       {
-               *off = offset;
-
-               /* Use the block number as the key */
-               return (int64) block;
-       }
-
        tid_i = offset | ((uint64) block << ts->control->offset_nbits);
 
        *off = tid_i & ((UINT64CONST(1) << TIDSTORE_VALUE_NBITS) - 1);
-- 
2.39.1

Reply via email to