On Tue, Feb 14, 2023 at 8:24 PM John Naylor
<john.nay...@enterprisedb.com> wrote:
>
> On Mon, Feb 13, 2023 at 2:51 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Sat, Feb 11, 2023 at 2:33 PM John Naylor
> > <john.nay...@enterprisedb.com> wrote:
> > >
> > > I didn't get any closer to radix-tree regression,
> >
> > Me neither. It seems that in v26, inserting chunks into node-32 is
> > slow but needs more analysis. I'll share if I found something
> > interesting.
>
> If that were the case, then the other benchmarks I ran would likely have 
> slowed down as well, but they are the same or faster. There is one 
> microbenchmark I didn't run before: "select * from 
> bench_fixed_height_search(15)" (15 to reduce noise from growing size class, 
> and despite the name it measures load time as well). Trying this now shows no 
> difference: a few runs range 19 to 21ms in each version. That also reinforces 
> that update_inner is fine and that the move to value pointer API didn't 
> regress.
>
> Changing TIDS_PER_BLOCK_FOR_LOAD to 1 to stress the tree more gives (min of 
> 5, perf run separate from measurements):
>
> v15 + v26 store:
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202152 |     553
>
>   19.71%  postgres  postgres             [.] tidstore_add_tids
> + 31.47%  postgres  postgres             [.] rt_set
> = 51.18%
>
>   20.62%  postgres  postgres             [.] rt_node_insert_leaf
>    6.05%  postgres  postgres             [.] AllocSetAlloc
>    4.74%  postgres  postgres             [.] AllocSetFree
>    4.62%  postgres  postgres             [.] palloc
>    2.23%  postgres  postgres             [.] SlabAlloc
>
> v26:
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |     617
>
>   57.45%  postgres  postgres             [.] tidstore_add_tids
>
>   20.67%  postgres  postgres             [.] local_rt_node_insert_leaf
>    5.99%  postgres  postgres             [.] AllocSetAlloc
>    3.55%  postgres  postgres             [.] palloc
>    3.05%  postgres  postgres             [.] AllocSetFree
>    2.05%  postgres  postgres             [.] SlabAlloc
>
> So it seems the store itself got faster when we removed shared memory paths 
> from the v26 store to test it against v15.
>
> I thought to favor the local memory case in the tidstore by controlling 
> inlining -- it's smaller and will be called much more often, so I tried the 
> following (done in 0007)
>
>  #define RT_PREFIX shared_rt
>  #define RT_SHMEM
> -#define RT_SCOPE static
> +#define RT_SCOPE static pg_noinline
>
> That brings it down to
>
>  mem_allocated | load_ms
> ---------------+---------
>       98202032 |     590

The improvement makes sense to me. I've also done the same test (with
changing TIDS_PER_BLOCK_FOR_LOAD to 1):

w/o 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     334 |     445
(1 row)

w/ 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     316 |     434
(1 row)

On the other hand, with TIDS_PER_BLOCK_FOR_LOAD being 30, the load
performance didn't improve:

w/0 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     601 |     608
(1 row)

w/ 0007 patch:
 mem_allocated | load_ms | iter_ms
---------------+---------+---------
      98202032 |     610 |     606
(1 row)

That being said, it might be within noise level, so I agree with 0007 patch.

> Perhaps some slowdown is unavoidable, but it would be nice to understand why.

True.

>
> > I can think that something like traversing a HOT chain could visit
> > offsets out of order. But fortunately we prune such collected TIDs
> > before heap vacuum in heap case.
>
> Further, currently we *already* assume we populate the tid array in order 
> (for binary search), so we can just continue assuming that (with an assert 
> added since it's more public in this form). I'm not sure why such basic 
> common sense evaded me a few versions ago...

Right. TidStore is implemented not only for heap, so loading
out-of-order TIDs might be important in the future.

> > > If these are acceptable, I can incorporate them into a later patchset.
> >
> > These are nice improvements! I agree with all changes.
>
> Great, I've squashed these into the tidstore patch (0004). Also added 0005, 
> which is just a simplification.
>

I've attached some small patches to improve the radix tree and tidstrore:

We have the following WIP comment in test_radixtree:

// WIP: compiles with warnings because rt_attach is defined but not used
// #define RT_SHMEM

How about unsetting RT_SCOPE to suppress warnings for unused rt_attach
and friends?

FYI I've briefly tested the TidStore with blocksize = 32kb, and it
seems to work fine.

> I squashed the earlier dead code removal into the radix tree patch.

Thanks!

>
> v27-0008 measures tid store iteration performance and adds a stub function to 
> prevent spurious warnings, so the benchmarking module can always be built.
>
> Getting the list of offsets from the old array for a given block is always 
> trivial, but tidstore_iter_extract_tids() is doing a huge amount of 
> unnecessary work when TIDS_PER_BLOCK_FOR_LOAD is 1, enough to exceed the load 
> time:
>
>  mem_allocated | load_ms | iter_ms
> ---------------+---------+---------
>       98202032 |     589 |     915
>
> Fortunately, it's an easy fix, done in 0009.
>
>  mem_allocated | load_ms | iter_ms
> ---------------+---------+---------
>       98202032 |     589 |     153

Cool!

>
> I'll soon resume more cosmetic review of the tid store, but this is enough to 
> post.

Thanks!

You removed the vacuum integration patch from v27, is there any reason for that?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From f06557689f33d9b11be1083362fcce19665b4014 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.m...@gmail.com>
Date: Thu, 16 Feb 2023 12:18:22 +0900
Subject: [PATCH v2899 2/2] Small improvements for radixtree and tests.

---
 src/include/lib/radixtree.h                      |  2 +-
 src/test/modules/test_radixtree/test_radixtree.c | 13 ++++++++++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 1cdb995e54..e546bd705c 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -1622,7 +1622,7 @@ RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE 
*value_p)
        /* Descend the tree until we reach a leaf node */
        while (shift >= 0)
        {
-               RT_PTR_ALLOC new_child = RT_INVALID_PTR_ALLOC;;
+               RT_PTR_ALLOC new_child = RT_INVALID_PTR_ALLOC;
 
                child = RT_PTR_GET_LOCAL(tree, stored_child);
 
diff --git a/src/test/modules/test_radixtree/test_radixtree.c 
b/src/test/modules/test_radixtree/test_radixtree.c
index f944945db9..afe53382f3 100644
--- a/src/test/modules/test_radixtree/test_radixtree.c
+++ b/src/test/modules/test_radixtree/test_radixtree.c
@@ -107,13 +107,12 @@ static const test_spec test_specs[] = {
 
 /* define the radix tree implementation to test */
 #define RT_PREFIX rt
-#define RT_SCOPE static
+#define RT_SCOPE
 #define RT_DECLARE
 #define RT_DEFINE
 #define RT_USE_DELETE
 #define RT_VALUE_TYPE TestValueType
-// WIP: compiles with warnings because rt_attach is defined but not used
-// #define RT_SHMEM
+/* #define RT_SHMEM */
 #include "lib/radixtree.h"
 
 
@@ -142,6 +141,8 @@ test_empty(void)
 #ifdef RT_SHMEM
        int                     tranche_id = LWLockNewTrancheId();
        dsa_area   *dsa;
+
+       LWLockRegisterTranche(tranche_id, "test_radix_tree");
        dsa = dsa_create(tranche_id);
 
        radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id);
@@ -188,6 +189,8 @@ test_basic(int children, bool test_inner)
 #ifdef RT_SHMEM
        int                     tranche_id = LWLockNewTrancheId();
        dsa_area   *dsa;
+
+       LWLockRegisterTranche(tranche_id, "test_radix_tree");
        dsa = dsa_create(tranche_id);
 #endif
 
@@ -358,6 +361,8 @@ test_node_types(uint8 shift)
 #ifdef RT_SHMEM
        int                     tranche_id = LWLockNewTrancheId();
        dsa_area   *dsa;
+
+       LWLockRegisterTranche(tranche_id, "test_radix_tree");
        dsa = dsa_create(tranche_id);
 #endif
 
@@ -406,6 +411,8 @@ test_pattern(const test_spec * spec)
 #ifdef RT_SHMEM
        int                     tranche_id = LWLockNewTrancheId();
        dsa_area   *dsa;
+
+       LWLockRegisterTranche(tranche_id, "test_radix_tree");
        dsa = dsa_create(tranche_id);
 #endif
 
-- 
2.31.1

From f6ed6e18b2281cee96af98a39bdfc453117e6a21 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.m...@gmail.com>
Date: Thu, 16 Feb 2023 12:17:59 +0900
Subject: [PATCH v2899 1/2] comment update and test the shared tidstore.

---
 src/backend/access/common/tidstore.c          | 19 +++-------
 .../modules/test_tidstore/test_tidstore.c     | 37 +++++++++++++++++--
 2 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/common/tidstore.c 
b/src/backend/access/common/tidstore.c
index 015e3dea81..8c05e60d92 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -64,13 +64,9 @@
  * |---------------------------------------------| key
  *
  * The maximum height of the radix tree is 5 in this case.
- *
- * If the number of bits for offset number fits in a 64-bit value, we don't
- * encode tids but directly use the block number and the offset number as key
- * and value, respectively.
  */
 #define TIDSTORE_VALUE_NBITS   6       /* log(64, 2) */
-#define TIDSTORE_OFFSET_MASK ((1 << TIDSTORE_VALUE_NBITS) - 1)
+#define TIDSTORE_OFFSET_MASK   ((1 << TIDSTORE_VALUE_NBITS) - 1)
 
 /* A magic value used to identify our TidStores. */
 #define TIDSTORE_MAGIC 0x826f6a10
@@ -99,9 +95,10 @@ typedef struct TidStoreControl
        /* These values are never changed after creation */
        size_t  max_bytes;              /* the maximum bytes a TidStore can use 
*/
        int             max_offset;             /* the maximum offset number */
-       int             offset_nbits;   /* the number of bits required for 
max_offset */
-       int             offset_key_nbits;       /* the number of bits of a 
offset number
-                                                                * used for the 
key */
+       int             offset_nbits;   /* the number of bits required for an 
offset
+                                                        * number */
+       int             offset_key_nbits;       /* the number of bits of an 
offset number
+                                                                * used in a 
key */
 
        /* The below fields are used only in shared case */
 
@@ -227,10 +224,6 @@ tidstore_create(size_t max_bytes, int max_offset, dsa_area 
*area)
        if (ts->control->offset_nbits < TIDSTORE_VALUE_NBITS)
                ts->control->offset_nbits = TIDSTORE_VALUE_NBITS;
 
-       /*
-        * We use tid encoding if the number of bits for the offset number 
doesn't
-        * fix in a value, uint64.
-        */
        ts->control->offset_key_nbits =
                ts->control->offset_nbits - TIDSTORE_VALUE_NBITS;
 
@@ -379,7 +372,7 @@ tidstore_add_tids(TidStore *ts, BlockNumber blkno, 
OffsetNumber *offsets,
        {
                uint64  off_bit;
 
-               /* encode the tid to key and val */
+               /* encode the tid to a key and partial offset */
                key = encode_key_off(ts, blkno, offsets[i], &off_bit);
 
                /* make sure we scanned the line pointer array in order */
diff --git a/src/test/modules/test_tidstore/test_tidstore.c 
b/src/test/modules/test_tidstore/test_tidstore.c
index 9b849ae8e8..9a1217f833 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -18,10 +18,13 @@
 #include "miscadmin.h"
 #include "storage/block.h"
 #include "storage/itemptr.h"
+#include "storage/lwlock.h"
 #include "utils/memutils.h"
 
 PG_MODULE_MAGIC;
 
+/* #define TEST_SHARED_TIDSTORE 1 */
+
 #define TEST_TIDSTORE_MAX_BYTES (2 * 1024 * 1024L) /* 2MB */
 
 PG_FUNCTION_INFO_V1(test_tidstore);
@@ -59,6 +62,18 @@ test_basic(int max_offset)
        OffsetNumber offs[TEST_TIDSTORE_NUM_OFFSETS];
        int blk_idx;
 
+#ifdef TEST_SHARED_TIDSTORE
+       int tranche_id = LWLockNewTrancheId();
+       dsa_area *dsa;
+
+       LWLockRegisterTranche(tranche_id, "test_tidstore");
+       dsa = dsa_create(tranche_id);
+
+       ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, dsa);
+#else
+       ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, NULL);
+#endif
+
        /* prepare the offset array */
        offs[0] = FirstOffsetNumber;
        offs[1] = FirstOffsetNumber + 1;
@@ -66,8 +81,6 @@ test_basic(int max_offset)
        offs[3] = max_offset - 1;
        offs[4] = max_offset;
 
-       ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, NULL);
-
        /* add tids */
        for (int i = 0; i < TEST_TIDSTORE_NUM_BLOCKS; i++)
                tidstore_add_tids(ts, blks[i], offs, TEST_TIDSTORE_NUM_OFFSETS);
@@ -144,6 +157,10 @@ test_basic(int max_offset)
        }
 
        tidstore_destroy(ts);
+
+#ifdef TEST_SHARED_TIDSTORE
+       dsa_detach(dsa);
+#endif
 }
 
 static void
@@ -153,9 +170,19 @@ test_empty(void)
        TidStoreIter *iter;
        ItemPointerData tid;
 
-       elog(NOTICE, "testing empty tidstore");
+#ifdef TEST_SHARED_TIDSTORE
+       int tranche_id = LWLockNewTrancheId();
+       dsa_area *dsa;
+
+       LWLockRegisterTranche(tranche_id, "test_tidstore");
+       dsa = dsa_create(tranche_id);
 
+       ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, MaxHeapTuplesPerPage, 
dsa);
+#else
        ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, MaxHeapTuplesPerPage, 
NULL);
+#endif
+
+       elog(NOTICE, "testing empty tidstore");
 
        ItemPointerSet(&tid, 0, FirstOffsetNumber);
        if (tidstore_lookup_tid(ts, &tid))
@@ -180,6 +207,10 @@ test_empty(void)
        tidstore_end_iterate(iter);
 
        tidstore_destroy(ts);
+
+#ifdef TEST_SHARED_TIDSTORE
+       dsa_detach(dsa);
+#endif
 }
 
 Datum
-- 
2.31.1

Reply via email to