On Tue, Feb 14, 2023 at 8:24 PM John Naylor
<[email protected]> wrote:
>
> On Mon, Feb 13, 2023 at 2:51 PM Masahiko Sawada <[email protected]> wrote:
> >
> > On Sat, Feb 11, 2023 at 2:33 PM John Naylor
> > <[email protected]> wrote:
> > >
> > > I didn't get any closer to radix-tree regression,
> >
> > Me neither. It seems that in v26, inserting chunks into node-32 is
> > slow but needs more analysis. I'll share if I found something
> > interesting.
>
> If that were the case, then the other benchmarks I ran would likely have
> slowed down as well, but they are the same or faster. There is one
> microbenchmark I didn't run before: "select * from
> bench_fixed_height_search(15)" (15 to reduce noise from growing size class,
> and despite the name it measures load time as well). Trying this now shows no
> difference: a few runs range 19 to 21ms in each version. That also reinforces
> that update_inner is fine and that the move to value pointer API didn't
> regress.
>
> Changing TIDS_PER_BLOCK_FOR_LOAD to 1 to stress the tree more gives (min of
> 5, perf run separate from measurements):
>
> v15 + v26 store:
>
> mem_allocated | load_ms
> ---------------+---------
> 98202152 | 553
>
> 19.71% postgres postgres [.] tidstore_add_tids
> + 31.47% postgres postgres [.] rt_set
> = 51.18%
>
> 20.62% postgres postgres [.] rt_node_insert_leaf
> 6.05% postgres postgres [.] AllocSetAlloc
> 4.74% postgres postgres [.] AllocSetFree
> 4.62% postgres postgres [.] palloc
> 2.23% postgres postgres [.] SlabAlloc
>
> v26:
>
> mem_allocated | load_ms
> ---------------+---------
> 98202032 | 617
>
> 57.45% postgres postgres [.] tidstore_add_tids
>
> 20.67% postgres postgres [.] local_rt_node_insert_leaf
> 5.99% postgres postgres [.] AllocSetAlloc
> 3.55% postgres postgres [.] palloc
> 3.05% postgres postgres [.] AllocSetFree
> 2.05% postgres postgres [.] SlabAlloc
>
> So it seems the store itself got faster when we removed shared memory paths
> from the v26 store to test it against v15.
>
> I thought to favor the local memory case in the tidstore by controlling
> inlining -- it's smaller and will be called much more often, so I tried the
> following (done in 0007)
>
> #define RT_PREFIX shared_rt
> #define RT_SHMEM
> -#define RT_SCOPE static
> +#define RT_SCOPE static pg_noinline
>
> That brings it down to
>
> mem_allocated | load_ms
> ---------------+---------
> 98202032 | 590
The improvement makes sense to me. I've also done the same test (with
changing TIDS_PER_BLOCK_FOR_LOAD to 1):
w/o 0007 patch:
mem_allocated | load_ms | iter_ms
---------------+---------+---------
98202032 | 334 | 445
(1 row)
w/ 0007 patch:
mem_allocated | load_ms | iter_ms
---------------+---------+---------
98202032 | 316 | 434
(1 row)
On the other hand, with TIDS_PER_BLOCK_FOR_LOAD being 30, the load
performance didn't improve:
w/0 0007 patch:
mem_allocated | load_ms | iter_ms
---------------+---------+---------
98202032 | 601 | 608
(1 row)
w/ 0007 patch:
mem_allocated | load_ms | iter_ms
---------------+---------+---------
98202032 | 610 | 606
(1 row)
That being said, it might be within noise level, so I agree with 0007 patch.
> Perhaps some slowdown is unavoidable, but it would be nice to understand why.
True.
>
> > I can think that something like traversing a HOT chain could visit
> > offsets out of order. But fortunately we prune such collected TIDs
> > before heap vacuum in heap case.
>
> Further, currently we *already* assume we populate the tid array in order
> (for binary search), so we can just continue assuming that (with an assert
> added since it's more public in this form). I'm not sure why such basic
> common sense evaded me a few versions ago...
Right. TidStore is implemented not only for heap, so loading
out-of-order TIDs might be important in the future.
> > > If these are acceptable, I can incorporate them into a later patchset.
> >
> > These are nice improvements! I agree with all changes.
>
> Great, I've squashed these into the tidstore patch (0004). Also added 0005,
> which is just a simplification.
>
I've attached some small patches to improve the radix tree and tidstrore:
We have the following WIP comment in test_radixtree:
// WIP: compiles with warnings because rt_attach is defined but not used
// #define RT_SHMEM
How about unsetting RT_SCOPE to suppress warnings for unused rt_attach
and friends?
FYI I've briefly tested the TidStore with blocksize = 32kb, and it
seems to work fine.
> I squashed the earlier dead code removal into the radix tree patch.
Thanks!
>
> v27-0008 measures tid store iteration performance and adds a stub function to
> prevent spurious warnings, so the benchmarking module can always be built.
>
> Getting the list of offsets from the old array for a given block is always
> trivial, but tidstore_iter_extract_tids() is doing a huge amount of
> unnecessary work when TIDS_PER_BLOCK_FOR_LOAD is 1, enough to exceed the load
> time:
>
> mem_allocated | load_ms | iter_ms
> ---------------+---------+---------
> 98202032 | 589 | 915
>
> Fortunately, it's an easy fix, done in 0009.
>
> mem_allocated | load_ms | iter_ms
> ---------------+---------+---------
> 98202032 | 589 | 153
Cool!
>
> I'll soon resume more cosmetic review of the tid store, but this is enough to
> post.
Thanks!
You removed the vacuum integration patch from v27, is there any reason for that?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From f06557689f33d9b11be1083362fcce19665b4014 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 16 Feb 2023 12:18:22 +0900
Subject: [PATCH v2899 2/2] Small improvements for radixtree and tests.
---
src/include/lib/radixtree.h | 2 +-
src/test/modules/test_radixtree/test_radixtree.c | 13 ++++++++++---
2 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h
index 1cdb995e54..e546bd705c 100644
--- a/src/include/lib/radixtree.h
+++ b/src/include/lib/radixtree.h
@@ -1622,7 +1622,7 @@ RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE
*value_p)
/* Descend the tree until we reach a leaf node */
while (shift >= 0)
{
- RT_PTR_ALLOC new_child = RT_INVALID_PTR_ALLOC;;
+ RT_PTR_ALLOC new_child = RT_INVALID_PTR_ALLOC;
child = RT_PTR_GET_LOCAL(tree, stored_child);
diff --git a/src/test/modules/test_radixtree/test_radixtree.c
b/src/test/modules/test_radixtree/test_radixtree.c
index f944945db9..afe53382f3 100644
--- a/src/test/modules/test_radixtree/test_radixtree.c
+++ b/src/test/modules/test_radixtree/test_radixtree.c
@@ -107,13 +107,12 @@ static const test_spec test_specs[] = {
/* define the radix tree implementation to test */
#define RT_PREFIX rt
-#define RT_SCOPE static
+#define RT_SCOPE
#define RT_DECLARE
#define RT_DEFINE
#define RT_USE_DELETE
#define RT_VALUE_TYPE TestValueType
-// WIP: compiles with warnings because rt_attach is defined but not used
-// #define RT_SHMEM
+/* #define RT_SHMEM */
#include "lib/radixtree.h"
@@ -142,6 +141,8 @@ test_empty(void)
#ifdef RT_SHMEM
int tranche_id = LWLockNewTrancheId();
dsa_area *dsa;
+
+ LWLockRegisterTranche(tranche_id, "test_radix_tree");
dsa = dsa_create(tranche_id);
radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id);
@@ -188,6 +189,8 @@ test_basic(int children, bool test_inner)
#ifdef RT_SHMEM
int tranche_id = LWLockNewTrancheId();
dsa_area *dsa;
+
+ LWLockRegisterTranche(tranche_id, "test_radix_tree");
dsa = dsa_create(tranche_id);
#endif
@@ -358,6 +361,8 @@ test_node_types(uint8 shift)
#ifdef RT_SHMEM
int tranche_id = LWLockNewTrancheId();
dsa_area *dsa;
+
+ LWLockRegisterTranche(tranche_id, "test_radix_tree");
dsa = dsa_create(tranche_id);
#endif
@@ -406,6 +411,8 @@ test_pattern(const test_spec * spec)
#ifdef RT_SHMEM
int tranche_id = LWLockNewTrancheId();
dsa_area *dsa;
+
+ LWLockRegisterTranche(tranche_id, "test_radix_tree");
dsa = dsa_create(tranche_id);
#endif
--
2.31.1
From f6ed6e18b2281cee96af98a39bdfc453117e6a21 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <[email protected]>
Date: Thu, 16 Feb 2023 12:17:59 +0900
Subject: [PATCH v2899 1/2] comment update and test the shared tidstore.
---
src/backend/access/common/tidstore.c | 19 +++-------
.../modules/test_tidstore/test_tidstore.c | 37 +++++++++++++++++--
2 files changed, 40 insertions(+), 16 deletions(-)
diff --git a/src/backend/access/common/tidstore.c
b/src/backend/access/common/tidstore.c
index 015e3dea81..8c05e60d92 100644
--- a/src/backend/access/common/tidstore.c
+++ b/src/backend/access/common/tidstore.c
@@ -64,13 +64,9 @@
* |---------------------------------------------| key
*
* The maximum height of the radix tree is 5 in this case.
- *
- * If the number of bits for offset number fits in a 64-bit value, we don't
- * encode tids but directly use the block number and the offset number as key
- * and value, respectively.
*/
#define TIDSTORE_VALUE_NBITS 6 /* log(64, 2) */
-#define TIDSTORE_OFFSET_MASK ((1 << TIDSTORE_VALUE_NBITS) - 1)
+#define TIDSTORE_OFFSET_MASK ((1 << TIDSTORE_VALUE_NBITS) - 1)
/* A magic value used to identify our TidStores. */
#define TIDSTORE_MAGIC 0x826f6a10
@@ -99,9 +95,10 @@ typedef struct TidStoreControl
/* These values are never changed after creation */
size_t max_bytes; /* the maximum bytes a TidStore can use
*/
int max_offset; /* the maximum offset number */
- int offset_nbits; /* the number of bits required for
max_offset */
- int offset_key_nbits; /* the number of bits of a
offset number
- * used for the
key */
+ int offset_nbits; /* the number of bits required for an
offset
+ * number */
+ int offset_key_nbits; /* the number of bits of an
offset number
+ * used in a
key */
/* The below fields are used only in shared case */
@@ -227,10 +224,6 @@ tidstore_create(size_t max_bytes, int max_offset, dsa_area
*area)
if (ts->control->offset_nbits < TIDSTORE_VALUE_NBITS)
ts->control->offset_nbits = TIDSTORE_VALUE_NBITS;
- /*
- * We use tid encoding if the number of bits for the offset number
doesn't
- * fix in a value, uint64.
- */
ts->control->offset_key_nbits =
ts->control->offset_nbits - TIDSTORE_VALUE_NBITS;
@@ -379,7 +372,7 @@ tidstore_add_tids(TidStore *ts, BlockNumber blkno,
OffsetNumber *offsets,
{
uint64 off_bit;
- /* encode the tid to key and val */
+ /* encode the tid to a key and partial offset */
key = encode_key_off(ts, blkno, offsets[i], &off_bit);
/* make sure we scanned the line pointer array in order */
diff --git a/src/test/modules/test_tidstore/test_tidstore.c
b/src/test/modules/test_tidstore/test_tidstore.c
index 9b849ae8e8..9a1217f833 100644
--- a/src/test/modules/test_tidstore/test_tidstore.c
+++ b/src/test/modules/test_tidstore/test_tidstore.c
@@ -18,10 +18,13 @@
#include "miscadmin.h"
#include "storage/block.h"
#include "storage/itemptr.h"
+#include "storage/lwlock.h"
#include "utils/memutils.h"
PG_MODULE_MAGIC;
+/* #define TEST_SHARED_TIDSTORE 1 */
+
#define TEST_TIDSTORE_MAX_BYTES (2 * 1024 * 1024L) /* 2MB */
PG_FUNCTION_INFO_V1(test_tidstore);
@@ -59,6 +62,18 @@ test_basic(int max_offset)
OffsetNumber offs[TEST_TIDSTORE_NUM_OFFSETS];
int blk_idx;
+#ifdef TEST_SHARED_TIDSTORE
+ int tranche_id = LWLockNewTrancheId();
+ dsa_area *dsa;
+
+ LWLockRegisterTranche(tranche_id, "test_tidstore");
+ dsa = dsa_create(tranche_id);
+
+ ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, dsa);
+#else
+ ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, NULL);
+#endif
+
/* prepare the offset array */
offs[0] = FirstOffsetNumber;
offs[1] = FirstOffsetNumber + 1;
@@ -66,8 +81,6 @@ test_basic(int max_offset)
offs[3] = max_offset - 1;
offs[4] = max_offset;
- ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, NULL);
-
/* add tids */
for (int i = 0; i < TEST_TIDSTORE_NUM_BLOCKS; i++)
tidstore_add_tids(ts, blks[i], offs, TEST_TIDSTORE_NUM_OFFSETS);
@@ -144,6 +157,10 @@ test_basic(int max_offset)
}
tidstore_destroy(ts);
+
+#ifdef TEST_SHARED_TIDSTORE
+ dsa_detach(dsa);
+#endif
}
static void
@@ -153,9 +170,19 @@ test_empty(void)
TidStoreIter *iter;
ItemPointerData tid;
- elog(NOTICE, "testing empty tidstore");
+#ifdef TEST_SHARED_TIDSTORE
+ int tranche_id = LWLockNewTrancheId();
+ dsa_area *dsa;
+
+ LWLockRegisterTranche(tranche_id, "test_tidstore");
+ dsa = dsa_create(tranche_id);
+ ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, MaxHeapTuplesPerPage,
dsa);
+#else
ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, MaxHeapTuplesPerPage,
NULL);
+#endif
+
+ elog(NOTICE, "testing empty tidstore");
ItemPointerSet(&tid, 0, FirstOffsetNumber);
if (tidstore_lookup_tid(ts, &tid))
@@ -180,6 +207,10 @@ test_empty(void)
tidstore_end_iterate(iter);
tidstore_destroy(ts);
+
+#ifdef TEST_SHARED_TIDSTORE
+ dsa_detach(dsa);
+#endif
}
Datum
--
2.31.1