On Tue, Feb 14, 2023 at 8:24 PM John Naylor <john.nay...@enterprisedb.com> wrote: > > On Mon, Feb 13, 2023 at 2:51 PM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > > > On Sat, Feb 11, 2023 at 2:33 PM John Naylor > > <john.nay...@enterprisedb.com> wrote: > > > > > > I didn't get any closer to radix-tree regression, > > > > Me neither. It seems that in v26, inserting chunks into node-32 is > > slow but needs more analysis. I'll share if I found something > > interesting. > > If that were the case, then the other benchmarks I ran would likely have > slowed down as well, but they are the same or faster. There is one > microbenchmark I didn't run before: "select * from > bench_fixed_height_search(15)" (15 to reduce noise from growing size class, > and despite the name it measures load time as well). Trying this now shows no > difference: a few runs range 19 to 21ms in each version. That also reinforces > that update_inner is fine and that the move to value pointer API didn't > regress. > > Changing TIDS_PER_BLOCK_FOR_LOAD to 1 to stress the tree more gives (min of > 5, perf run separate from measurements): > > v15 + v26 store: > > mem_allocated | load_ms > ---------------+--------- > 98202152 | 553 > > 19.71% postgres postgres [.] tidstore_add_tids > + 31.47% postgres postgres [.] rt_set > = 51.18% > > 20.62% postgres postgres [.] rt_node_insert_leaf > 6.05% postgres postgres [.] AllocSetAlloc > 4.74% postgres postgres [.] AllocSetFree > 4.62% postgres postgres [.] palloc > 2.23% postgres postgres [.] SlabAlloc > > v26: > > mem_allocated | load_ms > ---------------+--------- > 98202032 | 617 > > 57.45% postgres postgres [.] tidstore_add_tids > > 20.67% postgres postgres [.] local_rt_node_insert_leaf > 5.99% postgres postgres [.] AllocSetAlloc > 3.55% postgres postgres [.] palloc > 3.05% postgres postgres [.] AllocSetFree > 2.05% postgres postgres [.] SlabAlloc > > So it seems the store itself got faster when we removed shared memory paths > from the v26 store to test it against v15. > > I thought to favor the local memory case in the tidstore by controlling > inlining -- it's smaller and will be called much more often, so I tried the > following (done in 0007) > > #define RT_PREFIX shared_rt > #define RT_SHMEM > -#define RT_SCOPE static > +#define RT_SCOPE static pg_noinline > > That brings it down to > > mem_allocated | load_ms > ---------------+--------- > 98202032 | 590
The improvement makes sense to me. I've also done the same test (with changing TIDS_PER_BLOCK_FOR_LOAD to 1): w/o 0007 patch: mem_allocated | load_ms | iter_ms ---------------+---------+--------- 98202032 | 334 | 445 (1 row) w/ 0007 patch: mem_allocated | load_ms | iter_ms ---------------+---------+--------- 98202032 | 316 | 434 (1 row) On the other hand, with TIDS_PER_BLOCK_FOR_LOAD being 30, the load performance didn't improve: w/0 0007 patch: mem_allocated | load_ms | iter_ms ---------------+---------+--------- 98202032 | 601 | 608 (1 row) w/ 0007 patch: mem_allocated | load_ms | iter_ms ---------------+---------+--------- 98202032 | 610 | 606 (1 row) That being said, it might be within noise level, so I agree with 0007 patch. > Perhaps some slowdown is unavoidable, but it would be nice to understand why. True. > > > I can think that something like traversing a HOT chain could visit > > offsets out of order. But fortunately we prune such collected TIDs > > before heap vacuum in heap case. > > Further, currently we *already* assume we populate the tid array in order > (for binary search), so we can just continue assuming that (with an assert > added since it's more public in this form). I'm not sure why such basic > common sense evaded me a few versions ago... Right. TidStore is implemented not only for heap, so loading out-of-order TIDs might be important in the future. > > > If these are acceptable, I can incorporate them into a later patchset. > > > > These are nice improvements! I agree with all changes. > > Great, I've squashed these into the tidstore patch (0004). Also added 0005, > which is just a simplification. > I've attached some small patches to improve the radix tree and tidstrore: We have the following WIP comment in test_radixtree: // WIP: compiles with warnings because rt_attach is defined but not used // #define RT_SHMEM How about unsetting RT_SCOPE to suppress warnings for unused rt_attach and friends? FYI I've briefly tested the TidStore with blocksize = 32kb, and it seems to work fine. > I squashed the earlier dead code removal into the radix tree patch. Thanks! > > v27-0008 measures tid store iteration performance and adds a stub function to > prevent spurious warnings, so the benchmarking module can always be built. > > Getting the list of offsets from the old array for a given block is always > trivial, but tidstore_iter_extract_tids() is doing a huge amount of > unnecessary work when TIDS_PER_BLOCK_FOR_LOAD is 1, enough to exceed the load > time: > > mem_allocated | load_ms | iter_ms > ---------------+---------+--------- > 98202032 | 589 | 915 > > Fortunately, it's an easy fix, done in 0009. > > mem_allocated | load_ms | iter_ms > ---------------+---------+--------- > 98202032 | 589 | 153 Cool! > > I'll soon resume more cosmetic review of the tid store, but this is enough to > post. Thanks! You removed the vacuum integration patch from v27, is there any reason for that? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
From f06557689f33d9b11be1083362fcce19665b4014 Mon Sep 17 00:00:00 2001 From: Masahiko Sawada <sawada.m...@gmail.com> Date: Thu, 16 Feb 2023 12:18:22 +0900 Subject: [PATCH v2899 2/2] Small improvements for radixtree and tests. --- src/include/lib/radixtree.h | 2 +- src/test/modules/test_radixtree/test_radixtree.c | 13 ++++++++++--- 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 1cdb995e54..e546bd705c 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -1622,7 +1622,7 @@ RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p) /* Descend the tree until we reach a leaf node */ while (shift >= 0) { - RT_PTR_ALLOC new_child = RT_INVALID_PTR_ALLOC;; + RT_PTR_ALLOC new_child = RT_INVALID_PTR_ALLOC; child = RT_PTR_GET_LOCAL(tree, stored_child); diff --git a/src/test/modules/test_radixtree/test_radixtree.c b/src/test/modules/test_radixtree/test_radixtree.c index f944945db9..afe53382f3 100644 --- a/src/test/modules/test_radixtree/test_radixtree.c +++ b/src/test/modules/test_radixtree/test_radixtree.c @@ -107,13 +107,12 @@ static const test_spec test_specs[] = { /* define the radix tree implementation to test */ #define RT_PREFIX rt -#define RT_SCOPE static +#define RT_SCOPE #define RT_DECLARE #define RT_DEFINE #define RT_USE_DELETE #define RT_VALUE_TYPE TestValueType -// WIP: compiles with warnings because rt_attach is defined but not used -// #define RT_SHMEM +/* #define RT_SHMEM */ #include "lib/radixtree.h" @@ -142,6 +141,8 @@ test_empty(void) #ifdef RT_SHMEM int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; + + LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); radixtree = rt_create(CurrentMemoryContext, dsa, tranche_id); @@ -188,6 +189,8 @@ test_basic(int children, bool test_inner) #ifdef RT_SHMEM int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; + + LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); #endif @@ -358,6 +361,8 @@ test_node_types(uint8 shift) #ifdef RT_SHMEM int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; + + LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); #endif @@ -406,6 +411,8 @@ test_pattern(const test_spec * spec) #ifdef RT_SHMEM int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; + + LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); #endif -- 2.31.1
From f6ed6e18b2281cee96af98a39bdfc453117e6a21 Mon Sep 17 00:00:00 2001 From: Masahiko Sawada <sawada.m...@gmail.com> Date: Thu, 16 Feb 2023 12:17:59 +0900 Subject: [PATCH v2899 1/2] comment update and test the shared tidstore. --- src/backend/access/common/tidstore.c | 19 +++------- .../modules/test_tidstore/test_tidstore.c | 37 +++++++++++++++++-- 2 files changed, 40 insertions(+), 16 deletions(-) diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c index 015e3dea81..8c05e60d92 100644 --- a/src/backend/access/common/tidstore.c +++ b/src/backend/access/common/tidstore.c @@ -64,13 +64,9 @@ * |---------------------------------------------| key * * The maximum height of the radix tree is 5 in this case. - * - * If the number of bits for offset number fits in a 64-bit value, we don't - * encode tids but directly use the block number and the offset number as key - * and value, respectively. */ #define TIDSTORE_VALUE_NBITS 6 /* log(64, 2) */ -#define TIDSTORE_OFFSET_MASK ((1 << TIDSTORE_VALUE_NBITS) - 1) +#define TIDSTORE_OFFSET_MASK ((1 << TIDSTORE_VALUE_NBITS) - 1) /* A magic value used to identify our TidStores. */ #define TIDSTORE_MAGIC 0x826f6a10 @@ -99,9 +95,10 @@ typedef struct TidStoreControl /* These values are never changed after creation */ size_t max_bytes; /* the maximum bytes a TidStore can use */ int max_offset; /* the maximum offset number */ - int offset_nbits; /* the number of bits required for max_offset */ - int offset_key_nbits; /* the number of bits of a offset number - * used for the key */ + int offset_nbits; /* the number of bits required for an offset + * number */ + int offset_key_nbits; /* the number of bits of an offset number + * used in a key */ /* The below fields are used only in shared case */ @@ -227,10 +224,6 @@ tidstore_create(size_t max_bytes, int max_offset, dsa_area *area) if (ts->control->offset_nbits < TIDSTORE_VALUE_NBITS) ts->control->offset_nbits = TIDSTORE_VALUE_NBITS; - /* - * We use tid encoding if the number of bits for the offset number doesn't - * fix in a value, uint64. - */ ts->control->offset_key_nbits = ts->control->offset_nbits - TIDSTORE_VALUE_NBITS; @@ -379,7 +372,7 @@ tidstore_add_tids(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets, { uint64 off_bit; - /* encode the tid to key and val */ + /* encode the tid to a key and partial offset */ key = encode_key_off(ts, blkno, offsets[i], &off_bit); /* make sure we scanned the line pointer array in order */ diff --git a/src/test/modules/test_tidstore/test_tidstore.c b/src/test/modules/test_tidstore/test_tidstore.c index 9b849ae8e8..9a1217f833 100644 --- a/src/test/modules/test_tidstore/test_tidstore.c +++ b/src/test/modules/test_tidstore/test_tidstore.c @@ -18,10 +18,13 @@ #include "miscadmin.h" #include "storage/block.h" #include "storage/itemptr.h" +#include "storage/lwlock.h" #include "utils/memutils.h" PG_MODULE_MAGIC; +/* #define TEST_SHARED_TIDSTORE 1 */ + #define TEST_TIDSTORE_MAX_BYTES (2 * 1024 * 1024L) /* 2MB */ PG_FUNCTION_INFO_V1(test_tidstore); @@ -59,6 +62,18 @@ test_basic(int max_offset) OffsetNumber offs[TEST_TIDSTORE_NUM_OFFSETS]; int blk_idx; +#ifdef TEST_SHARED_TIDSTORE + int tranche_id = LWLockNewTrancheId(); + dsa_area *dsa; + + LWLockRegisterTranche(tranche_id, "test_tidstore"); + dsa = dsa_create(tranche_id); + + ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, dsa); +#else + ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, NULL); +#endif + /* prepare the offset array */ offs[0] = FirstOffsetNumber; offs[1] = FirstOffsetNumber + 1; @@ -66,8 +81,6 @@ test_basic(int max_offset) offs[3] = max_offset - 1; offs[4] = max_offset; - ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, max_offset, NULL); - /* add tids */ for (int i = 0; i < TEST_TIDSTORE_NUM_BLOCKS; i++) tidstore_add_tids(ts, blks[i], offs, TEST_TIDSTORE_NUM_OFFSETS); @@ -144,6 +157,10 @@ test_basic(int max_offset) } tidstore_destroy(ts); + +#ifdef TEST_SHARED_TIDSTORE + dsa_detach(dsa); +#endif } static void @@ -153,9 +170,19 @@ test_empty(void) TidStoreIter *iter; ItemPointerData tid; - elog(NOTICE, "testing empty tidstore"); +#ifdef TEST_SHARED_TIDSTORE + int tranche_id = LWLockNewTrancheId(); + dsa_area *dsa; + + LWLockRegisterTranche(tranche_id, "test_tidstore"); + dsa = dsa_create(tranche_id); + ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, MaxHeapTuplesPerPage, dsa); +#else ts = tidstore_create(TEST_TIDSTORE_MAX_BYTES, MaxHeapTuplesPerPage, NULL); +#endif + + elog(NOTICE, "testing empty tidstore"); ItemPointerSet(&tid, 0, FirstOffsetNumber); if (tidstore_lookup_tid(ts, &tid)) @@ -180,6 +207,10 @@ test_empty(void) tidstore_end_iterate(iter); tidstore_destroy(ts); + +#ifdef TEST_SHARED_TIDSTORE + dsa_detach(dsa); +#endif } Datum -- 2.31.1