On Fri, Dec 20, 2024 at 4:12 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > On Wed, Dec 18, 2024 at 10:32 PM John Naylor <johncnaylo...@gmail.com> wrote:
> > 2. The iter_context is separate because the creator's new context > > could be a bump context which doesn't support pfree. But above we > > assume we can pfree in the caller's context. Also, IIUC we only > > allocate small iter objects, and it'd be unusual to need more than one > > at a time per backend, so it's a bit strange to have an entire context > > for that. Since we use a standard pattern of "begin; while(iter); > > end;", it seems unlikely that someone will cause a leak because of a > > coding mistake in iteration. v3-0001 allocates the iter data in the caller's context. It's a small patch, but still a core behavior change so proposed for master-only. I believe your v1 is still the least invasive fix for PG17. > > If these tiny admin structs were always, not sometimes, in the callers > > current context, I think it would be easier to reason about because > > then the creator's passed context would be used only for local memory, > > specifically only for leaves and the inner node child contexts. 0002 does this. > Fair points. Given that we need only one iterator at a time per > backend, it would be simpler if the caller passes the pointer to an > iterator that is a stack variable to RT_BEGIN_ITEREATE(). For example, > TidStoreBeginIterate() would be like: > > if (TidStoreIsShared(ts)) > shared_ts_begin_iterate(ts->tree.shared, &iter->tree_iter.shared); > else > local_ts_begin_iterate(ts->tree.local, &iter->tree_iter.shared); Hard for me to tell if it'd be simpler. > > 3. I was never a fan of trying to second-guess the creator's new > > context and instead use slab for fixed-sized leaf allocations. If the > > creator passes a bump context, we say "no, no, no, use slab -- it's > > good for ya". Let's assume the caller knows what they're doing. > > That's a valid argument but how can a user use the slab context for > leaf allocations? It's trivial after 0001-02: 0003 removes makes one test use slab as the passed context (only 32-bit systems would actually use it currently). Also, with a bit more work we could allow a NULL context for when the caller has purposely arranged to use pointer-sized values. Did you see any of Heikki's CSN patches? There is a radix tree used as a cache in a context where the tree could be created and destroyed frequently. Something about the memory blocks seems to have tickled some bad case in the glibc allocator, and one less context might be good insurance: https://www.postgresql.org/message-id/718d1788-b058-40e6-bc37-8f15612b5646%40iki.fi -- John Naylor Amazon Web Services
From cd5f4e63dd7a49a0884249e55bf2b78293faa4b9 Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Fri, 20 Dec 2024 12:01:50 +0700 Subject: [PATCH v3 1/3] Use caller's current memory context for radix tree iteration --- src/include/lib/radixtree.h | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 1301f3fee4..8fe2302652 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -719,7 +719,6 @@ struct RT_RADIX_TREE /* leaf_context is used only for single-value leaves */ MemoryContextData *leaf_context; #endif - MemoryContextData *iter_context; }; /* @@ -1836,14 +1835,6 @@ RT_CREATE(MemoryContext ctx) tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE)); tree->context = ctx; - /* - * Separate context for iteration in case the tree context doesn't support - * pfree - */ - tree->iter_context = AllocSetContextCreate(ctx, - RT_STR(RT_PREFIX) "_radix_tree iter context", - ALLOCSET_SMALL_SIZES); - #ifdef RT_SHMEM tree->dsa = dsa; dp = dsa_allocate0(dsa, sizeof(RT_RADIX_TREE_CONTROL)); @@ -2075,7 +2066,8 @@ RT_FREE(RT_RADIX_TREE * tree) /***************** ITERATION *****************/ /* - * Create and return the iterator for the given radix tree. + * Create and return the iterator for the given radix tree, + * in the caller's current memory context. * * Taking a lock in shared mode during the iteration is the caller's * responsibility. @@ -2086,8 +2078,7 @@ RT_BEGIN_ITERATE(RT_RADIX_TREE * tree) RT_ITER *iter; RT_CHILD_PTR root; - iter = (RT_ITER *) MemoryContextAllocZero(tree->iter_context, - sizeof(RT_ITER)); + iter = (RT_ITER *) palloc0(sizeof(RT_ITER)); iter->tree = tree; Assert(RT_PTR_ALLOC_IS_VALID(tree->ctl->root)); -- 2.47.1
From 79d22b266f0df433dcb9147d0d64db53c681f6c7 Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Fri, 20 Dec 2024 14:48:24 +0700 Subject: [PATCH v3 3/3] Always use the caller-provided context for radix tree leaves Previously, RT_CREATE would create an additional slab context if the values were fixed-length and larger than pointer size. Commit XXXXX arranged so that the caller-provided context is only used for leaves and nothing else, so if we override that choice, that context will go completely unused. That's confusing, and we have no reason to believe we know better than the developer, so just use what the caller provided. As demonstration, use slab in one of the regression tests --- src/include/lib/radixtree.h | 14 -------------- src/test/modules/test_radixtree/test_radixtree.c | 5 +++-- 2 files changed, 3 insertions(+), 16 deletions(-) diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 32f312305e..aa6354b2a4 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -1849,21 +1849,7 @@ RT_CREATE(MemoryContext ctx) size_class.allocsize); } - /* By default we use the passed context for leaves. */ tree->leaf_context = ctx; - -#ifndef RT_VARLEN_VALUE_SIZE - - /* - * For leaves storing fixed-length values, we use a slab context to avoid - * the possibility of space wastage by power-of-2 rounding up. - */ - if (sizeof(RT_VALUE_TYPE) > sizeof(RT_PTR_ALLOC)) - tree->leaf_context = SlabContextCreate(ctx, - RT_STR(RT_PREFIX) "_radix_tree leaf context", - RT_SLAB_BLOCK_SIZE(sizeof(RT_VALUE_TYPE)), - sizeof(RT_VALUE_TYPE)); -#endif /* !RT_VARLEN_VALUE_SIZE */ #endif /* RT_SHMEM */ /* add root node now so that RT_SET can assume it exists */ diff --git a/src/test/modules/test_radixtree/test_radixtree.c b/src/test/modules/test_radixtree/test_radixtree.c index 8074b83695..5c54962edf 100644 --- a/src/test/modules/test_radixtree/test_radixtree.c +++ b/src/test/modules/test_radixtree/test_radixtree.c @@ -313,9 +313,10 @@ test_random(void) #else MemoryContext radixtree_ctx; - radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, + radixtree_ctx = SlabContextCreate(CurrentMemoryContext, "test_radix_tree", - ALLOCSET_SMALL_SIZES); + SLAB_DEFAULT_BLOCK_SIZE, + sizeof(TestValueType)); radixtree = rt_create(radixtree_ctx); #endif -- 2.47.1
From 1a0bc58c91eaf007db385a12798e50afc514f7c9 Mon Sep 17 00:00:00 2001 From: John Naylor <john.nay...@postgresql.org> Date: Fri, 20 Dec 2024 13:04:18 +0700 Subject: [PATCH v3 2/3] Get rid of radix tree's general purpose context Previously, this was only used for the root of the tree and, for local memory, the control object. These can just as well be allocated in the caller's current context. This makes the memory context parameter from RT_CREATE for shared memory unused, so remove it adjust callers to match. --- src/backend/access/common/tidstore.c | 15 +++---- src/include/lib/radixtree.h | 35 ++++++--------- .../modules/test_radixtree/test_radixtree.c | 45 ++++++++----------- 3 files changed, 37 insertions(+), 58 deletions(-) diff --git a/src/backend/access/common/tidstore.c b/src/backend/access/common/tidstore.c index a7179759d6..786ed66b47 100644 --- a/src/backend/access/common/tidstore.c +++ b/src/backend/access/common/tidstore.c @@ -116,7 +116,7 @@ struct TidStore /* MemoryContext where the TidStore is allocated */ MemoryContext context; - /* MemoryContext that the radix tree uses */ + /* MemoryContext for the radix tree when using local memory, NULL for shared memory */ MemoryContext rt_context; /* Storage for TIDs. Use either one depending on TidStoreIsShared() */ @@ -217,10 +217,6 @@ TidStoreCreateShared(size_t max_bytes, int tranche_id) ts = palloc0(sizeof(TidStore)); ts->context = CurrentMemoryContext; - ts->rt_context = AllocSetContextCreate(CurrentMemoryContext, - "TID storage meta data", - ALLOCSET_SMALL_SIZES); - /* * Choose the initial and maximum DSA segment sizes to be no longer than * 1/8 of max_bytes. @@ -235,8 +231,7 @@ TidStoreCreateShared(size_t max_bytes, int tranche_id) dsa_init_size = dsa_max_size; area = dsa_create_ext(tranche_id, dsa_init_size, dsa_max_size); - ts->tree.shared = shared_ts_create(ts->rt_context, area, - tranche_id); + ts->tree.shared = shared_ts_create(area, tranche_id); ts->area = area; return ts; @@ -328,13 +323,13 @@ TidStoreDestroy(TidStore *ts) if (TidStoreIsShared(ts)) { shared_ts_free(ts->tree.shared); - dsa_detach(ts->area); } else + { local_ts_free(ts->tree.local); - - MemoryContextDelete(ts->rt_context); + MemoryContextDelete(ts->rt_context); + } pfree(ts); } diff --git a/src/include/lib/radixtree.h b/src/include/lib/radixtree.h index 8fe2302652..32f312305e 100644 --- a/src/include/lib/radixtree.h +++ b/src/include/lib/radixtree.h @@ -275,7 +275,7 @@ typedef dsa_pointer RT_HANDLE; #endif #ifdef RT_SHMEM -RT_SCOPE RT_RADIX_TREE *RT_CREATE(MemoryContext ctx, dsa_area *dsa, int tranche_id); +RT_SCOPE RT_RADIX_TREE *RT_CREATE(dsa_area *dsa, int tranche_id); RT_SCOPE RT_RADIX_TREE *RT_ATTACH(dsa_area *dsa, dsa_pointer dp); RT_SCOPE void RT_DETACH(RT_RADIX_TREE * tree); RT_SCOPE RT_HANDLE RT_GET_HANDLE(RT_RADIX_TREE * tree); @@ -706,8 +706,6 @@ typedef struct RT_RADIX_TREE_CONTROL /* Entry point for allocating and accessing the tree */ struct RT_RADIX_TREE { - MemoryContext context; - /* pointing to either local memory or DSA */ RT_RADIX_TREE_CONTROL *ctl; @@ -1809,31 +1807,25 @@ have_slot: /***************** SETUP / TEARDOWN *****************/ /* - * Create the radix tree in the given memory context and return it. + * Create the radix tree root in the caller's current memory context and return it. * - * All local memory required for a radix tree is allocated in the given - * memory context and its children. Note that RT_FREE() will delete all - * allocated space within the given memory context, so the dsa_area should - * be created in a different context. + * The tree's nodes and leaves are allocated in "ctx" and its children for + * local memory, or in "dsa" for shared memory. */ RT_SCOPE RT_RADIX_TREE * #ifdef RT_SHMEM -RT_CREATE(MemoryContext ctx, dsa_area *dsa, int tranche_id) +RT_CREATE(dsa_area *dsa, int tranche_id) #else RT_CREATE(MemoryContext ctx) #endif { RT_RADIX_TREE *tree; - MemoryContext old_ctx; RT_CHILD_PTR rootnode; #ifdef RT_SHMEM dsa_pointer dp; #endif - old_ctx = MemoryContextSwitchTo(ctx); - tree = (RT_RADIX_TREE *) palloc0(sizeof(RT_RADIX_TREE)); - tree->context = ctx; #ifdef RT_SHMEM tree->dsa = dsa; @@ -1858,7 +1850,7 @@ RT_CREATE(MemoryContext ctx) } /* By default we use the passed context for leaves. */ - tree->leaf_context = tree->context; + tree->leaf_context = ctx; #ifndef RT_VARLEN_VALUE_SIZE @@ -1880,8 +1872,6 @@ RT_CREATE(MemoryContext ctx) tree->ctl->start_shift = 0; tree->ctl->max_val = RT_SHIFT_GET_MAX_VAL(0); - MemoryContextSwitchTo(old_ctx); - return tree; } @@ -2054,13 +2044,16 @@ RT_FREE(RT_RADIX_TREE * tree) */ tree->ctl->magic = 0; dsa_free(tree->dsa, tree->ctl->handle); -#endif - +#else /* - * Free all space allocated within the tree's context and delete all child + * Free all space allocated within the leaf context and delete all child * contexts such as those used for nodes. */ - MemoryContextReset(tree->context); + MemoryContextReset(tree->leaf_context); + + pfree(tree->ctl); +#endif + pfree(tree); } /***************** ITERATION *****************/ @@ -2674,7 +2667,7 @@ RT_MEMORY_USAGE(RT_RADIX_TREE * tree) Assert(tree->ctl->magic == RT_RADIX_TREE_MAGIC); total = dsa_get_total_size(tree->dsa); #else - total = MemoryContextMemAllocated(tree->context, true); + total = MemoryContextMemAllocated(tree->leaf_context, true); #endif return total; diff --git a/src/test/modules/test_radixtree/test_radixtree.c b/src/test/modules/test_radixtree/test_radixtree.c index 3e5aa3720c..8074b83695 100644 --- a/src/test/modules/test_radixtree/test_radixtree.c +++ b/src/test/modules/test_radixtree/test_radixtree.c @@ -120,25 +120,22 @@ PG_FUNCTION_INFO_V1(test_radixtree); static void test_empty(void) { - MemoryContext radixtree_ctx; rt_radix_tree *radixtree; rt_iter *iter; uint64 key; #ifdef TEST_SHARED_RT int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; -#endif - - radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, - "test_radix_tree", - ALLOCSET_SMALL_SIZES); -#ifdef TEST_SHARED_RT LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); - - radixtree = rt_create(radixtree_ctx, dsa, tranche_id); + radixtree = rt_create(dsa, tranche_id); #else + MemoryContext radixtree_ctx; + + radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, + "test_radix_tree", + ALLOCSET_SMALL_SIZES); radixtree = rt_create(radixtree_ctx); #endif @@ -165,7 +162,6 @@ test_empty(void) static void test_basic(rt_node_class_test_elem *test_info, int shift, bool asc) { - MemoryContext radixtree_ctx; rt_radix_tree *radixtree; rt_iter *iter; uint64 *keys; @@ -173,18 +169,16 @@ test_basic(rt_node_class_test_elem *test_info, int shift, bool asc) #ifdef TEST_SHARED_RT int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; -#endif - radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, - "test_radix_tree", - ALLOCSET_SMALL_SIZES); - -#ifdef TEST_SHARED_RT LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); - - radixtree = rt_create(radixtree_ctx, dsa, tranche_id); + radixtree = rt_create(dsa, tranche_id); #else + MemoryContext radixtree_ctx; + + radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, + "test_radix_tree", + ALLOCSET_SMALL_SIZES); radixtree = rt_create(radixtree_ctx); #endif @@ -300,7 +294,6 @@ key_cmp(const void *a, const void *b) static void test_random(void) { - MemoryContext radixtree_ctx; rt_radix_tree *radixtree; rt_iter *iter; pg_prng_state state; @@ -313,18 +306,16 @@ test_random(void) #ifdef TEST_SHARED_RT int tranche_id = LWLockNewTrancheId(); dsa_area *dsa; -#endif - radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, - "test_radix_tree", - ALLOCSET_SMALL_SIZES); - -#ifdef TEST_SHARED_RT LWLockRegisterTranche(tranche_id, "test_radix_tree"); dsa = dsa_create(tranche_id); - - radixtree = rt_create(radixtree_ctx, dsa, tranche_id); + radixtree = rt_create(dsa, tranche_id); #else + MemoryContext radixtree_ctx; + + radixtree_ctx = AllocSetContextCreate(CurrentMemoryContext, + "test_radix_tree", + ALLOCSET_SMALL_SIZES); radixtree = rt_create(radixtree_ctx); #endif -- 2.47.1