On 10/21/25 09:10, Heikki Linnakangas wrote:
> On 18/10/2025 01:49, Tomas Vondra wrote:
>> On 10/17/25 12:32, Tomas Vondra wrote:
>>>
>>>
>>> On 10/17/25 10:31, Heikki Linnakangas wrote:
>>>>> typedef struct ResourceElem
>>>>> {
>>>>> Datum item;
>>>>> + uint32 count; /* number of occurrences */
>>>>> const ResourceOwnerDesc *kind; /* NULL indicates a free hash
>>>>> table slot */
>>>>> } ResourceElem;
>>>>
>>>> Hmm, the 'count' is not used when the entry is stored in the array.
>>>> Perhaps we should have a separate struct for array and hash elements
>>>> now. Keeping the array small helps it to fit in CPU caches.
>>>
>>> Agreed. I had the same idea yesterday, but I haven't done it yet.
>>
>> The attached v2 does that - it adds a separate ResourceHashElem for the
>> has table, and it works. But I'm not sure I like it very much, because
>> there are two places that relied on both the array and hash table using
>> the same struct to "walk" it the same way.
>>
>> For ResourceOwnerSort() it's not too bad, but ResourceOwnerReleaseAll()
>> now duplicates most of the code. It's not terrible, but also not pretty.
>> I can't think of a better way, though.
>
> Looks fine to me. The code duplication is not too bad IMO.
>
> Here's a rebased version of the micro-benchmark I used when I was
> working on the ResourceOwner refactoring (https://www.postgresql.org/
> message-id/d746cead-a1ef-7efe-fb47-933311e876a3%40iki.fi).
>
> I ran it again on my laptop. Different from the one I used back then, so
> the results are not comparable with the results from that old thread.
>
> Unpatched (commit 18d26140934):
>
> postgres=# \i contrib/resownerbench/snaptest.sql
> numkeep | numsnaps | lifo_time_ns | fifo_time_ns
> ---------+----------+--------------+--------------
> 0 | 1 | 11.6 | 11.1
> 0 | 5 | 12.1 | 13.1
> 0 | 10 | 12.3 | 13.5
> 0 | 60 | 14.6 | 19.4
> 0 | 70 | 16.0 | 18.1
> 0 | 100 | 16.7 | 18.0
> 0 | 1000 | 18.1 | 20.7
> 0 | 10000 | 21.9 | 29.5
> 9 | 10 | 11.0 | 11.1
> 9 | 100 | 14.9 | 20.0
> 9 | 1000 | 16.1 | 24.4
> 9 | 10000 | 21.9 | 25.7
> 65 | 70 | 11.7 | 12.5
> 65 | 100 | 13.9 | 14.8
> 65 | 1000 | 16.7 | 17.8
> 65 | 10000 | 22.5 | 27.8
> (16 rows)
>
> v2-0001-Deduplicate-entries-in-ResourceOwner.patch:
>
> postgres=# \i contrib/resownerbench/snaptest.sql
> numkeep | numsnaps | lifo_time_ns | fifo_time_ns
> ---------+----------+--------------+--------------
> 0 | 1 | 10.8 | 10.6
> 0 | 5 | 11.5 | 12.3
> 0 | 10 | 12.1 | 13.0
> 0 | 60 | 13.9 | 19.4
> 0 | 70 | 15.9 | 18.7
> 0 | 100 | 16.0 | 18.5
> 0 | 1000 | 19.2 | 21.6
> 0 | 10000 | 22.4 | 29.0
> 9 | 10 | 11.2 | 11.3
> 9 | 100 | 14.4 | 19.9
> 9 | 1000 | 16.4 | 23.8
> 9 | 10000 | 22.4 | 25.7
> 65 | 70 | 11.4 | 12.1
> 65 | 100 | 14.8 | 17.0
> 65 | 1000 | 16.9 | 18.1
> 65 | 10000 | 22.5 | 28.5
> (16 rows)
>
> v20251016-0001-Deduplicate-entries-in-ResourceOwner.patch:
>
> postgres=# \i contrib/resownerbench/snaptest.sql
> numkeep | numsnaps | lifo_time_ns | fifo_time_ns
> ---------+----------+--------------+--------------
> 0 | 1 | 11.3 | 11.1
> 0 | 5 | 12.3 | 13.0
> 0 | 10 | 13.0 | 14.1
> 0 | 60 | 14.7 | 20.5
> 0 | 70 | 16.3 | 19.0
> 0 | 100 | 16.5 | 18.4
> 0 | 1000 | 19.0 | 22.4
> 0 | 10000 | 23.2 | 29.6
> 9 | 10 | 11.2 | 11.1
> 9 | 100 | 14.8 | 20.5
> 9 | 1000 | 16.8 | 24.3
> 9 | 10000 | 23.3 | 26.5
> 65 | 70 | 12.4 | 13.0
> 65 | 100 | 15.2 | 16.6
> 65 | 1000 | 16.9 | 18.4
> 65 | 10000 | 23.4 | 29.3
> (16 rows)
>
> These are just a single run on my laptop, the error bars on individual
> numbers are significant. But it seems to me that V2 is maybe a little
> faster when the entries fit in the array.
>
Thanks. Attached is a version that adds a missing .sql file defining the
benchmark functions, and then two patches. The v2 is the patch already
shared on Saturday, v3 is a minor tweak (details in a minute).
I ran the benchmark on my test machine with both v1 and v2, with 10
runs. And there seems to be a small regression of ~5-10% (compared to
master). Which is not terrible, but also not negligible. v0 is master.
Here's the "fifo" results:
keep snaps | v0 v1 v2 v3 | v1 v2 v3
==================================================================
0 1 | 10.57 10.70 10.73 10.5 | 101% 102% 99%
5 | 10.52 10.70 10.90 10.5 | 102% 104% 100%
10 | 11.41 11.90 11.94 11.5 | 104% 105% 101%
60 | 15.04 15.74 15.82 15.5 | 105% 105% 103%
70 | 13.73 14.42 14.80 14.3 | 105% 108% 104%
100 | 13.65 14.27 14.59 14.1 | 105% 107% 104%
1000 | 15.07 15.78 16.27 15.6 | 105% 108% 104%
10000 | 23.15 24.96 25.57 24.1 | 108% 110% 104%
------------------------------------------------------------------
9 10 | 10.8 10.94 10.86 10.6 | 101% 101% 98%
100 | 15.83 16.35 16.72 16.1 | 103% 106% 102%
1000 | 19.04 19.70 20.34 19.6 | 103% 107% 103%
10000 | 21.5 23.37 24.18 22.5 | 109% 112% 105%
------------------------------------------------------------------
65 70 | 10.58 10.95 11.18 10.6 | 103% 106% 100%
100 | 13.29 14.28 14.73 14.1 | 107% 111% 106%
1000 | 14.62 15.49 15.99 15.2 | 106% 109% 104%
10000 | 22.98 24.78 25.55 24.3 | 108% 111% 106%
and here's "lifo"
keep snaps | v0 v1 v2 v3 | v1 v2 v3
==================================================================
0 1 | 10.73 10.74 11.06 10.4 | 100% 103% 97%
5 | 10.44 10.45 10.62 10.2 | 100% 102% 98%
10 | 11.82 11.84 12.17 11.6 | 100% 103% 98%
60 | 12.15 12.81 12.94 12.4 | 105% 107% 102%
70 | 12.84 13.61 13.95 13.4 | 106% 109% 104%
100 | 12.98 13.73 14.06 13.5 | 106% 108% 104%
1000 | 13.71 14.56 14.80 14.2 | 106% 108% 104%
10000 | 17.68 19.51 20.38 19.0 | 110% 115% 108%
------------------------------------------------------------------
9 10 | 10.96 10.92 11.02 10.6 | 100% 101% 97%
100 | 12.58 13.11 13.26 12.8 | 104% 105% 102%
1000 | 13.67 14.38 14.73 14.1 | 105% 108% 103%
10000 | 17.67 19.79 20.20 19.2 | 112% 114% 109%
------------------------------------------------------------------
65 70 | 10.58 10.85 10.78 11.1 | 103% 102% 105%
100 | 13.03 14.12 14.27 13.5 | 108% 110% 103%
1000 | 13.65 14.56 14.88 14.1 | 107% 109% 104%
10000 | 17.62 19.61 20.30 19.0 | 111% 115% 108%
I was wondering where the regression comes from, and it occurred to me
ResourceOwnerAddToHash() may be doing the checks in the wrong order. It
should be checking for empty slot first, so I did that - that's v3.
There's still a regression, but it's about halved compared to v2, and
about equal to v1. So I tried doing the same tweak for v1, but that
didn't help much (if at all).
The results seem fairly stable, and the overall trend is clear. It'd be
great if there were no regressions, but considering how narrow is this
microbenchmark (and considering the benefits for practical COPY runs),
I'd say it's probably OK.
regards
--
Tomas Vondra
From 670fd60c6c953286956a3a2d52cf0b532deaafbb Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <[email protected]>
Date: Tue, 21 Oct 2025 09:51:23 +0300
Subject: [PATCH 1/3] resownerbench
---
contrib/Makefile | 1 +
contrib/meson.build | 2 +
contrib/resownerbench/Makefile | 17 ++
contrib/resownerbench/meson.build | 24 +++
contrib/resownerbench/resownerbench--1.0.sql | 7 +
contrib/resownerbench/resownerbench.c | 154 +++++++++++++++++++
contrib/resownerbench/resownerbench.control | 6 +
contrib/resownerbench/snaptest.sql | 41 +++++
8 files changed, 252 insertions(+)
create mode 100644 contrib/resownerbench/Makefile
create mode 100644 contrib/resownerbench/meson.build
create mode 100644 contrib/resownerbench/resownerbench--1.0.sql
create mode 100644 contrib/resownerbench/resownerbench.c
create mode 100644 contrib/resownerbench/resownerbench.control
create mode 100644 contrib/resownerbench/snaptest.sql
diff --git a/contrib/Makefile b/contrib/Makefile
index 2f0a88d3f77..f84b26d52cf 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -43,6 +43,7 @@ SUBDIRS = \
pg_visibility \
pg_walinspect \
postgres_fdw \
+ resownerbench \
seg \
spi \
tablefunc \
diff --git a/contrib/meson.build b/contrib/meson.build
index ed30ee7d639..9c34bb128b7 100644
--- a/contrib/meson.build
+++ b/contrib/meson.build
@@ -71,3 +71,5 @@ subdir('unaccent')
subdir('uuid-ossp')
subdir('vacuumlo')
subdir('xml2')
+
+subdir('resownerbench')
diff --git a/contrib/resownerbench/Makefile b/contrib/resownerbench/Makefile
new file mode 100644
index 00000000000..9b0e1cfee1a
--- /dev/null
+++ b/contrib/resownerbench/Makefile
@@ -0,0 +1,17 @@
+MODULE_big = resownerbench
+OBJS = resownerbench.o
+
+EXTENSION = resownerbench
+DATA = resownerbench--1.0.sql
+PGFILEDESC = "resownerbench - benchmark for ResourceOwners"
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/resownerbench
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/resownerbench/meson.build b/contrib/resownerbench/meson.build
new file mode 100644
index 00000000000..30051c3a375
--- /dev/null
+++ b/contrib/resownerbench/meson.build
@@ -0,0 +1,24 @@
+# Copyright (c) 2022-2025, PostgreSQL Global Development Group
+
+resownerbench_sources = files(
+ 'resownerbench.c',
+)
+
+resownerbench = shared_module('resownerbench',
+ resownerbench_sources,
+ c_pch: pch_postgres_h,
+ kwargs: contrib_mod_args,
+)
+contrib_targets += resownerbench
+
+install_data(
+ 'resownerbench.control',
+ 'resownerbench--1.0.sql',
+ kwargs: contrib_data_args,
+)
+
+tests += {
+ 'name': 'resownerbench',
+ 'sd': meson.current_source_dir(),
+ 'bd': meson.current_build_dir(),
+}
diff --git a/contrib/resownerbench/resownerbench--1.0.sql b/contrib/resownerbench/resownerbench--1.0.sql
new file mode 100644
index 00000000000..8521caed2d8
--- /dev/null
+++ b/contrib/resownerbench/resownerbench--1.0.sql
@@ -0,0 +1,7 @@
+CREATE OR REPLACE FUNCTION snapshotbench_fifo(numkeep int, numsnaps int, numiters int) RETURNS double precision
+AS 'resownerbench', 'snapshotbench_fifo'
+LANGUAGE C;
+
+CREATE OR REPLACE FUNCTION snapshotbench_lifo(numkeep int, numsnaps int, numiters int) RETURNS double precision
+AS 'resownerbench', 'snapshotbench_lifo'
+LANGUAGE C;
diff --git a/contrib/resownerbench/resownerbench.c b/contrib/resownerbench/resownerbench.c
new file mode 100644
index 00000000000..80f98c2105b
--- /dev/null
+++ b/contrib/resownerbench/resownerbench.c
@@ -0,0 +1,154 @@
+#include "postgres.h"
+
+#include "catalog/pg_type.h"
+#include "catalog/pg_statistic.h"
+#include "executor/spi.h"
+#include "funcapi.h"
+#include "libpq/pqsignal.h"
+#include "utils/catcache.h"
+#include "utils/syscache.h"
+#include "utils/timestamp.h"
+#include "utils/snapmgr.h"
+
+PG_MODULE_MAGIC;
+
+PG_FUNCTION_INFO_V1(snapshotbench_lifo);
+PG_FUNCTION_INFO_V1(snapshotbench_fifo);
+
+/*
+ * ResourceOwner Performance test, using RegisterSnapshot().
+ *
+ * This takes three parameters: numkeep, numsnaps, numiters.
+ *
+ * First, we register 'numkeep' snapshots. They are kept registed
+ * until the end of the test. Then, we repeatedly register and
+ * unregister 'numsnaps - numkeep' additional snapshots, repeating
+ * 'numiters' times. All the register/unregister calls are made in
+ * LIFO order.
+ *
+ * Returns the time spent, in milliseconds.
+ *
+ * The idea is to test the performance of ResourceOwnerRemember()
+ * and ReourceOwnerForget() operations, under different regimes.
+ *
+ * In the old implementation, if 'numsnaps' is small enough, all
+ * the entries fit in the resource owner's small array (it can
+ * hold 64 entries).
+ *
+ * In the new implementation, the array is much smaller, only 8
+ * entries, but it's used together with the hash table so that
+ * we stay in the "array regime" as long as 'numsnaps - numkeep'
+ * is smaller than 8 entries.
+ *
+ * 'numiters' can be adjusted to adjust the overall runtime to be
+ * suitable long.
+ */
+Datum
+snapshotbench_lifo(PG_FUNCTION_ARGS)
+{
+ int numkeep = PG_GETARG_INT32(0);
+ int numsnaps = PG_GETARG_INT32(1);
+ int numiters = PG_GETARG_INT32(2);
+ int i;
+ instr_time start,
+ duration;
+ Snapshot lsnap;
+ Snapshot *rs;
+ int numregistered = 0;
+
+ rs = palloc(Max(numsnaps, numkeep) * sizeof(Snapshot));
+
+ lsnap = GetLatestSnapshot();
+
+ sigprocmask(SIG_SETMASK, &BlockSig, NULL);
+ INSTR_TIME_SET_CURRENT(start);
+
+ while (numregistered < numkeep)
+ {
+ rs[numregistered] = RegisterSnapshot(lsnap);
+ numregistered++;
+ }
+
+ for (i = 0 ; i < numiters; i++)
+ {
+ while (numregistered < numsnaps)
+ {
+ rs[numregistered] = RegisterSnapshot(lsnap);
+ numregistered++;
+ }
+
+ while (numregistered > numkeep)
+ {
+ numregistered--;
+ UnregisterSnapshot(rs[numregistered]);
+ }
+ }
+
+ while (numregistered > 0)
+ {
+ numregistered--;
+ UnregisterSnapshot(rs[numregistered]);
+ }
+
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
+
+ PG_RETURN_FLOAT8(INSTR_TIME_GET_MILLISEC(duration));
+};
+
+
+/*
+ * Same, but do the register/unregister operations in
+ * FIFO order.
+ */
+Datum
+snapshotbench_fifo(PG_FUNCTION_ARGS)
+{
+ int numkeep = PG_GETARG_INT32(0);
+ int numsnaps = PG_GETARG_INT32(1);
+ int numiters = PG_GETARG_INT32(2);
+ int i,
+ j;
+ instr_time start,
+ duration;
+ Snapshot lsnap;
+ Snapshot *rs;
+ int numregistered = 0;
+
+ rs = palloc(Max(numsnaps, numkeep) * sizeof(Snapshot));
+
+ lsnap = GetLatestSnapshot();
+
+ sigprocmask(SIG_SETMASK, &BlockSig, NULL);
+ INSTR_TIME_SET_CURRENT(start);
+
+ while (numregistered < numkeep)
+ {
+ rs[numregistered] = RegisterSnapshot(lsnap);
+ numregistered++;
+ }
+
+ for (i = 0 ; i < numiters; i++)
+ {
+ while (numregistered < numsnaps)
+ {
+ rs[numregistered] = RegisterSnapshot(lsnap);
+ numregistered++;
+ }
+
+ for (j = numkeep; j < numregistered; j++)
+ UnregisterSnapshot(rs[j]);
+ numregistered = numkeep;
+ }
+
+ for (j = 0; j < numregistered; j++)
+ UnregisterSnapshot(rs[j]);
+ numregistered = numkeep;
+
+ INSTR_TIME_SET_CURRENT(duration);
+ INSTR_TIME_SUBTRACT(duration, start);
+ sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
+
+ PG_RETURN_FLOAT8(INSTR_TIME_GET_MILLISEC(duration));
+};
diff --git a/contrib/resownerbench/resownerbench.control b/contrib/resownerbench/resownerbench.control
new file mode 100644
index 00000000000..ada88b8eed8
--- /dev/null
+++ b/contrib/resownerbench/resownerbench.control
@@ -0,0 +1,6 @@
+# resownerbench
+
+comment = 'benchmark for ResourceOwners'
+default_version = '1.0'
+module_pathname = '$libdir/resownerbench'
+relocatable = true
diff --git a/contrib/resownerbench/snaptest.sql b/contrib/resownerbench/snaptest.sql
new file mode 100644
index 00000000000..a8fd4da2f52
--- /dev/null
+++ b/contrib/resownerbench/snaptest.sql
@@ -0,0 +1,41 @@
+--
+-- Performance test RegisterSnapshot/UnregisterSnapshot.
+--
+with params as materialized (
+select * from
+generate_series(1,10) s(run),
+(values (0, 1, 10000000 * 10),
+ (0, 5, 2000000 * 10),
+ (0, 10, 1000000 * 10),
+ (0, 60, 100000 * 10),
+ (0, 70, 100000 * 10),
+ (0, 100, 100000 * 10),
+ (0, 1000, 10000 * 10),
+ (0, 10000, 1000 * 10),
+
+-- These tests keep 9 snapshots registered across the iterations. That
+-- exceeds the size of the little array in the patch, so this exercises
+-- the hash lookups. Without the patch, these still fit in the array
+-- (it's 64 entries without the patch)
+ (9, 10, 10000000 * 10),
+ (9, 100, 100000 * 10),
+ (9, 1000, 10000 * 10),
+ (9, 10000, 1000 * 10),
+
+-- These exceed the 64 entry array even without the patch, so these fall
+-- in the hash table regime with and without the patch.
+ (65, 70, 1000000 * 10),
+ (65, 100, 100000 * 10),
+ (65, 1000, 10000 * 10),
+ (65, 10000, 1000 * 10)
+) AS params (numkeep, numsnaps, numiters)
+)
+select run, numkeep, numsnaps,
+ -- numiters,
+ -- round(lifo_time_ms) as lifo_total_time_ms,
+ -- round(fifo_time_ms) as fifo_total_time_ms,
+ round((lifo_time_ms::numeric / (numkeep + (numsnaps - numkeep) * numiters)) * 1000000, 1) as lifo_time_ns,
+ round((fifo_time_ms::numeric / (numkeep + (numsnaps - numkeep) * numiters)) * 1000000, 1) as fifo_time_ns
+from params,
+lateral snapshotbench_lifo(numkeep, numsnaps, numiters) as lifo_time_ms,
+lateral snapshotbench_fifo(numkeep, numsnaps, numiters) as fifo_time_ms;
--
2.51.0
From 83ff6c494312ea778172c446c3c4ca042dbb3d67 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <[email protected]>
Date: Thu, 16 Oct 2025 13:27:25 +0200
Subject: [PATCH 2/3] v2: Deduplicate entries in ResourceOwner
We may track the same resource many times, but the current ResourceOwner
does not handle that efficiently. Each reference is a separate entry,
which leads to long linear searches in the simple hash table.
This adds a simple opportunistic deduplication. Entries get a counter,
tracking how many references they represent. It only applies to the
"hash" phase, not to the initial array. And when adding an entry to the
hash finds an empty slot first, it uses it (instead of continuing to
look for a possible match).
This means the deduplication is not perfect - there may be multiple
entries for the same value. But in practice it seems like a good trade
off. It won't affect cases with no duplicates, and it will help cases
with many of them.
The code also continues to grow the hash table as before. In some cases
it might be enough to deduplicate the contents. But it's hard to say in
advance, and it's not much cheaper as it still requires walking the
current hash table. So there's a risk of attempting deduplication, only
to find the hash still needs to be enlarged, doubling the cost. So we
just grow the hash table. In the worst case the memory usage is not
worse than without deduplication. If deduplication is effective, the
hash table grows later or not at all.
---
src/backend/utils/resowner/resowner.c | 215 ++++++++++++++++++++------
1 file changed, 166 insertions(+), 49 deletions(-)
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index fca84ded6dd..73de45baccd 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -15,6 +15,12 @@
* a particular reference, you need to search both the array and the hash
* table.
*
+ * The hash table deduplicates entries in an opportunistic way. If it finds
+ * an entry for the same resource (same kind), it increments a counter instead.
+ * But if it finds an empty slot first, it uses the slot. If the table gets
+ * enlarged, those entries will be merged, but it's possible to have multiple
+ * entries for the same resource. The fixed-size array does not deduplicate.
+ *
* The most frequent usage is that a resource is remembered, and forgotten
* shortly thereafter. For example, pin a buffer, read one tuple from it,
* release the pin. Linearly scanning the small array handles that case
@@ -67,6 +73,18 @@ typedef struct ResourceElem
const ResourceOwnerDesc *kind; /* NULL indicates a free hash table slot */
} ResourceElem;
+/*
+ * ResourceHashElem represents a reference associated with a resource owner,
+ * stored in the hash table. Similar to ResourceElem, but has a counter field
+ * tracking how many references it represents, to allow deduplication.
+ */
+typedef struct ResourceHashElem
+{
+ Datum item;
+ uint32 count; /* number of references */
+ const ResourceOwnerDesc *kind; /* NULL indicates a free hash table slot */
+} ResourceHashElem;
+
/*
* Size of the fixed-size array to hold most-recently remembered resources.
*/
@@ -151,7 +169,7 @@ struct ResourceOwnerData
* release priority. The first 'nhash' elements are occupied, the rest
* are empty.
*/
- ResourceElem *hash;
+ ResourceHashElem *hash;
uint32 capacity; /* allocated length of hash[] */
uint32 grow_at; /* grow hash when reach this */
@@ -198,7 +216,8 @@ static ResourceReleaseCallbackItem *ResourceRelease_callbacks = NULL;
/* Internal routines */
static inline uint32 hash_resource_elem(Datum value, const ResourceOwnerDesc *kind);
static void ResourceOwnerAddToHash(ResourceOwner owner, Datum value,
- const ResourceOwnerDesc *kind);
+ const ResourceOwnerDesc *kind,
+ uint32 count);
static int resource_priority_cmp(const void *a, const void *b);
static void ResourceOwnerSort(ResourceOwner owner);
static void ResourceOwnerReleaseAll(ResourceOwner owner,
@@ -239,7 +258,8 @@ hash_resource_elem(Datum value, const ResourceOwnerDesc *kind)
* Adds 'value' of given 'kind' to the ResourceOwner's hash table
*/
static void
-ResourceOwnerAddToHash(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind)
+ResourceOwnerAddToHash(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind,
+ uint32 count)
{
uint32 mask = owner->capacity - 1;
uint32 idx;
@@ -250,12 +270,25 @@ ResourceOwnerAddToHash(ResourceOwner owner, Datum value, const ResourceOwnerDesc
idx = hash_resource_elem(value, kind) & mask;
for (;;)
{
+ /*
+ * Increment counter for a matching entry or use an empty slot,
+ * whichever we find first.
+ */
+
+ if ((owner->hash[idx].kind == kind) &&
+ (owner->hash[idx].item == value))
+ {
+ owner->hash[idx].count += count;
+ return;
+ }
+
if (owner->hash[idx].kind == NULL)
break; /* found a free slot */
idx = (idx + 1) & mask;
}
owner->hash[idx].item = value;
owner->hash[idx].kind = kind;
+ owner->hash[idx].count = count;
owner->nhash++;
}
@@ -277,6 +310,24 @@ resource_priority_cmp(const void *a, const void *b)
return 1;
}
+/*
+ * Comparison function to sort by release phase and priority
+ */
+static int
+resource_priority_hash_cmp(const void *a, const void *b)
+{
+ const ResourceHashElem *ra = (const ResourceHashElem *) a;
+ const ResourceHashElem *rb = (const ResourceHashElem *) b;
+
+ /* Note: reverse order */
+ if (ra->kind->release_phase == rb->kind->release_phase)
+ return pg_cmp_u32(rb->kind->release_priority, ra->kind->release_priority);
+ else if (ra->kind->release_phase > rb->kind->release_phase)
+ return -1;
+ else
+ return 1;
+}
+
/*
* Sort resources in reverse release priority.
*
@@ -288,16 +339,21 @@ resource_priority_cmp(const void *a, const void *b)
static void
ResourceOwnerSort(ResourceOwner owner)
{
- ResourceElem *items;
- uint32 nitems;
-
if (owner->nhash == 0)
{
+ ResourceElem *items;
+ uint32 nitems;
+
items = owner->arr;
nitems = owner->narr;
+
+ qsort(items, nitems, sizeof(ResourceElem), resource_priority_cmp);
}
else
{
+ ResourceHashElem *items;
+ uint32 nitems;
+
/*
* Compact the hash table, so that all the elements are in the
* beginning of the 'hash' array, with no empty elements.
@@ -324,7 +380,9 @@ ResourceOwnerSort(ResourceOwner owner)
Assert(dst + owner->narr <= owner->capacity);
for (int idx = 0; idx < owner->narr; idx++)
{
- owner->hash[dst] = owner->arr[idx];
+ owner->hash[dst].item = owner->arr[idx].item;
+ owner->hash[dst].kind = owner->arr[idx].kind;
+ owner->hash[dst].count = 1;
dst++;
}
Assert(dst == owner->nhash + owner->narr);
@@ -333,9 +391,9 @@ ResourceOwnerSort(ResourceOwner owner)
items = owner->hash;
nitems = owner->nhash;
- }
- qsort(items, nitems, sizeof(ResourceElem), resource_priority_cmp);
+ qsort(items, nitems, sizeof(ResourceHashElem), resource_priority_hash_cmp);
+ }
}
/*
@@ -345,9 +403,6 @@ static void
ResourceOwnerReleaseAll(ResourceOwner owner, ResourceReleasePhase phase,
bool printLeakWarnings)
{
- ResourceElem *items;
- uint32 nitems;
-
/*
* ResourceOwnerSort must've been called already. All the resources are
* either in the array or the hash.
@@ -356,49 +411,93 @@ ResourceOwnerReleaseAll(ResourceOwner owner, ResourceReleasePhase phase,
Assert(owner->sorted);
if (owner->nhash == 0)
{
+ ResourceElem *items;
+ uint32 nitems;
+
items = owner->arr;
nitems = owner->narr;
+
+ /*
+ * The resources are sorted in reverse priority order. Release them
+ * starting from the end, until we hit the end of the phase that we are
+ * releasing now. We will continue from there when called again for the
+ * next phase.
+ */
+ while (nitems > 0)
+ {
+ uint32 idx = nitems - 1;
+ Datum value = items[idx].item;
+ const ResourceOwnerDesc *kind = items[idx].kind;
+
+ if (kind->release_phase > phase)
+ break;
+ Assert(kind->release_phase == phase);
+
+ if (printLeakWarnings)
+ {
+ char *res_str;
+
+ res_str = kind->DebugPrint ?
+ kind->DebugPrint(value)
+ : psprintf("%s %p", kind->name, DatumGetPointer(value));
+ pfree(res_str);
+ }
+
+ kind->ReleaseResource(value);
+
+ nitems--;
+ }
+
+ owner->narr = nitems;
}
else
{
+ ResourceHashElem *items;
+ uint32 nitems;
+
Assert(owner->narr == 0);
items = owner->hash;
nitems = owner->nhash;
- }
- /*
- * The resources are sorted in reverse priority order. Release them
- * starting from the end, until we hit the end of the phase that we are
- * releasing now. We will continue from there when called again for the
- * next phase.
- */
- while (nitems > 0)
- {
- uint32 idx = nitems - 1;
- Datum value = items[idx].item;
- const ResourceOwnerDesc *kind = items[idx].kind;
+ /*
+ * The resources are sorted in reverse priority order. Release them
+ * starting from the end, until we hit the end of the phase that we are
+ * releasing now. We will continue from there when called again for the
+ * next phase.
+ */
+ while (nitems > 0)
+ {
+ uint32 idx = nitems - 1;
+ Datum value = items[idx].item;
+ const ResourceOwnerDesc *kind = items[idx].kind;
+ uint32 count = items[idx].count;
- if (kind->release_phase > phase)
- break;
- Assert(kind->release_phase == phase);
+ if (kind->release_phase > phase)
+ break;
+ Assert(kind->release_phase == phase);
- if (printLeakWarnings)
- {
- char *res_str;
+ if (printLeakWarnings)
+ {
+ char *res_str;
+
+ res_str = kind->DebugPrint ?
+ kind->DebugPrint(value)
+ : psprintf("%s %p", kind->name, DatumGetPointer(value));
+ pfree(res_str);
+ }
- res_str = kind->DebugPrint ?
- kind->DebugPrint(value)
- : psprintf("%s %p", kind->name, DatumGetPointer(value));
- elog(WARNING, "resource was not closed: %s", res_str);
- pfree(res_str);
+ /* invoke the release callback for each reference */
+ while (count > 0)
+ {
+ kind->ReleaseResource(value);
+ count--;
+ }
+
+ nitems--;
}
- kind->ReleaseResource(value);
- nitems--;
- }
- if (owner->nhash == 0)
- owner->narr = nitems;
- else
+
owner->nhash = nitems;
+ }
}
@@ -466,16 +565,16 @@ ResourceOwnerEnlarge(ResourceOwner owner)
uint32 i,
oldcap,
newcap;
- ResourceElem *oldhash;
- ResourceElem *newhash;
+ ResourceHashElem *oldhash;
+ ResourceHashElem *newhash;
oldhash = owner->hash;
oldcap = owner->capacity;
/* Double the capacity (it must stay a power of 2!) */
newcap = (oldcap > 0) ? oldcap * 2 : RESOWNER_HASH_INIT_SIZE;
- newhash = (ResourceElem *) MemoryContextAllocZero(TopMemoryContext,
- newcap * sizeof(ResourceElem));
+ newhash = (ResourceHashElem *) MemoryContextAllocZero(TopMemoryContext,
+ newcap * sizeof(ResourceHashElem));
/*
* We assume we can't fail below this point, so OK to scribble on the
@@ -496,7 +595,8 @@ ResourceOwnerEnlarge(ResourceOwner owner)
for (i = 0; i < oldcap; i++)
{
if (oldhash[i].kind != NULL)
- ResourceOwnerAddToHash(owner, oldhash[i].item, oldhash[i].kind);
+ ResourceOwnerAddToHash(owner, oldhash[i].item, oldhash[i].kind,
+ oldhash[i].count);
}
/* And release old hash table. */
@@ -506,7 +606,7 @@ ResourceOwnerEnlarge(ResourceOwner owner)
/* Move items from the array to the hash */
for (int i = 0; i < owner->narr; i++)
- ResourceOwnerAddToHash(owner, owner->arr[i].item, owner->arr[i].kind);
+ ResourceOwnerAddToHash(owner, owner->arr[i].item, owner->arr[i].kind, 1);
owner->narr = 0;
Assert(owner->nhash <= owner->grow_at);
@@ -555,7 +655,8 @@ ResourceOwnerRemember(ResourceOwner owner, Datum value, const ResourceOwnerDesc
* Note: Forgetting a resource does not guarantee that there is room to
* remember a new resource. One exception is when you forget the most
* recently remembered resource; that does make room for a new remember call.
- * Some code callers rely on that exception.
+ * Some code callers rely on that exception. The deduplication does not
+ * change this, as it only applies to the hash table.
*/
void
ResourceOwnerForget(ResourceOwner owner, Datum value, const ResourceOwnerDesc *kind)
@@ -597,8 +698,17 @@ ResourceOwnerForget(ResourceOwner owner, Datum value, const ResourceOwnerDesc *k
if (owner->hash[idx].item == value &&
owner->hash[idx].kind == kind)
{
+ /* decrement the count for entry for multiple references */
+ if (owner->hash[idx].count > 1)
+ {
+ owner->hash[idx].count--;
+ return;
+ }
+
+ /* remove the entry entirely */
owner->hash[idx].item = (Datum) 0;
owner->hash[idx].kind = NULL;
+ owner->hash[idx].count = 0;
owner->nhash--;
#ifdef RESOWNER_STATS
@@ -847,12 +957,19 @@ ResourceOwnerReleaseAllOfKind(ResourceOwner owner, const ResourceOwnerDesc *kind
if (owner->hash[i].kind == kind)
{
Datum value = owner->hash[i].item;
+ uint32 count = owner->hash[i].count;
owner->hash[i].item = (Datum) 0;
owner->hash[i].kind = NULL;
+ owner->hash[i].count = 0;
owner->nhash--;
- kind->ReleaseResource(value);
+ /* invoke the release callback once for each reference */
+ while (count > 0)
+ {
+ kind->ReleaseResource(value);
+ count--;
+ }
}
}
owner->releasing = false;
--
2.51.0
From 98efe8f0a3ba4474bf21af6e7e727d529bb89d45 Mon Sep 17 00:00:00 2001
From: tomas <tomas>
Date: Tue, 21 Oct 2025 15:06:01 +0200
Subject: [PATCH 3/3] v3: Check for empty slots first
---
src/backend/utils/resowner/resowner.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/src/backend/utils/resowner/resowner.c b/src/backend/utils/resowner/resowner.c
index 73de45baccd..1a04eec945a 100644
--- a/src/backend/utils/resowner/resowner.c
+++ b/src/backend/utils/resowner/resowner.c
@@ -270,11 +270,13 @@ ResourceOwnerAddToHash(ResourceOwner owner, Datum value, const ResourceOwnerDesc
idx = hash_resource_elem(value, kind) & mask;
for (;;)
{
+ if (owner->hash[idx].kind == NULL)
+ break; /* found a free slot */
+
/*
* Increment counter for a matching entry or use an empty slot,
* whichever we find first.
*/
-
if ((owner->hash[idx].kind == kind) &&
(owner->hash[idx].item == value))
{
@@ -282,8 +284,6 @@ ResourceOwnerAddToHash(ResourceOwner owner, Datum value, const ResourceOwnerDesc
return;
}
- if (owner->hash[idx].kind == NULL)
- break; /* found a free slot */
idx = (idx + 1) & mask;
}
owner->hash[idx].item = value;
--
2.51.0