On 6/3/26 16:35, Matthias van de Meent wrote:
On Wed, 3 Jun 2026 at 08:33, Maksim.Melnikov<[email protected]> wrote:
On 6/16/25 11:56, Потапов Александр wrote:
To be more precise, I used a constant number of threads (128 and 1024) to
compare with the previous results. The quadratic dependency shows up in every
case; see the new graph.
Q: Did you check that pgbench or the OS does not have
O(n_active_connections) or O(n_active_threads) overhead per worker
during thread creation or connection establishment, e.g. by varying
the number of threads used to manage these N clients? I wouldn't be
surprised if there are inefficiencies in e.g. the threading- or
synchronization model that cause O(N) per-thread overhead, or O(N^2)
overall when you have one thread per connection.
Hi, all!
I've investigated a slightly different scenario than Alexander, and I want to
share my thoughts in this thread too.
I found that when we run pgbench scenarios sequentially, without restarting
postgres between iterations, the initial connection time degrades from launch
to launch and eventually stabilizes at values worse than those of the first
run (ICT_degradation.png attached).
Scenario details:
[...]
4. Add to postgresql.conf:
huge_pages = off #for the sake of test stability and reproducibility
I think this is the main culprit of the extreme slowdown -- without
huge pages, you're effectively guaranteed to get many minor page
faults, and with it the relevant TLB miss rates. With huge pages
enabled, the proc array should fit on one (or just a few) memory
pages.
We're not generally in the business of optimizing workloads that have
huge_pages=off.
Yes, I agree that huge_pages=off is not a common setup nowadays. My
motivation was that even if a configuration isn't commonly used, that
does not mean it is of no interest to anyone, and it can still be
optimized as long as the common scenarios do not regress. Moreover,
huge_pages = try is the default, so the server tries to request huge
pages but falls back to regular pages (as with huge_pages=off) if that
fails. As far as I know, on Linux the default value of vm.nr_hugepages
is 0, which means that by default no HugeTLB pages are reserved. Of
course, a DBA should set this up, but in practice it can be missed.
Anyway, if the community isn't interested in this kind of optimization,
that is fine. It was an interesting and educational investigation for
me; thanks for your help.
.....
As we can see, the patched version fixes this. I made a series of
measurements for all versions and attached a comparison chart
(ICT_degradation_with_patch.png). I have also added a table with the results.
Do you happen to have data with huge_pages enabled?
I hope it will be interesting and helpful.
Definitely interesting. I'm not so sure it's as effective on a
production configuration (with huge pages enabled), but I'm definitely
interested in seeing test results.
I've made comparative measurements for configurations with huge_pages =
on/off. Please see the results below.
(all times in ms)
Clients  huge_pages=off   huge_pages=off   huge_pages=on    huge_pages=on
         with patch       without patch    with patch       without patch
512      ~480 +- 3.5%     ~490 +- 3%       ~420 +- 3.5%     ~420 +- 3.5%
1024     ~910 +- 1.3%     ~990 +- 2%       ~790 +- 1.7%     ~800 +- 1.8%
2048     ~1810 +- 1.4%    ~2230 +- 0.9%    ~1540 +- 0.7%    ~1530 +- 1.4%
4096     ~3690 +- 1.9%    ~6060 +- 0.8%    ~3070 +- 0.6%    ~3070 +- 0.9%
8192     ~9900 +- 0.6%    ~18530 +- 0.4%   ~6220 +- 0.7%    ~6230 +- 0.7%
A comparison chart is also attached.
As we can see, the measurements confirm the patch's effectiveness for the
huge_pages=off configuration (the same result as in my previous message),
while for huge_pages=on both versions give the same results: no
improvement and no degradation.
----
Some comments on the patch:
The patch with fixes is attached. Thanks for the review.
Best regards,
Maksim Melnikov
From 5e8b84fcfe4ecccc1847ed8d70d428e1fc5bc59f Mon Sep 17 00:00:00 2001
From: Maksim Melnikov <[email protected]>
Date: Tue, 18 Nov 2025 17:20:09 +0300
Subject: [PATCH v2] Reduce connection init/close time
ProcArrayAdd/ProcArrayRemove are expensive because they access the
pgxactoff field of many PGPROC entries, which are spread over different
memory pages, and this causes page faults. One use case of pgxactoff is
iterating over PGPROCs with gaps (strided access) while reading/writing
pgxactoff. Since each PGPROC occupies ~1KB, such access touches enough
distinct pages to trigger page faults.
So we move all pgxactoff values into a separate array, pgxactoffs, so
that they occupy adjacent memory. Indexes in allProcs are aligned with
the pgxactoffs array, so the Nth element of pgxactoffs refers to the Nth
PGPROC.
This helps to avoid extra page faults and reduces connection init/close
time.
---
src/backend/access/transam/varsup.c | 10 +++---
src/backend/commands/indexcmds.c | 2 +-
src/backend/commands/vacuum.c | 2 +-
src/backend/replication/logical/logical.c | 2 +-
src/backend/replication/slot.c | 2 +-
src/backend/replication/walsender.c | 2 +-
src/backend/storage/ipc/procarray.c | 40 +++++++++++++----------
src/backend/storage/lmgr/proc.c | 6 +++-
src/include/storage/proc.h | 6 ++--
9 files changed, 40 insertions(+), 32 deletions(-)
diff --git a/src/backend/access/transam/varsup.c b/src/backend/access/transam/varsup.c
index dc5e32d86f3..1bbc1bcd665 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -85,7 +85,7 @@ GetNewTransactionId(bool isSubXact)
{
Assert(!isSubXact);
MyProc->xid = BootstrapTransactionId;
- ProcGlobal->xids[MyProc->pgxactoff] = BootstrapTransactionId;
+ ProcGlobal->xids[ProcGetMyXactOff()] = BootstrapTransactionId;
return FullTransactionIdFromEpochAndXid(0, BootstrapTransactionId);
}
@@ -244,18 +244,18 @@ GetNewTransactionId(bool isSubXact)
*/
if (!isSubXact)
{
- Assert(ProcGlobal->subxidStates[MyProc->pgxactoff].count == 0);
- Assert(!ProcGlobal->subxidStates[MyProc->pgxactoff].overflowed);
+ Assert(ProcGlobal->subxidStates[ProcGetMyXactOff()].count == 0);
+ Assert(!ProcGlobal->subxidStates[ProcGetMyXactOff()].overflowed);
Assert(MyProc->subxidStatus.count == 0);
Assert(!MyProc->subxidStatus.overflowed);
/* LWLockRelease acts as barrier */
MyProc->xid = xid;
- ProcGlobal->xids[MyProc->pgxactoff] = xid;
+ ProcGlobal->xids[ProcGetMyXactOff()] = xid;
}
else
{
- XidCacheStatus *substat = &ProcGlobal->subxidStates[MyProc->pgxactoff];
+ XidCacheStatus *substat = &ProcGlobal->subxidStates[ProcGetMyXactOff()];
int nxids = MyProc->subxidStatus.count;
Assert(substat->count == MyProc->subxidStatus.count);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9ab74c8df0a..d9e83e26c78 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4654,6 +4654,6 @@ set_indexsafe_procflags(void)
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->statusFlags |= PROC_IN_SAFE_IC;
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ ProcGlobal->statusFlags[ProcGetMyXactOff()] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);
}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index a4abb29cf64..884d18fad4e 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2059,7 +2059,7 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams params,
MyProc->statusFlags |= PROC_IN_VACUUM;
if (params.is_wraparound)
MyProc->statusFlags |= PROC_VACUUM_FOR_WRAPAROUND;
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ ProcGlobal->statusFlags[ProcGetMyXactOff()] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);
}
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 3541fc793e4..c2c96a70c92 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -188,7 +188,7 @@ StartupDecodingContext(List *output_plugin_options,
{
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->statusFlags |= PROC_IN_LOGICAL_DECODING;
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ ProcGlobal->statusFlags[ProcGetMyXactOff()] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);
}
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index d7fb9f5a67f..1bd3b8f902b 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -832,7 +832,7 @@ ReplicationSlotRelease(void)
/* might not have been set when we've been a plain slot */
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->statusFlags &= ~PROC_IN_LOGICAL_DECODING;
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ ProcGlobal->statusFlags[ProcGetMyXactOff()] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);
if (am_walsender)
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 04aa770d981..748b47a15bf 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -358,7 +358,7 @@ InitWalSender(void)
Assert(MyProc->xmin == InvalidTransactionId);
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->statusFlags |= PROC_AFFECTS_ALL_HORIZONS;
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ ProcGlobal->statusFlags[ProcGetMyXactOff()] = MyProc->statusFlags;
LWLockRelease(ProcArrayLock);
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f540bb6b23f..585ed6d3677 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -283,6 +283,7 @@ typedef enum KAXCompressReason
} KAXCompressReason;
static PGPROC *allProcs;
+static int *pgxactoffs;
/*
* Cache to reduce overhead of repeated calls to TransactionIdIsInProgress()
@@ -449,6 +450,7 @@ ProcArrayShmemInit(void *arg)
TransamVariables->xactCompletionCount = 1;
allProcs = ProcGlobal->allProcs;
+ pgxactoffs = ProcGlobal->pgxactoffs;
}
static void
@@ -498,7 +500,7 @@ ProcArrayAdd(PGPROC *proc)
int this_procno = arrayP->pgprocnos[index];
Assert(this_procno >= 0 && this_procno < (arrayP->maxProcs + NUM_AUXILIARY_PROCS));
- Assert(allProcs[this_procno].pgxactoff == index);
+ Assert(pgxactoffs[this_procno] == index);
/* If we have found our right position in the array, break */
if (this_procno > pgprocno)
@@ -520,7 +522,7 @@ ProcArrayAdd(PGPROC *proc)
movecount * sizeof(*ProcGlobal->statusFlags));
arrayP->pgprocnos[index] = GetNumberFromPGProc(proc);
- proc->pgxactoff = index;
+ pgxactoffs[arrayP->pgprocnos[index]] = index;
ProcGlobal->xids[index] = proc->xid;
ProcGlobal->subxidStates[index] = proc->subxidStatus;
ProcGlobal->statusFlags[index] = proc->statusFlags;
@@ -534,9 +536,9 @@ ProcArrayAdd(PGPROC *proc)
int procno = arrayP->pgprocnos[index];
Assert(procno >= 0 && procno < (arrayP->maxProcs + NUM_AUXILIARY_PROCS));
- Assert(allProcs[procno].pgxactoff == index - 1);
+ Assert(pgxactoffs[procno] == index - 1);
- allProcs[procno].pgxactoff = index;
+ pgxactoffs[procno] = index;
}
/*
@@ -574,10 +576,10 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
- myoff = proc->pgxactoff;
+ myoff = ProcGetXactOff(GetNumberFromPGProc(proc));
Assert(myoff >= 0 && myoff < arrayP->numProcs);
- Assert(ProcGlobal->allProcs[arrayP->pgprocnos[myoff]].pgxactoff == myoff);
+ Assert(ProcGlobal->pgxactoffs[arrayP->pgprocnos[myoff]] == myoff);
if (TransactionIdIsValid(latestXid))
{
@@ -632,9 +634,9 @@ ProcArrayRemove(PGPROC *proc, TransactionId latestXid)
int procno = arrayP->pgprocnos[index];
Assert(procno >= 0 && procno < (arrayP->maxProcs + NUM_AUXILIARY_PROCS));
- Assert(allProcs[procno].pgxactoff - 1 == index);
+ Assert(pgxactoffs[procno] - 1 == index);
- allProcs[procno].pgxactoff = index;
+ pgxactoffs[procno] = index;
}
/*
@@ -706,11 +708,13 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
/* avoid unnecessarily dirtying shared cachelines */
if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
{
+ int pgxactoff = ProcGetXactOff(GetNumberFromPGProc(proc));
+
Assert(!LWLockHeldByMe(ProcArrayLock));
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
- Assert(proc->statusFlags == ProcGlobal->statusFlags[proc->pgxactoff]);
+ Assert(proc->statusFlags == ProcGlobal->statusFlags[pgxactoff]);
proc->statusFlags &= ~PROC_VACUUM_STATE_MASK;
- ProcGlobal->statusFlags[proc->pgxactoff] = proc->statusFlags;
+ ProcGlobal->statusFlags[pgxactoff] = proc->statusFlags;
LWLockRelease(ProcArrayLock);
}
}
@@ -724,7 +728,7 @@ ProcArrayEndTransaction(PGPROC *proc, TransactionId latestXid)
static inline void
ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
{
- int pgxactoff = proc->pgxactoff;
+ int pgxactoff = ProcGetXactOff(GetNumberFromPGProc(proc));
/*
* Note: we need exclusive lock here because we're going to change other
@@ -747,7 +751,7 @@ ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid)
if (proc->statusFlags & PROC_VACUUM_STATE_MASK)
{
proc->statusFlags &= ~PROC_VACUUM_STATE_MASK;
- ProcGlobal->statusFlags[proc->pgxactoff] = proc->statusFlags;
+ ProcGlobal->statusFlags[pgxactoff] = proc->statusFlags;
}
/* Clear the subtransaction-XID cache too while holding the lock */
@@ -916,7 +920,7 @@ ProcArrayClearTransaction(PGPROC *proc)
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
- pgxactoff = proc->pgxactoff;
+ pgxactoff = ProcGetXactOff(GetNumberFromPGProc(proc));
ProcGlobal->xids[pgxactoff] = InvalidTransactionId;
proc->xid = InvalidTransactionId;
@@ -1475,7 +1479,7 @@ TransactionIdIsInProgress(TransactionId xid)
}
/* No shortcuts, gotta grovel through the array */
- mypgxactoff = MyProc->pgxactoff;
+ mypgxactoff = ProcGetMyXactOff();
numProcs = arrayP->numProcs;
for (int pgxactoff = 0; pgxactoff < numProcs; pgxactoff++)
{
@@ -2176,7 +2180,7 @@ GetSnapshotData(Snapshot snapshot)
}
latest_completed = TransamVariables->latestCompletedXid;
- mypgxactoff = MyProc->pgxactoff;
+ mypgxactoff = ProcGetMyXactOff();
myxid = other_xids[mypgxactoff];
Assert(myxid == MyProc->xid);
@@ -2215,7 +2219,7 @@ GetSnapshotData(Snapshot snapshot)
TransactionId xid = UINT32_ACCESS_ONCE(other_xids[pgxactoff]);
uint8 statusFlags;
- Assert(allProcs[arrayP->pgprocnos[pgxactoff]].pgxactoff == pgxactoff);
+ Assert(pgxactoffs[arrayP->pgprocnos[pgxactoff]] == pgxactoff);
/*
* If the transaction has no XID assigned, we can skip it; it
@@ -2583,7 +2587,7 @@ ProcArrayInstallRestoredXmin(TransactionId xmin, PGPROC *proc)
MyProc->xmin = TransactionXmin = xmin;
MyProc->statusFlags = (MyProc->statusFlags & ~PROC_XMIN_FLAGS) |
(proc->statusFlags & PROC_XMIN_FLAGS);
- ProcGlobal->statusFlags[MyProc->pgxactoff] = MyProc->statusFlags;
+ ProcGlobal->statusFlags[ProcGetMyXactOff()] = MyProc->statusFlags;
result = true;
}
@@ -4011,7 +4015,7 @@ XidCacheRemoveRunningXids(TransactionId xid,
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
- mysubxidstat = &ProcGlobal->subxidStates[MyProc->pgxactoff];
+ mysubxidstat = &ProcGlobal->subxidStates[ProcGetMyXactOff()];
/*
* Under normal circumstances xid and xids[] will be in increasing order,
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6fa9de33e1c..1a52bdb8b88 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -164,6 +164,7 @@ ProcGlobalShmemRequest(void *arg)
size = add_size(size, mul_size(TotalProcs, sizeof(PGPROC)));
size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->xids)));
size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
+ size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->pgxactoffs)));
size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
ProcGlobalAllProcsShmemSize = size;
ShmemRequestStruct(.name = "PGPROC structures",
@@ -270,6 +271,9 @@ ProcGlobalShmemInit(void *arg)
ProcGlobal->subxidStates = (XidCacheStatus *) ptr;
ptr = ptr + (TotalProcs * sizeof(*ProcGlobal->subxidStates));
+ ProcGlobal->pgxactoffs = (int *) ptr;
+ ptr = ptr + (TotalProcs * sizeof(*ProcGlobal->pgxactoffs));
+
ProcGlobal->statusFlags = (uint8 *) ptr;
ptr = ptr + (TotalProcs * sizeof(*ProcGlobal->statusFlags));
@@ -1531,7 +1535,7 @@ ProcSleep(LOCALLOCK *locallock)
* the lock held, which is much more undesirable.
*/
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
- statusFlags = ProcGlobal->statusFlags[autovac->pgxactoff];
+ statusFlags = ProcGlobal->statusFlags[ProcGetXactOff(GetNumberFromPGProc(autovac))];
lockmethod_copy = lock->tag.locktag_lockmethodid;
locktag_copy = lock->tag;
LWLockRelease(ProcArrayLock);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 3e1d1fad5f9..2c4d23cb116 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -210,9 +210,6 @@ typedef struct PGPROC
Oid tempNamespaceId; /* OID of temp schema this backend is
* using */
- int pgxactoff; /* offset into various ProcGlobal->arrays with
- * data mirrored from this PGPROC */
-
uint8 statusFlags; /* this backend's status flags, see PROC_*
* above. mirrored in
* ProcGlobal->statusFlags[pgxactoff] */
@@ -445,6 +442,7 @@ typedef struct PROC_HDR
{
/* Array of PGPROC structures (not including dummies for prepared txns) */
PGPROC *allProcs;
+ int *pgxactoffs;
/* Array mirroring PGPROC.xid for each PGPROC currently in the procarray */
TransactionId *xids;
@@ -509,6 +507,8 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
*/
#define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
#define GetNumberFromPGProc(proc) ((proc) - &ProcGlobal->allProcs[0])
+#define ProcGetXactOff(procno) (ProcGlobal->pgxactoffs[(procno)])
+#define ProcGetMyXactOff() (ProcGetXactOff(GetNumberFromPGProc(MyProc)))
/*
* We set aside some extra PGPROC structures for "special worker" processes,
--
2.43.0