Hi,

Right now replication slots are synced close to the beginning of CheckPointGuts(). Importantly, this happens before CheckPointBuffers(), which for spread checkpoints might take most of checkpoint_timeout to complete. This is a problem because the slot sync (CheckPointReplicationSlots()) is also what calculates how much WAL to keep around, via ReplicationSlotsComputeRequiredLSN(). By the time RemoveOldXlogFiles() gets called, this information might be quite stale, and we hold onto many WAL files unnecessarily until the next checkpoint cycle.
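To make the staleness concrete, here is a toy model of the retention arithmetic. It is only a sketch and not PostgreSQL code: every name in it is hypothetical, it assumes 16 MB WAL segments, and it deliberately ignores other retention inputs such as wal_keep_size and max_slot_wal_keep_size. It assumes cleanup keeps everything from the older of the redo pointer and the sampled slot minimum, and compares sampling the slot minimum at the start of the checkpoint with sampling it just before cleanup.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model, not PostgreSQL code: every name below is hypothetical. */
#define TOY_SEGMENT_SIZE (UINT64_C(16) * 1024 * 1024)	/* assumes 16 MB segments */

/* Oldest LSN that must survive: the older of redo pointer and slot minimum. */
static uint64_t
toy_cutoff_lsn(uint64_t redo_lsn, uint64_t slot_min_lsn)
{
	return slot_min_lsn < redo_lsn ? slot_min_lsn : redo_lsn;
}

/* Segments kept, from the one holding cutoff_lsn through the one holding insert_lsn. */
static uint64_t
toy_segments_kept(uint64_t cutoff_lsn, uint64_t insert_lsn)
{
	assert(cutoff_lsn <= insert_lsn);
	return insert_lsn / TOY_SEGMENT_SIZE - cutoff_lsn / TOY_SEGMENT_SIZE + 1;
}

/*
 * Extra segments kept when the slot minimum was sampled at the start of the
 * checkpoint (slot_lsn_early) instead of just before cleanup (slot_lsn_late).
 */
static uint64_t
toy_extra_segments(uint64_t redo_lsn, uint64_t insert_lsn,
				   uint64_t slot_lsn_early, uint64_t slot_lsn_late)
{
	uint64_t	kept_early;
	uint64_t	kept_late;

	assert(slot_lsn_early <= slot_lsn_late);	/* slots only move forward here */
	kept_early = toy_segments_kept(toy_cutoff_lsn(redo_lsn, slot_lsn_early), insert_lsn);
	kept_late = toy_segments_kept(toy_cutoff_lsn(redo_lsn, slot_lsn_late), insert_lsn);
	return kept_early - kept_late;
}
```

With made-up numbers: redo pointer at segment 100, insert position at segment 160 when cleanup runs, and a slot that was at segment 40 when the checkpoint started but had caught up to segment 150 by the end. The early sample keeps 121 segments, the late sample keeps 61, so toy_extra_segments() returns 60 segments (about 960 MB) that would otherwise linger until the next checkpoint.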
As far as I could tell, there is no reason for this to happen early, so in the attached patch I just moved it down closer to the end.

Regards,
Ants Aasma
From 2878337eb8399c8c5a440b110ce6da4b318acc8c Mon Sep 17 00:00:00 2001
From: Ants Aasma <[email protected]>
Date: Thu, 7 May 2026 13:13:47 +0300
Subject: [PATCH] Checkpoint replication slots late in the cycle

Syncing replication slots computes XLogCtl->replicationSlotMinLSN which is
used to decide how much WAL to retain. Currently this happens at the start of
the checkpoint cycle, whereas WAL cleanup happens at the end. For spread
checkpoints the information might be considerably stale and we hold onto too
much WAL. Postponing the replication slot sync to the end of checkpoint makes
WAL cleanup use much more recent information.
---
 src/backend/access/transam/xlog.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f0434da40c9..6e6f411f47f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8046,7 +8046,6 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
@@ -8068,7 +8067,11 @@ CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 	CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
 	TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
 
-	/* We deliberately delay 2PC checkpointing as long as possible */
+	/*
+	 * We deliberately delay checkpointing of replication slots and 2PC for
+	 * as long as possible.
+	 */
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointTwoPhase(checkPointRedo);
 }
 
-- 
2.51.0
