Hi, Vitaly!

On Tue, May 20, 2025 at 6:44 PM Vitaly Davydov <v.davy...@postgrespro.ru> wrote:
>
> Thank you very much for the review!
>
> > The patchset doesn't seem to build after 371f2db8b0, which adjusted
> > the signature of the INJECTION_POINT() macro. Could you, please,
> > update the patchset accordingly.
>
> I've updated the patch (see attached). Thanks.
>
> > I see in 0004 patch we're calling XLogGetReplicationSlotMinimumLSN()
> > before slots synchronization then use it for WAL truncation.
> > Generally looks good. But what about the "if
> > (InvalidateObsoleteReplicationSlots(...))" branch? It calls
> > XLogGetReplicationSlotMinimumLSN() again. Why would the value
> > obtained from the latter call reflect slots as they are synchronized
> > to the disk?
>
> In patch 0004 I call XLogGetReplicationSlotMinimumLSN() again to keep the old
> behaviour - this function was called in KeepLogSeg prior to my change. I also
> call CheckPointReplicationSlots at the next line to save the invalidated and
> other dirty slots on disk again, to make sure the new oldest LSN is in sync.
>
> The problem I tried to solve in this if-branch is to fix the test
> src/test/recovery/t/019_replslot_limit.pl, which failed because the WAL was
> not truncated enough for the test to pass. In general, this branch is not
> necessary and we may fix the test by calling checkpoint twice (please, see the
> alternative.rej patch for this case). If you think we should incorporate this
> new change, I'm ok to do it. But the WAL will be truncated more lazily.
>
> Furthermore, I think we can save slots on disk right after invalidation, not
> in CheckPointGuts, to avoid saving invalidated slots twice.
Thank you for the clarification. It's all good. I just missed that
CheckPointReplicationSlots() syncs slots inside the "if" branch.

I've reordered the patchset: the fix comes first and the tests second, so
that the tests pass after each commit. I've also joined the tests and the
injection points into a single commit. I don't see a reason to place the
tests into src/test/modules, because there is no module involved, so I've
moved them into src/test/recovery. I also improved some comments and
commit messages.

I think 0001 should go to all supported releases as it fixes a material
bug, while 0002 should be backpatched to 17, where injection points first
appeared. 0003 should go to pg19 after branching.

I'm continuing to review this.

------
Regards,
Alexander Korotkov
Supabase
From c409a441be6487063d49c2671d3a3aecb9ba6994 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Wed, 30 Apr 2025 14:09:21 +0300
Subject: [PATCH v4 1/3] Keep WAL segments by the flushed value of the slot's
 restart LSN

The patch fixes the issue with the unexpected removal of old WAL segments
after checkpoint, followed by an immediate restart. The issue occurs when
a slot is advanced after the start of the checkpoint and before old WAL
segments are removed at the end of the checkpoint.

The idea of the patch is to get the minimal restart_lsn at the beginning
of checkpoint (or restart point) creation and use this value when
calculating the oldest LSN for WAL segments removal at the end of
checkpoint. This idea was proposed by Tomas Vondra in the discussion.

Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 13
---
 src/backend/access/transam/xlog.c | 37 ++++++++++++++++++++++++-------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1914859b2ee..30ae65fce53 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -677,7 +677,8 @@ static XLogRecPtr CreateOverwriteContrecordRecord(XLogRecPtr aborted_lsn,
 												  XLogRecPtr pagePtr,
 												  TimeLineID newTLI);
 static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
-static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
+static void KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinLSN,
+					   XLogSegNo *logSegNo);
 static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);
 
 static void AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli,
@@ -7087,6 +7088,7 @@ CreateCheckPoint(int flags)
 	VirtualTransactionId *vxids;
 	int			nvxids;
 	int			oldXLogAllowed = 0;
+	XLogRecPtr	slotsMinReqLSN;
 
 	/*
 	 * An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7315,6 +7317,11 @@ CreateCheckPoint(int flags)
 	 */
 	END_CRIT_SECTION();
 
+	/*
+	 * Get the current minimum LSN to be used later in WAL segments cleanup.
+	 */
+	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
 	/*
 	 * In some cases there are groups of actions that must all occur on one
 	 * side or the other of a checkpoint record. Before flushing the
@@ -7503,17 +7510,20 @@ CreateCheckPoint(int flags)
 	 * prevent the disk holding the xlog from growing full.
 	 */
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-	KeepLogSeg(recptr, &_logSegNo);
+	KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
 	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+		CheckPointReplicationSlots(shutdown);
+
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
 		 * horizon, starting again from RedoRecPtr.
 		 */
 		XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-		KeepLogSeg(recptr, &_logSegNo);
+		KeepLogSeg(recptr, slotsMinReqLSN, &_logSegNo);
 	}
 	_logSegNo--;
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
@@ -7788,6 +7798,7 @@ CreateRestartPoint(int flags)
 	XLogRecPtr	endptr;
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;
+	XLogRecPtr	slotsMinReqLSN;
 
 	/* Concurrent checkpoint/restartpoint cannot happen */
 	Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7870,6 +7881,11 @@ CreateRestartPoint(int flags)
 	MemSet(&CheckpointStats, 0, sizeof(CheckpointStats));
 	CheckpointStats.ckpt_start_t = GetCurrentTimestamp();
 
+	/*
+	 * Get the current minimum LSN to be used later in WAL segments cleanup.
+	 */
+	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+
 	if (log_checkpoints)
 		LogCheckpointStart(flags, true);
 
@@ -7958,17 +7974,20 @@ CreateRestartPoint(int flags)
 	receivePtr = GetWalRcvFlushRecPtr(NULL, NULL);
 	replayPtr = GetXLogReplayRecPtr(&replayTLI);
 	endptr = (receivePtr < replayPtr) ? replayPtr : receivePtr;
-	KeepLogSeg(endptr, &_logSegNo);
+	KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
 	if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
+		CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
+
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
 		 * horizon, starting again from RedoRecPtr.
 		 */
 		XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
-		KeepLogSeg(endptr, &_logSegNo);
+		KeepLogSeg(endptr, slotsMinReqLSN, &_logSegNo);
 	}
 	_logSegNo--;
 
@@ -8063,6 +8082,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	XLogSegNo	oldestSegMaxWalSize;	/* oldest segid kept by max_wal_size */
 	XLogSegNo	oldestSlotSeg;	/* oldest segid kept by slot */
 	uint64		keepSegs;
+	XLogRecPtr	slotsMinReqLSN;
 
 	/*
 	 * slot does not reserve WAL. Either deactivated, or has never been active
@@ -8076,8 +8096,9 @@ GetWALAvailability(XLogRecPtr targetLSN)
 	 * oldestSlotSeg to the current segment.
 	 */
 	currpos = GetXLogWriteRecPtr();
+	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 	XLByteToSeg(currpos, oldestSlotSeg, wal_segment_size);
-	KeepLogSeg(currpos, &oldestSlotSeg);
+	KeepLogSeg(currpos, slotsMinReqLSN, &oldestSlotSeg);
 
 	/*
 	 * Find the oldest extant segment file. We get 1 until checkpoint removes
@@ -8138,7 +8159,7 @@ GetWALAvailability(XLogRecPtr targetLSN)
  * invalidation is optionally done here, instead.
  */
 static void
-KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
+KeepLogSeg(XLogRecPtr recptr, XLogRecPtr slotsMinReqLSN, XLogSegNo *logSegNo)
 {
 	XLogSegNo	currSegNo;
 	XLogSegNo	segno;
@@ -8151,7 +8172,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	 * Calculate how many segments are kept by slots first, adjusting for
 	 * max_slot_wal_keep_size.
 	 */
-	keep = XLogGetReplicationSlotMinimumLSN();
+	keep = slotsMinReqLSN;
 	if (keep != InvalidXLogRecPtr && keep < recptr)
 	{
 		XLByteToSeg(keep, segno, wal_segment_size);
-- 
2.39.5 (Apple Git-154)
From 6f2afd8239cdefc62a519825a670db2eb8a4e111 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <tomas@vondra.me>
Date: Thu, 21 Nov 2024 20:37:00 +0100
Subject: [PATCH v4 2/3] Add TAP tests to check replication slot advance during
 the checkpoint

The new tests verify that logical and physical replication slots are still
valid after an immediate restart on checkpoint completion when the slot was
advanced during the checkpoint.

This commit introduces two new injection points to make these tests possible:

* checkpoint-before-old-wal-removal - triggered in the checkpointer process
  just before old WAL segments cleanup;
* logical-replication-slot-advance-segment - triggered in
  LogicalConfirmReceivedLocation() when restart_lsn was changed enough to
  point to the next WAL segment.

Discussion: https://postgr.es/m/flat/1d12d2-67235980-35-19a406a0%4063439497
Author: Vitaly Davydov <v.davydov@postgrespro.ru>
Author: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
Backpatch-through: 17
---
 src/backend/access/transam/xlog.c             |   4 +
 src/backend/replication/logical/logical.c     |  18 +++
 .../test_replslot_required_lsn/Makefile       |  18 +++
 .../test_replslot_required_lsn/meson.build    |  16 ++
 src/test/recovery/meson.build                 |   2 +
 src/test/recovery/t/046_logical_slot.pl       | 139 ++++++++++++++++++
 src/test/recovery/t/047_physical_slot.pl      | 136 +++++++++++++++++
 7 files changed, 333 insertions(+)
 create mode 100644 src/test/modules/test_replslot_required_lsn/Makefile
 create mode 100644 src/test/modules/test_replslot_required_lsn/meson.build
 create mode 100644 src/test/recovery/t/046_logical_slot.pl
 create mode 100644 src/test/recovery/t/047_physical_slot.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30ae65fce53..9c0f9a0af28 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7505,6 +7505,10 @@ CreateCheckPoint(int flags)
 	if (PriorRedoPtr != InvalidXLogRecPtr)
 		UpdateCheckPointDistanceEstimate(RedoRecPtr - PriorRedoPtr);
 
+#ifdef USE_INJECTION_POINTS
+	INJECTION_POINT("checkpoint-before-old-wal-removal", NULL);
+#endif
+
 	/*
 	 * Delete old log files, those no longer needed for last checkpoint to
 	 * prevent the disk holding the xlog from growing full.
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 1d56d0c4ef3..f1eb798f3e9 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -29,6 +29,7 @@
 #include "postgres.h"
 
 #include "access/xact.h"
+#include "access/xlog_internal.h"
 #include "access/xlogutils.h"
 #include "fmgr.h"
 #include "miscadmin.h"
@@ -41,6 +42,7 @@
 #include "storage/proc.h"
 #include "storage/procarray.h"
 #include "utils/builtins.h"
+#include "utils/injection_point.h"
 #include "utils/inval.h"
 #include "utils/memutils.h"
 
@@ -1825,9 +1827,13 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 	{
 		bool		updated_xmin = false;
 		bool		updated_restart = false;
+		XLogRecPtr	restart_lsn pg_attribute_unused();
 
 		SpinLockAcquire(&MyReplicationSlot->mutex);
 
+		/* remember the old restart lsn */
+		restart_lsn = MyReplicationSlot->data.restart_lsn;
+
 		/*
 		 * Prevent moving the confirmed_flush backwards, as this could lead to
 		 * data duplication issues caused by replicating already replicated
@@ -1881,6 +1887,18 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 		/* first write new xmin to disk, so we know what's up after a crash */
 		if (updated_xmin || updated_restart)
 		{
+#ifdef USE_INJECTION_POINTS
+			XLogSegNo	seg1,
+						seg2;
+
+			XLByteToSeg(restart_lsn, seg1, wal_segment_size);
+			XLByteToSeg(MyReplicationSlot->data.restart_lsn, seg2, wal_segment_size);
+
+			/* trigger injection point, but only if segment changes */
+			if (seg1 != seg2)
+				INJECTION_POINT("logical-replication-slot-advance-segment", NULL);
+#endif
+
 			ReplicationSlotMarkDirty();
 			ReplicationSlotSave();
 			elog(DEBUG1, "updated xmin: %u restart: %u", updated_xmin, updated_restart);
diff --git a/src/test/modules/test_replslot_required_lsn/Makefile b/src/test/modules/test_replslot_required_lsn/Makefile
new file mode 100644
index 00000000000..e5ff8af255b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/Makefile
@@ -0,0 +1,18 @@
+# src/test/modules/test_replslot_required_lsn/Makefile
+
+EXTRA_INSTALL=src/test/modules/injection_points \
+	contrib/test_decoding
+
+export enable_injection_points
+TAP_TESTS = 1
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = src/test/modules/test_replslot_required_lsn
+top_builddir = ../../../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build
new file mode 100644
index 00000000000..44d2546632b
--- /dev/null
+++ b/src/test/modules/test_replslot_required_lsn/meson.build
@@ -0,0 +1,16 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+
+tests += {
+  'name': 'test_replslot_required_lsn',
+  'sd': meson.current_source_dir(),
+  'bd': meson.current_build_dir(),
+  'tap': {
+    'env': {
+      'enable_injection_points': get_option('injection_points') ? 'yes' : 'no',
+    },
+    'tests': [
+      't/001_logical_slot.pl',
+      't/002_physical_slot.pl'
+    ],
+  },
+}
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index cb983766c67..5ee41c3cd4d 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -54,6 +54,8 @@ tests += {
       't/043_no_contrecord_switch.pl',
       't/044_invalidate_inactive_slots.pl',
       't/045_archive_restartpoint.pl',
+      't/046_logical_slot.pl',
+      't/047_physical_slot.pl'
     ],
   },
 }
diff --git a/src/test/recovery/t/046_logical_slot.pl b/src/test/recovery/t/046_logical_slot.pl
new file mode 100644
index 00000000000..e78375178aa
--- /dev/null
+++ b/src/test/recovery/t/046_logical_slot.pl
@@ -0,0 +1,139 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the logical slot is advanced during
+# checkpoint. The test checks that the logical slot's restart_lsn still refers
+# to an existing WAL segment after immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init;
+$node->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'logical'");
+$node->start;
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+	q{create table t (id serial primary key, b text)});
+
+# create the two slots we'll need
+$node->safe_psql('postgres',
+	q{select pg_create_logical_replication_slot('slot_logical', 'test_decoding')}
+);
+$node->safe_psql('postgres',
+	q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance both to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+	q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null)}
+);
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# generate transactions to get RUNNING_XACTS
+my $xacts = $node->background_psql('postgres');
+$xacts->query_until(
+	qr/run_xacts/,
+	q(\echo run_xacts
+SELECT 1 \watch 0.1
+\q
+));
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+	q(select injection_points_attach('checkpoint-before-old-wal-removal','wait'))
+);
+$checkpoint->query_until(
+	qr/starting_checkpoint/,
+	q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+note('waiting for injection_point');
+# wait until the checkpoint stops right before removing WAL segments
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+
+# try to advance the logical slot, but make it stop when it moves to the
+# next WAL segment (has to happen in the background too)
+my $logical = $node->background_psql('postgres');
+$logical->query_safe(
+	q{select injection_points_attach('logical-replication-slot-advance-segment','wait');}
+);
+$logical->query_until(
+	qr/get_changes/,
+	q(
+\echo get_changes
+select count(*) from pg_logical_slot_get_changes('slot_logical', null, null) \watch 1
+\q
+));
+
+# wait until the slot advance stops right before moving to the next segment
+$node->wait_for_event('client backend',
+	'logical-replication-slot-advance-segment');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the logical slot
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+$node->safe_psql('postgres',
+	q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+# abruptly stop the server, without waiting for the checkpoint to complete
+$node->stop('immediate');
+
+$node->start;
+
+eval {
+	$node->safe_psql('postgres',
+		q{select count(*) from pg_logical_slot_get_changes('slot_logical', null, null);}
+	);
+};
+is($@, '', "Logical slot still valid");
+
+done_testing();
diff --git a/src/test/recovery/t/047_physical_slot.pl b/src/test/recovery/t/047_physical_slot.pl
new file mode 100644
index 00000000000..f2cf096b308
--- /dev/null
+++ b/src/test/recovery/t/047_physical_slot.pl
@@ -0,0 +1,136 @@
+# Copyright (c) 2025, PostgreSQL Global Development Group
+#
+# This test verifies the case when the physical slot is advanced during
+# checkpoint. The test checks that the physical slot's restart_lsn still refers
+# to an existing WAL segment after immediate restart.
+#
+# Discussion:
+# https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497
+#
+use strict;
+use warnings FATAL => 'all';
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+
+use Test::More;
+
+if ($ENV{enable_injection_points} ne 'yes')
+{
+	plan skip_all => 'Injection points not supported by this build';
+}
+
+my ($node, $result);
+
+$node = PostgreSQL::Test::Cluster->new('mike');
+$node->init();
+$node->append_conf('postgresql.conf',
+	"shared_preload_libraries = 'injection_points'");
+$node->append_conf('postgresql.conf', "wal_level = 'replica'");
+$node->start();
+$node->safe_psql('postgres', q(CREATE EXTENSION injection_points));
+
+# create a simple table to generate data into
+$node->safe_psql('postgres',
+	q{create table t (id serial primary key, b text)});
+
+# create a physical replication slot
+$node->safe_psql('postgres',
+	q{select pg_create_physical_replication_slot('slot_physical', true)});
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# run checkpoint, to flush current state to disk and set a baseline
+$node->safe_psql('postgres', q{checkpoint});
+
+# insert 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,100000) s(i)}
+);
+
+# advance slot to current position, just to have everything "valid"
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# run another checkpoint, to set a new restore LSN
+$node->safe_psql('postgres', q{checkpoint});
+
+# another 2M rows, that's about 260MB (~20 segments) worth of WAL
+$node->safe_psql('postgres',
+	q{insert into t (b) select md5(i::text) from generate_series(1,1000000) s(i)}
+);
+
+my $restart_lsn_init = $node->safe_psql('postgres',
+	q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_init);
+note("restart lsn before checkpoint: $restart_lsn_init");
+
+# run another checkpoint, this time in the background, and make it wait
+# on the injection point, so that the checkpoint stops right before
+# removing old WAL segments
+note('starting checkpoint');
+
+my $checkpoint = $node->background_psql('postgres');
+$checkpoint->query_safe(
+	q{select injection_points_attach('checkpoint-before-old-wal-removal','wait')}
+);
+$checkpoint->query_until(
+	qr/starting_checkpoint/,
+	q(\echo starting_checkpoint
+checkpoint;
+\q
+));
+
+# wait until the checkpoint stops right before removing WAL segments
+note('waiting for injection_point');
+$node->wait_for_event('checkpointer', 'checkpoint-before-old-wal-removal');
+note('injection_point is reached');
+
+# OK, we're in the right situation, time to advance the physical slot,
+# which recalculates the required LSN, and then unblock the checkpoint,
+# which removes the WAL still needed by the physical slot
+$node->safe_psql('postgres',
+	q{select pg_replication_slot_advance('slot_physical', pg_current_wal_lsn())}
+);
+
+# Continue checkpoint
+$node->safe_psql('postgres',
+	q{select injection_points_wakeup('checkpoint-before-old-wal-removal')});
+
+my $restart_lsn_old = $node->safe_psql('postgres',
+	q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn_old);
+note("restart lsn before stop: $restart_lsn_old");
+
+# abruptly stop the server, without waiting for the checkpoint to complete
+$node->stop('immediate');
+
+$node->start;
+
+# Get the restart_lsn of the slot right after restarting
+my $restart_lsn = $node->safe_psql('postgres',
+	q{select restart_lsn from pg_replication_slots where slot_name = 'slot_physical'}
+);
+chomp($restart_lsn);
+note("restart lsn: $restart_lsn");
+
+# Get wal segment name for slot's restart_lsn
+my $restart_lsn_segment = $node->safe_psql('postgres',
+	"SELECT pg_walfile_name('$restart_lsn'::pg_lsn)");
+chomp($restart_lsn_segment);
+
+# Check if the required wal segment exists
+note("required by slot segment name: $restart_lsn_segment");
+my $datadir = $node->data_dir;
+ok( -f "$datadir/pg_wal/$restart_lsn_segment",
+	"WAL segment $restart_lsn_segment for physical slot's restart_lsn $restart_lsn exists"
+);
+
+done_testing();
-- 
2.39.5 (Apple Git-154)
From cf5b10f1cbb400e7ff0e596fe962af5847e96c2e Mon Sep 17 00:00:00 2001
From: Vitaly Davydov <v.davydov@postgrespro.ru>
Date: Thu, 1 May 2025 12:18:52 +0300
Subject: [PATCH v4 3/3] Remove redundant ReplicationSlotsComputeRequiredLSN
 calls

The function ReplicationSlotsComputeRequiredLSN is used to calculate the
oldest slots' required LSN. It is called every time the restart_lsn value
of any slot is changed (for example, when a slot is advanced). The slots'
oldest required LSN is used to remove old WAL segments in two places -
when a checkpoint or restart point is created (the CreateCheckPoint and
CreateRestartPoint functions). Old WAL segments seem to be truncated in
these two functions only.

The idea of the patch is to call ReplicationSlotsComputeRequiredLSN in the
CreateCheckPoint and CreateRestartPoint functions only, before the call of
the RemoveOldXlogFiles function where old WAL segments are removed. There
is no obvious need to recalculate the oldest required LSN every time a
slot's restart_lsn is changed.

The value of the oldest required LSN can affect slot invalidation. The
function InvalidateObsoleteReplicationSlots with a non-zero second
parameter (oldestSegno) is called in the CreateCheckPoint and
CreateRestartPoint functions only, where slot invalidation occurs with
reason RS_INVAL_WAL_REMOVED. Once we update the slots' oldest required
LSN at the beginning of these functions, the proposed patch should not
break the behaviour of the slot invalidation function in this case.
---
 src/backend/access/transam/xlog.c          | 4 ++++
 src/backend/replication/logical/logical.c  | 1 -
 src/backend/replication/logical/slotsync.c | 4 ----
 src/backend/replication/slot.c             | 5 -----
 src/backend/replication/slotfuncs.c        | 2 --
 src/backend/replication/walsender.c        | 1 -
 6 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9c0f9a0af28..624be87a609 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7320,6 +7320,7 @@ CreateCheckPoint(int flags)
 	/*
 	 * Get the current minimum LSN to be used later in WAL segments cleanup.
 	 */
+	ReplicationSlotsComputeRequiredLSN();
 	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 
 	/*
@@ -7519,6 +7520,7 @@ CreateCheckPoint(int flags)
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		ReplicationSlotsComputeRequiredLSN();
 		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 		CheckPointReplicationSlots(shutdown);
 
@@ -7888,6 +7890,7 @@ CreateRestartPoint(int flags)
 	/*
 	 * Get the current minimum LSN to be used later in WAL segments cleanup.
 	 */
+	ReplicationSlotsComputeRequiredLSN();
 	slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 
 	if (log_checkpoints)
@@ -7983,6 +7986,7 @@ CreateRestartPoint(int flags)
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId))
 	{
+		ReplicationSlotsComputeRequiredLSN();
 		slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
 		CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 
diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index f1eb798f3e9..7d136213777 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -1917,7 +1917,6 @@ LogicalConfirmReceivedLocation(XLogRecPtr lsn)
 			SpinLockRelease(&MyReplicationSlot->mutex);
 
 			ReplicationSlotsComputeRequiredXmin(false);
-			ReplicationSlotsComputeRequiredLSN();
 		}
 	}
 	else
diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e0ae0..30662c09275 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -335,7 +335,6 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 		SpinLockRelease(&slot->mutex);
 
 		ReplicationSlotsComputeRequiredXmin(false);
-		ReplicationSlotsComputeRequiredLSN();
 	}
 
 	return updated_config || updated_xmin_or_lsn;
@@ -502,9 +501,6 @@ reserve_wal_for_local_slot(XLogRecPtr restart_lsn)
 		slot->data.restart_lsn = restart_lsn;
 		SpinLockRelease(&slot->mutex);
 
-		/* Prevent WAL removal as fast as possible */
-		ReplicationSlotsComputeRequiredLSN();
-
 		XLByteToSeg(slot->data.restart_lsn, segno, wal_segment_size);
 
 		/*
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 600b87fa9cb..dd18fe10f7d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1008,7 +1008,6 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	 * limits.
 	 */
 	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
 
 	/*
 	 * If removing the directory fails, the worst thing that will happen is
@@ -1494,9 +1493,6 @@ ReplicationSlotReserveWal(void)
 		slot->data.restart_lsn = restart_lsn;
 		SpinLockRelease(&slot->mutex);
 
-		/* prevent WAL removal as fast as possible */
-		ReplicationSlotsComputeRequiredLSN();
-
 		/*
 		 * If all required WAL is still there, great, otherwise retry. The
 		 * slot should prevent further removal of WAL, unless there's a
@@ -2014,7 +2010,6 @@ restart:
 	if (invalidated)
 	{
 		ReplicationSlotsComputeRequiredXmin(false);
-		ReplicationSlotsComputeRequiredLSN();
 	}
 
 	return invalidated;
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 36cc2ed4e44..3300fb9b1c9 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -583,7 +583,6 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
 	 * advancing potentially done.
 	 */
 	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
 
 	ReplicationSlotRelease();
 
@@ -819,7 +818,6 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot)
 	ReplicationSlotMarkDirty();
 	ReplicationSlotsComputeRequiredXmin(false);
-	ReplicationSlotsComputeRequiredLSN();
 	ReplicationSlotSave();
 
 #ifdef USE_ASSERT_CHECKING
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 9fa8beb6103..0767c2803d9 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -2384,7 +2384,6 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn)
 	if (changed)
 	{
 		ReplicationSlotMarkDirty();
-		ReplicationSlotsComputeRequiredLSN();
 		PhysicalWakeupLogicalWalSnd();
 	}
 
-- 
2.39.5 (Apple Git-154)