Hello all, I've made a v2 of this patch, turning it into a patchset with guidance from Fabrizio Mello.
This patchset includes a new feature that self-heals (auto-revalidates) physical replication slots after they have been invalidated for one of two reasons: RS_INVAL_WAL_REMOVED or RS_INVAL_IDLE_TIMEOUT.

Requiring a user to manually recreate the slot isn't necessary when the standby connected to that slot recovers by itself using restore_command, and it becomes burdensome when managing a fleet of clusters, where the scale of the operation calls for handling this kind of problem automatically.

The patch adds an opt-in mechanism that allows physical slots to be revalidated in those cases. A new persistent field called `auto_revalidate` (default false) controls which physical slots are eligible. When it is enabled, StartReplication issues a WARNING instead of an ERROR when acquiring an invalidated physical slot, and PhysicalConfirmReceivedLocation clears the invalidation atomically with the restart_lsn update upon the first flush ACK. The revalidation is persisted to disk immediately so it survives a crash.

Only RS_INVAL_WAL_REMOVED and RS_INVAL_IDLE_TIMEOUT are revalidatable, via an explicit allowlist in SlotCanBeRevalidated(). Future invalidation reasons must be added there to become eligible.

I appreciate Fabrizio's help reviewing everything and walking me through my questions.

The series is split into five patches:

0001 - Core infrastructure: SlotCanBeRevalidated helper, SlotIsValid macro, revalidation logic in walsender.c, SLOT_VERSION bump.
0002 - SQL function: new auto_revalidate parameter on pg_create_physical_replication_slot(), copy-path propagation via pg_copy_physical_replication_slot(), regression test.
0003 - View exposure: auto_revalidate column in pg_replication_slots.
0004 - TAP recovery test: six scenarios covering revalidation, WAL retention, xmin recovery, error preservation for auto_revalidate=false, slot copy revalidation, and idle_timeout revalidation (some of these require injection_points).
0005 - Documentation: system-views.sgml and func-admin.sgml.

João Foltran
Linkedin: https://www.linkedin.com/in/joao-foltran-031b9312b

On Thu, Jan 22, 2026 at 4:41 PM Joao Foltran <[email protected]> wrote:
>
> Hi Amit!
>
> Unless we have hot_standby_feedback = on, xmin would be null on the
> physical replication slot.
>
> But, even if using that parameter, as long as we know that the standby
> already has caught up by using the archived wals then the xmin
> wouldn't matter, since we don't need those rows to be visible anymore.
>
> I've attached a simple patch and test here that revalidates the slot
> after it is lost. It is still missing any filtering besides checking
> if the slot is physical or logical, but we can add filters for
> specific invalidations.
>
> Let me know what you think.
>
> Regards,
> João Foltran
>
> On Wed, Jan 14, 2026 at 8:21 AM Amit Kapila <[email protected]> wrote:
> >
> > On Tue, Jan 6, 2026 at 3:26 AM Joao Foltran <[email protected]> wrote:
> > >
> > > > The slots could be invalidated due to other reasons like
> > > > RS_INVAL_IDLE_TIMEOUT as well.
> > >
> > > We could just filter which invalidation reasons could be "revalidated"
> > > for only reasons that can be resolved this way.
> > >
> >
> > Can we make the slot valid even the required WAL is made available
> > afterwards? What about the removed rows due to the slot's xmin?
> >
> > --
> > With Regards,
> > Amit Kapila.
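For anyone who wants to reason about the eligibility rule and the flush-ACK path described in the summary without building the tree, the following is a minimal standalone sketch in C. It is not code from the patches: the `Model*` types, the `MODEL_INVAL_*` enum values, and the plain integer LSN are simplified stand-ins assumed here for illustration, and the spinlock that the real walsender code holds around the update is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified stand-ins for the slot invalidation causes.
 * The names are illustrative only and do not match PostgreSQL's enum.
 */
typedef enum
{
	MODEL_INVAL_NONE,
	MODEL_INVAL_WAL_REMOVED,
	MODEL_INVAL_HORIZON,
	MODEL_INVAL_WAL_LEVEL,
	MODEL_INVAL_IDLE_TIMEOUT
} ModelInvalCause;

/* Hypothetical model of the slot fields that matter for revalidation. */
typedef struct
{
	bool		is_physical;		/* physical (true) or logical (false) */
	bool		auto_revalidate;	/* opt-in flag, default false */
	ModelInvalCause invalidated;	/* MODEL_INVAL_NONE when valid */
	unsigned long restart_lsn;		/* simplified LSN; 0 means "invalid" */
} ModelSlot;

/*
 * Allowlist check: only physical slots that opted in, and only for the
 * two causes named in the summary, are eligible.
 */
static bool
model_can_be_revalidated(const ModelSlot *s)
{
	return s->is_physical &&
		s->auto_revalidate &&
		(s->invalidated == MODEL_INVAL_WAL_REMOVED ||
		 s->invalidated == MODEL_INVAL_IDLE_TIMEOUT);
}

/*
 * Model of the flush-ACK path: advance restart_lsn and, if the slot is
 * eligible, clear the invalidation in the same step. The real code does
 * both under the slot's spinlock; this single-threaded sketch omits it.
 * Returns true when the slot was revalidated by this call.
 */
static bool
model_confirm_received(ModelSlot *s, unsigned long lsn)
{
	bool		revalidated = false;

	assert(lsn != 0);

	if (s->restart_lsn != lsn)
		s->restart_lsn = lsn;

	if (model_can_be_revalidated(s))
	{
		s->invalidated = MODEL_INVAL_NONE;
		revalidated = true;
	}

	return revalidated;
}
```

In this model, a slot created with the flag set and invalidated by WAL removal becomes valid again on the first call to `model_confirm_received()`, while a slot without the flag, or one invalidated for any cause outside the allowlist, stays invalidated no matter how many ACKs arrive.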
From 9ae746d88004993735e85aea6eb096e23d7cb705 Mon Sep 17 00:00:00 2001
From: Joao Foltran <[email protected]>
Date: Thu, 26 Mar 2026 16:08:58 -0300
Subject: [PATCH v2 5/5] Add documentation for auto_revalidate

Document the auto_revalidate column in pg_replication_slots view
(system-views.sgml) and the new optional parameter in
pg_create_physical_replication_slot (func-admin.sgml).
---
 doc/src/sgml/func/func-admin.sgml |  8 ++++++--
 doc/src/sgml/system-views.sgml    | 11 +++++++++++
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/func/func-admin.sgml b/doc/src/sgml/func/func-admin.sgml
index 210b1118bdf..c5f9f9e13c1 100644
--- a/doc/src/sgml/func/func-admin.sgml
+++ b/doc/src/sgml/func/func-admin.sgml
@@ -1032,7 +1032,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         <indexterm>
          <primary>pg_create_physical_replication_slot</primary>
         </indexterm>
-        <function>pg_create_physical_replication_slot</function> ( <parameter>slot_name</parameter> <type>name</type> <optional>, <parameter>immediately_reserve</parameter> <type>boolean</type>, <parameter>temporary</parameter> <type>boolean</type> </optional> )
+        <function>pg_create_physical_replication_slot</function> ( <parameter>slot_name</parameter> <type>name</type> <optional>, <parameter>immediately_reserve</parameter> <type>boolean</type>, <parameter>temporary</parameter> <type>boolean</type>, <parameter>auto_revalidate</parameter> <type>boolean</type> </optional> )
         <returnvalue>record</returnvalue>
         ( <parameter>slot_name</parameter> <type>name</type>,
         <parameter>lsn</parameter> <type>pg_lsn</type> )
@@ -1051,7 +1051,11 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
         parameter, <parameter>temporary</parameter>, when set to true, specifies that
         the slot should not be permanently stored to disk and is only meant
         for use by the current session. Temporary slots are also
-        released upon any error. This function corresponds
+        released upon any error. The optional fourth parameter,
+        <parameter>auto_revalidate</parameter>, when set to true, specifies
+        that the slot may be automatically revalidated after invalidation
+        once the standby reconnects and confirms WAL receipt.
+        This function corresponds
         to the replication protocol command <literal>CREATE_REPLICATION_SLOT
         ... PHYSICAL</literal>.
        </para></entry>
diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 9ee1a2bfc6a..f6540776bd5 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -3131,6 +3131,17 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
       </para></entry>
      </row>
 
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>auto_revalidate</structfield> <type>bool</type>
+      </para>
+      <para>
+       True if this physical slot is eligible for automatic revalidation
+       after invalidation, once the standby reconnects and confirms WAL
+       receipt. Always false for logical slots.
+      </para></entry>
+     </row>
+
      <row>
       <entry role="catalog_table_entry"><para role="column_definition">
        <structfield>slotsync_skip_reason</structfield><type>text</type>
--
2.50.1 (Apple Git-155)
From 1f70d65b4c62df158faed00f6deb10fa532556be Mon Sep 17 00:00:00 2001
From: Joao Foltran <[email protected]>
Date: Thu, 19 Mar 2026 12:30:06 -0300
Subject: [PATCH v2 1/5] Add auto-revalidation infrastructure for physical replication slots

Physical replication slots that are invalidated (e.g., due to WAL
removal or idle timeout) currently cannot be reacquired, requiring
manual slot recreation. This patch adds the infrastructure for
automatic revalidation of physical slots after a standby reconnects
and confirms WAL receipt.

A new per-slot persistent field 'auto_revalidate' (default: false)
controls whether a physical slot is eligible for revalidation. When
enabled, the slot can be acquired despite being invalidated, and the
invalidation is cleared atomically (under spinlock) with the
restart_lsn update upon receiving the first flush ACK from the standby.

Only RS_INVAL_WAL_REMOVED and RS_INVAL_IDLE_TIMEOUT are revalidatable
via an explicit allowlist, so future invalidation reasons are not
automatically eligible.

This patch adds the field and revalidation logic but does not yet
provide a way to set auto_revalidate=true; that will be added in a
subsequent patch.

Bump SLOT_VERSION from 5 to 6 for the new persistent field.
--- src/backend/replication/slot.c | 2 +- src/backend/replication/walsender.c | 52 ++++++++++++++++++++++++++++- src/include/replication/slot.h | 24 +++++++++++++ 3 files changed, 76 insertions(+), 2 deletions(-) diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c index a9092fc2382..88113fc0cbc 100644 --- a/src/backend/replication/slot.c +++ b/src/backend/replication/slot.c @@ -140,7 +140,7 @@ StaticAssertDecl(lengthof(SlotInvalidationCauses) == (RS_INVAL_MAX_CAUSES + 1), sizeof(ReplicationSlotOnDisk) - ReplicationSlotOnDiskConstantSize #define SLOT_MAGIC 0x1051CA1 /* format identifier */ -#define SLOT_VERSION 5 /* version for new files */ +#define SLOT_VERSION 6 /* version for new files */ /* Control array for replication slot management */ ReplicationSlotCtlData *ReplicationSlotCtl = NULL; diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 08253103cb3..50ada2ed293 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -843,12 +843,34 @@ StartReplication(StartReplicationCmd *cmd) if (cmd->slotname) { - ReplicationSlotAcquire(cmd->slotname, true, true); + ReplicationSlotAcquire(cmd->slotname, true, false); if (SlotIsLogical(MyReplicationSlot)) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("cannot use a logical replication slot for physical replication"))); + /* + * Check if the slot is invalidated. Physical slots with + * auto_revalidate can proceed -- they will be revalidated once the + * standby confirms WAL receipt. All other invalidated slots must + * error out as before. 
+ */ + if (!SlotIsValid(MyReplicationSlot)) + { + if (SlotCanBeRevalidated(MyReplicationSlot)) + ereport(WARNING, + errmsg("replication slot \"%s\" is invalidated due to \"%s\", will attempt revalidation", + NameStr(MyReplicationSlot->data.name), + GetSlotInvalidationCauseName(MyReplicationSlot->data.invalidated))); + else + ereport(ERROR, + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("can no longer access replication slot \"%s\"", + NameStr(MyReplicationSlot->data.name)), + errdetail("This replication slot has been invalidated due to \"%s\".", + GetSlotInvalidationCauseName(MyReplicationSlot->data.invalidated))); + } + /* * We don't need to verify the slot's restart_lsn here; instead we * rely on the caller requesting the starting point to use. If the @@ -2429,6 +2451,7 @@ static void PhysicalConfirmReceivedLocation(XLogRecPtr lsn) { bool changed = false; + bool revalidated = false; ReplicationSlot *slot = MyReplicationSlot; Assert(XLogRecPtrIsValid(lsn)); @@ -2438,6 +2461,19 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn) changed = true; slot->data.restart_lsn = lsn; } + + /* + * If the slot is invalidated and eligible for auto-revalidation, clear + * the invalidation now that the standby has confirmed WAL receipt. Both + * restart_lsn and invalidated must be updated under the same spinlock to + * stay atomic with respect to ReplicationSlotsComputeRequiredLSN(). + */ + if (SlotCanBeRevalidated(slot)) + { + slot->data.invalidated = RS_INVAL_NONE; + changed = true; + revalidated = true; + } SpinLockRelease(&slot->mutex); if (changed) @@ -2447,6 +2483,20 @@ PhysicalConfirmReceivedLocation(XLogRecPtr lsn) PhysicalWakeupLogicalWalSnd(); } + /* + * Persist the revalidation to disk immediately so the cleared state + * survives a crash. Normal restart_lsn updates are not saved here + * (the comment below explains why), but a revalidation is a significant + * one-time state change worth persisting right away. 
+ */ + if (revalidated) + { + ReplicationSlotSave(); + ereport(LOG, + errmsg("physical replication slot \"%s\" has been revalidated", + NameStr(slot->data.name))); + } + /* * One could argue that the slot should be saved to disk now, but that'd * be energy wasted - the worst thing lost information could cause here is diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 4b4709f6e2c..8345ebefe2a 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -159,6 +159,13 @@ typedef struct ReplicationSlotPersistentData * for logical slots on the primary server. */ bool failover; + + /* + * If true, an invalidated physical slot may be automatically revalidated + * once the standby reconnects and confirms WAL receipt (flush ACK). + * Only applicable to physical slots; ignored for logical slots. + */ + bool auto_revalidate; } ReplicationSlotPersistentData; /* @@ -286,6 +293,23 @@ typedef struct ReplicationSlot #define SlotIsPhysical(slot) ((slot)->data.database == InvalidOid) #define SlotIsLogical(slot) ((slot)->data.database != InvalidOid) +#define SlotIsValid(slot) ((slot)->data.invalidated == RS_INVAL_NONE) + +/* + * Can this slot be automatically revalidated? + * + * Only physical slots with auto_revalidate enabled and invalidated by + * an explicitly supported reason are eligible. New invalidation reasons + * must be added here to become revalidatable. + */ +static inline bool +SlotCanBeRevalidated(ReplicationSlot *s) +{ + return SlotIsPhysical(s) && + s->data.auto_revalidate && + (s->data.invalidated == RS_INVAL_WAL_REMOVED || + s->data.invalidated == RS_INVAL_IDLE_TIMEOUT); +} /* * Shared memory control area for all of replication slots. -- 2.50.1 (Apple Git-155)
From 3e0249f217944352fb47a2eccdf45dc95008e814 Mon Sep 17 00:00:00 2001 From: Joao Foltran <[email protected]> Date: Mon, 23 Mar 2026 18:33:03 -0300 Subject: [PATCH v2 3/5] Expose auto_revalidate in pg_replication_slots view Add auto_revalidate boolean column to the pg_replication_slots system view, showing whether a physical replication slot is configured for automatic revalidation after invalidation. --- src/backend/catalog/system_views.sql | 1 + src/backend/replication/slotfuncs.c | 4 +++- src/include/catalog/pg_proc.dat | 6 +++--- src/test/regress/expected/rules.out | 3 ++- 4 files changed, 9 insertions(+), 5 deletions(-) diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index f1ed7b58f13..379435ce46c 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -1096,6 +1096,7 @@ CREATE VIEW pg_replication_slots AS L.invalidation_reason, L.failover, L.synced, + L.auto_revalidate, L.slotsync_skip_reason FROM pg_get_replication_slots() AS L LEFT JOIN pg_database D ON (L.datoid = D.oid); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index fdde1530a0c..8c44711daf6 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -267,7 +267,7 @@ pg_drop_replication_slot(PG_FUNCTION_ARGS) Datum pg_get_replication_slots(PG_FUNCTION_ARGS) { -#define PG_GET_REPLICATION_SLOTS_COLS 21 +#define PG_GET_REPLICATION_SLOTS_COLS 22 ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; XLogRecPtr currlsn; int slotno; @@ -475,6 +475,8 @@ pg_get_replication_slots(PG_FUNCTION_ARGS) values[i++] = BoolGetDatum(slot_contents.data.synced); + values[i++] = BoolGetDatum(slot_contents.data.auto_revalidate); + if (slot_contents.slotsync_skip_reason == SS_SKIP_NONE) nulls[i++] = true; else diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index de3f99ebb3a..0c708b718be 100644 --- a/src/include/catalog/pg_proc.dat 
+++ b/src/include/catalog/pg_proc.dat @@ -11642,9 +11642,9 @@ proname => 'pg_get_replication_slots', prorows => '10', proisstrict => 'f', proretset => 't', provolatile => 's', prorettype => 'record', proargtypes => '', - proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,text}', - proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', - proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,slotsync_skip_reason}', + proallargtypes => '{name,name,text,oid,bool,bool,int4,xid,xid,pg_lsn,pg_lsn,text,int8,bool,pg_lsn,timestamptz,bool,text,bool,bool,bool,text}', + proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', + proargnames => '{slot_name,plugin,slot_type,datoid,temporary,active,active_pid,xmin,catalog_xmin,restart_lsn,confirmed_flush_lsn,wal_status,safe_wal_size,two_phase,two_phase_at,inactive_since,conflicting,invalidation_reason,failover,synced,auto_revalidate,slotsync_skip_reason}', prosrc => 'pg_get_replication_slots' }, { oid => '3786', descr => 'set up a logical replication slot', proname => 'pg_create_logical_replication_slot', provolatile => 'v', diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 32bea58db2c..6e8f2ab9ca5 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -1510,8 +1510,9 @@ pg_replication_slots| SELECT l.slot_name, l.invalidation_reason, l.failover, l.synced, + l.auto_revalidate, l.slotsync_skip_reason - FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, 
slotsync_skip_reason) + FROM (pg_get_replication_slots() l(slot_name, plugin, slot_type, datoid, temporary, active, active_pid, xmin, catalog_xmin, restart_lsn, confirmed_flush_lsn, wal_status, safe_wal_size, two_phase, two_phase_at, inactive_since, conflicting, invalidation_reason, failover, synced, auto_revalidate, slotsync_skip_reason) LEFT JOIN pg_database d ON ((l.datoid = d.oid))); pg_roles| SELECT pg_authid.rolname, pg_authid.rolsuper, -- 2.50.1 (Apple Git-155)
From 5eb6a537c60019c1c715813b8417c04bee1e6e13 Mon Sep 17 00:00:00 2001 From: Joao Foltran <[email protected]> Date: Thu, 19 Mar 2026 19:05:09 -0300 Subject: [PATCH v2 2/5] Add auto_revalidate parameter to pg_create_physical_replication_slot Allow setting auto_revalidate when creating a physical replication slot via the SQL function: pg_create_physical_replication_slot(name, immediately_reserve, temporary, auto_revalidate) The flag is set after slot creation (not via ReplicationSlotCreate parameter) to avoid a cascading signature change across all 6 callers of that function. The new parameter defaults to false. --- contrib/test_decoding/expected/slot.out | 32 +++++++++++++++++++++++++ contrib/test_decoding/sql/slot.sql | 7 ++++++ src/backend/replication/slotfuncs.c | 20 +++++++++++++--- src/backend/replication/walsender.c | 1 + src/include/catalog/pg_proc.dat | 11 +++++---- src/test/recovery/meson.build | 1 + 6 files changed, 64 insertions(+), 8 deletions(-) diff --git a/contrib/test_decoding/expected/slot.out b/contrib/test_decoding/expected/slot.out index 7de03c79f6f..1834b42cd92 100644 --- a/contrib/test_decoding/expected/slot.out +++ b/contrib/test_decoding/expected/slot.out @@ -406,6 +406,38 @@ SELECT pg_drop_replication_slot('copied_slot2_notemp'); (1 row) +-- Test auto_revalidate is preserved when copying physical slots +SELECT 'init' FROM pg_create_physical_replication_slot('orig_slot_ar', true, false, true); + ?column? +---------- + init +(1 row) + +SELECT 'copy' FROM pg_copy_physical_replication_slot('orig_slot_ar', 'copied_slot_ar'); + ?column? 
+---------- + copy +(1 row) + +SELECT slot_name, auto_revalidate FROM pg_replication_slots WHERE slot_name LIKE '%_ar' ORDER BY slot_name; + slot_name | auto_revalidate +----------------+----------------- + copied_slot_ar | t + orig_slot_ar | t +(2 rows) + +SELECT pg_drop_replication_slot('orig_slot_ar'); + pg_drop_replication_slot +-------------------------- + +(1 row) + +SELECT pg_drop_replication_slot('copied_slot_ar'); + pg_drop_replication_slot +-------------------------- + +(1 row) + -- Test failover option of slots. SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'test_decoding', false, false, true); ?column? diff --git a/contrib/test_decoding/sql/slot.sql b/contrib/test_decoding/sql/slot.sql index 580e3ae3bef..98b349d6d60 100644 --- a/contrib/test_decoding/sql/slot.sql +++ b/contrib/test_decoding/sql/slot.sql @@ -177,6 +177,13 @@ SELECT pg_drop_replication_slot('orig_slot2'); SELECT pg_drop_replication_slot('copied_slot2_no_change'); SELECT pg_drop_replication_slot('copied_slot2_notemp'); +-- Test auto_revalidate is preserved when copying physical slots +SELECT 'init' FROM pg_create_physical_replication_slot('orig_slot_ar', true, false, true); +SELECT 'copy' FROM pg_copy_physical_replication_slot('orig_slot_ar', 'copied_slot_ar'); +SELECT slot_name, auto_revalidate FROM pg_replication_slots WHERE slot_name LIKE '%_ar' ORDER BY slot_name; +SELECT pg_drop_replication_slot('orig_slot_ar'); +SELECT pg_drop_replication_slot('copied_slot_ar'); + -- Test failover option of slots. 
SELECT 'init' FROM pg_create_logical_replication_slot('failover_true_slot', 'test_decoding', false, false, true); SELECT 'init' FROM pg_create_logical_replication_slot('failover_false_slot', 'test_decoding', false, false, false); diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c index 9f5e4f998fe..fdde1530a0c 100644 --- a/src/backend/replication/slotfuncs.c +++ b/src/backend/replication/slotfuncs.c @@ -46,7 +46,8 @@ static const char *SlotSyncSkipReasonNames[] = { */ static void create_physical_replication_slot(char *name, bool immediately_reserve, - bool temporary, XLogRecPtr restart_lsn) + bool temporary, XLogRecPtr restart_lsn, + bool auto_revalidate) { Assert(!MyReplicationSlot); @@ -67,6 +68,16 @@ create_physical_replication_slot(char *name, bool immediately_reserve, ReplicationSlotMarkDirty(); ReplicationSlotSave(); } + + if (auto_revalidate) + { + SpinLockAcquire(&MyReplicationSlot->mutex); + MyReplicationSlot->data.auto_revalidate = true; + SpinLockRelease(&MyReplicationSlot->mutex); + + ReplicationSlotMarkDirty(); + ReplicationSlotSave(); + } } /* @@ -79,6 +90,7 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS) Name name = PG_GETARG_NAME(0); bool immediately_reserve = PG_GETARG_BOOL(1); bool temporary = PG_GETARG_BOOL(2); + bool auto_revalidate = PG_GETARG_BOOL(3); Datum values[2]; bool nulls[2]; TupleDesc tupdesc; @@ -95,7 +107,8 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS) create_physical_replication_slot(NameStr(*name), immediately_reserve, temporary, - InvalidXLogRecPtr); + InvalidXLogRecPtr, + auto_revalidate); values[0] = NameGetDatum(&MyReplicationSlot->data.name); nulls[0] = false; @@ -757,7 +770,8 @@ copy_replication_slot(FunctionCallInfo fcinfo, bool logical_slot) create_physical_replication_slot(NameStr(*dst_name), true, temporary, - src_restart_lsn); + src_restart_lsn, + first_slot_contents.data.auto_revalidate); /* * Update the destination slot to current values of the source slot; diff 
--git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 50ada2ed293..0d0c4f8d112 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1254,6 +1254,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd) if (!cmd->temporary) ReplicationSlotSave(); } + } else { diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fc8d82665b8..de3f99ebb3a 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -11612,11 +11612,12 @@ # replication slots { oid => '3779', descr => 'create a physical replication slot', proname => 'pg_create_physical_replication_slot', provolatile => 'v', - proparallel => 'u', prorettype => 'record', proargtypes => 'name bool bool', - proallargtypes => '{name,bool,bool,name,pg_lsn}', - proargmodes => '{i,i,i,o,o}', - proargnames => '{slot_name,immediately_reserve,temporary,slot_name,lsn}', - proargdefaults => '{false,false}', + proparallel => 'u', prorettype => 'record', + proargtypes => 'name bool bool bool', + proallargtypes => '{name,bool,bool,bool,name,pg_lsn}', + proargmodes => '{i,i,i,i,o,o}', + proargnames => '{slot_name,immediately_reserve,temporary,auto_revalidate,slot_name,lsn}', + proargdefaults => '{false,false,false}', prosrc => 'pg_create_physical_replication_slot' }, { oid => '4220', descr => 'copy a physical replication slot, changing temporality', diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build index 36d789720a3..e1bd2d5d8f5 100644 --- a/src/test/recovery/meson.build +++ b/src/test/recovery/meson.build @@ -61,6 +61,7 @@ tests += { 't/050_redo_segment_missing.pl', 't/051_effective_wal_level.pl', 't/052_checkpoint_segment_missing.pl', + 't/053_auto_revalidate_physical_slot.pl', ], }, } -- 2.50.1 (Apple Git-155)
From 96e51acf88bb3ef721becb80cad0e54ce654e06d Mon Sep 17 00:00:00 2001
From: Joao Foltran <[email protected]>
Date: Mon, 23 Mar 2026 18:33:22 -0300
Subject: [PATCH v2 4/5] Add TAP test for physical replication slot auto-revalidation

Test the end-to-end auto_revalidate workflow for physical replication
slots:

1. Slot with auto_revalidate=true is revalidated after the standby
   reconnects via archive recovery and confirms WAL receipt.
2. After revalidation, the slot properly holds back WAL
   (wal_status = 'reserved') when the standby is stopped and WAL
   advances with max_slot_wal_keep_size = -1.
3. With hot_standby_feedback=on, xmin is re-established after
   revalidation and advances past the stale pre-invalidation value.
4. Slot with auto_revalidate=false preserves the existing ERROR
   behavior on invalidated slot acquisition.
5. Revalidation after idle_timeout invalidation (requires
   injection_points; skipped if not available).
---
 .../t/053_auto_revalidate_physical_slot.pl    | 392 ++++++++++++++++++
 1 file changed, 392 insertions(+)
 create mode 100644 src/test/recovery/t/053_auto_revalidate_physical_slot.pl

diff --git a/src/test/recovery/t/053_auto_revalidate_physical_slot.pl b/src/test/recovery/t/053_auto_revalidate_physical_slot.pl
new file mode 100644
index 00000000000..f19634f3a7a
--- /dev/null
+++ b/src/test/recovery/t/053_auto_revalidate_physical_slot.pl
@@ -0,0 +1,392 @@
+
+# Copyright (c) 2021-2026, PostgreSQL Global Development Group
+
+# Test auto-revalidation of physical replication slots.
+#
+# Verifies that a physical replication slot with auto_revalidate=true can
+# recover from invalidation when the standby reconnects and confirms WAL
+# receipt. Also verifies that after revalidation the slot properly holds
+# back WAL again, and that hot_standby_feedback xmin is re-established.
+use strict; +use warnings FATAL => 'all'; + +use PostgreSQL::Test::Utils; +use PostgreSQL::Test::Cluster; +use Test::More; + +# +# Primary setup: wal-segsize=1MB, archiving enabled, small +# max_slot_wal_keep_size so we can trigger invalidation quickly. +# +my $node_primary = PostgreSQL::Test::Cluster->new('primary'); +$node_primary->init( + allows_streaming => 1, + has_archiving => 1, + extra => ['--wal-segsize=1']); +$node_primary->append_conf( + 'postgresql.conf', qq( +min_wal_size = 2MB +max_wal_size = 4MB +max_slot_wal_keep_size = 1MB +log_checkpoints = yes +)); +$node_primary->start; + +# Create a physical replication slot with auto_revalidate enabled. +$node_primary->safe_psql('postgres', + "SELECT pg_create_physical_replication_slot('revalidate_slot', true, false, true)" +); + +# Verify auto_revalidate is true +my $result = $node_primary->safe_psql('postgres', + "SELECT auto_revalidate FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'"); +is($result, "t", 'auto_revalidate is true on created slot'); + +# Take a backup for the standby +my $backup_name = 'my_backup'; +$node_primary->backup($backup_name); + +# Create standby using the slot, with both streaming and restore_command +my $node_standby = PostgreSQL::Test::Cluster->new('standby'); +$node_standby->init_from_backup( + $node_primary, $backup_name, + has_streaming => 1, + has_restoring => 1); +$node_standby->append_conf('postgresql.conf', + "primary_slot_name = 'revalidate_slot'"); +$node_standby->start; + +# Wait for standby to catch up +$node_primary->wait_for_catchup($node_standby); + +# Verify slot is active and not invalidated +$result = $node_primary->safe_psql('postgres', + "SELECT active, invalidation_reason IS NULL FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'"); +is($result, "t|t", 'slot is active and valid after initial sync'); + +# Stop standby so we can invalidate the slot +$node_standby->stop; + +# Generate enough WAL to exceed max_slot_wal_keep_size and 
invalidate the slot +my $logstart = -s $node_primary->logfile; +$node_primary->advance_wal(7); +$node_primary->safe_psql('postgres', "CHECKPOINT;"); + +# Wait for slot to be invalidated +$logstart = $node_primary->wait_for_log( + qr/invalidating obsolete replication slot "revalidate_slot"/, + $logstart); +pass('slot was invalidated due to WAL removal'); + +$result = $node_primary->safe_psql('postgres', + "SELECT invalidation_reason FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'"); +is($result, "wal_removed", 'slot shows wal_removed invalidation reason'); + +########################################################################## +# Test 1: Revalidation after standby reconnects +########################################################################## + +# Restart standby -- it should recover from archive then start streaming. +# The slot should be revalidated automatically. +$logstart = -s $node_primary->logfile; +$node_standby->start; + +# Wait for the revalidation log message +$logstart = $node_primary->wait_for_log( + qr/physical replication slot "revalidate_slot" has been revalidated/, + $logstart); +pass('slot was revalidated after standby reconnected'); + +# Wait for standby to fully catch up +$node_primary->wait_for_catchup($node_standby); + +# Verify slot is now valid again +$result = $node_primary->safe_psql('postgres', + "SELECT active, invalidation_reason IS NULL FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'"); +is($result, "t|t", 'slot is active and valid after revalidation'); + +########################################################################## +# Test 2: After revalidation, slot holds back WAL when standby is stopped +########################################################################## + +# Remove the WAL limit so we can test WAL retention by the slot itself +$node_primary->safe_psql('postgres', + "ALTER SYSTEM SET max_slot_wal_keep_size = '-1'"); +$node_primary->reload; + +# Stop standby 
+$node_standby->stop;
+
+# Record the current restart_lsn
+my $restart_lsn_before = $node_primary->safe_psql('postgres',
+	"SELECT restart_lsn FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+ok($restart_lsn_before ne '', 'restart_lsn is set after revalidation');
+
+# Advance WAL significantly
+$node_primary->advance_wal(5);
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
+# Verify the slot's WAL is still reserved (not lost)
+$result = $node_primary->safe_psql('postgres',
+	"SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+is($result, "reserved",
+	'slot WAL status is "reserved" after revalidation -- WAL is held back');
+
+# Restart standby and verify it catches up without issues
+$node_standby->start;
+$node_primary->wait_for_catchup($node_standby);
+
+$result = $node_primary->safe_psql('postgres',
+	"SELECT active, invalidation_reason IS NULL FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+is($result, "t|t", 'slot still valid after standby reconnected with held WAL');
+
+$node_standby->stop;
+
+##########################################################################
+# Test 3: hot_standby_feedback=on -- xmin is re-established after revalidation
+##########################################################################
+
+# Enable hot_standby_feedback on the standby
+$node_standby->append_conf('postgresql.conf', "hot_standby_feedback = on");
+
+# Re-enable WAL limit for another invalidation cycle
+$node_primary->safe_psql('postgres',
+	"ALTER SYSTEM SET max_slot_wal_keep_size = '1MB'");
+$node_primary->reload;
+
+$node_standby->start;
+$node_primary->wait_for_catchup($node_standby);
+
+# Wait for xmin to be populated via hot_standby_feedback
+$node_primary->poll_query_until('postgres',
+	"SELECT xmin IS NOT NULL FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'")
+	or die "Timed out waiting for xmin to be set via hot_standby_feedback";
+
+my $xmin_before = $node_primary->safe_psql('postgres',
+	"SELECT xmin FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+note "xmin before invalidation: $xmin_before";
+ok($xmin_before ne '', 'xmin is set via hot_standby_feedback before invalidation');
+
+# Stop standby and invalidate the slot again
+$node_standby->stop;
+
+$logstart = -s $node_primary->logfile;
+$node_primary->advance_wal(7);
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
+$logstart = $node_primary->wait_for_log(
+	qr/invalidating obsolete replication slot "revalidate_slot"/,
+	$logstart);
+pass('slot invalidated again for hot_standby_feedback test');
+
+# While invalidated, the xmin should NOT be counted by the system.
+# Run some transactions so the primary's xid advances well past the stale xmin.
+$node_primary->safe_psql('postgres',
+	"CREATE TABLE hsf_test(id int); DROP TABLE hsf_test;");
+
+# Restart standby, expect revalidation
+$logstart = -s $node_primary->logfile;
+$node_standby->start;
+
+$logstart = $node_primary->wait_for_log(
+	qr/physical replication slot "revalidate_slot" has been revalidated/,
+	$logstart);
+pass('slot revalidated again with hot_standby_feedback=on');
+
+$node_primary->wait_for_catchup($node_standby);
+
+# Wait for xmin to be re-populated via hot_standby_feedback
+$node_primary->poll_query_until('postgres',
+	"SELECT xmin IS NOT NULL FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'")
+	or die "Timed out waiting for xmin to be re-established after revalidation";
+
+my $xmin_after = $node_primary->safe_psql('postgres',
+	"SELECT xmin FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+note "xmin after revalidation: $xmin_after (was $xmin_before)";
+ok($xmin_after ne '', 'xmin is re-established after revalidation with hot_standby_feedback=on');
+
+# The new xmin must not be older than the pre-invalidation value,
+# since the standby caught up with all the WAL generated while it was down.
+my $xmin_advanced = $node_primary->safe_psql('postgres',
+	"SELECT '$xmin_after'::xid8 >= '$xmin_before'::xid8");
+is($xmin_advanced, "t",
+	'xmin advanced after revalidation (not stuck at stale pre-invalidation value)');
+
+# Verify slot is valid
+$result = $node_primary->safe_psql('postgres',
+	"SELECT active, invalidation_reason IS NULL FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+is($result, "t|t", 'slot is valid after revalidation with hot_standby_feedback');
+
+$node_standby->stop;
+
+##########################################################################
+# Test 4: auto_revalidate=false still errors (existing behavior preserved)
+##########################################################################
+
+# Create a second slot without auto_revalidate
+$node_primary->safe_psql('postgres',
+	"SELECT pg_create_physical_replication_slot('no_revalidate_slot', true, false, false)"
+);
+
+$result = $node_primary->safe_psql('postgres',
+	"SELECT auto_revalidate FROM pg_replication_slots WHERE slot_name = 'no_revalidate_slot'");
+is($result, "f", 'auto_revalidate is false on second slot');
+
+# Create second standby using this slot
+my $node_standby2 = PostgreSQL::Test::Cluster->new('standby_2');
+$node_standby2->init_from_backup(
+	$node_primary, $backup_name,
+	has_streaming => 1,
+	has_restoring => 1);
+$node_standby2->append_conf('postgresql.conf',
+	"primary_slot_name = 'no_revalidate_slot'");
+$node_standby2->start;
+$node_primary->wait_for_catchup($node_standby2);
+$node_standby2->stop;
+
+# Invalidate the second slot
+$logstart = -s $node_primary->logfile;
+$node_primary->advance_wal(7);
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
+$logstart = $node_primary->wait_for_log(
+	qr/invalidating obsolete replication slot "no_revalidate_slot"/,
+	$logstart);
+pass('second slot (auto_revalidate=false) was invalidated');
+
+# Start standby2 -- it should fail to connect due to the invalidated slot
+$logstart = -s $node_standby2->logfile;
+$node_standby2->start;
+
+# Wait for the FATAL error in standby log
+$node_standby2->wait_for_log(
+	qr/can no longer access replication slot "no_revalidate_slot"/,
+	$logstart);
+pass('standby with auto_revalidate=false gets error on invalidated slot');
+
+# Cleanup standby2 from Test 4
+$node_standby2->stop('immediate');
+
+##########################################################################
+# Test 5: Copied slot with auto_revalidate=true still revalidates
+##########################################################################
+
+# Copy the original slot -- auto_revalidate should be preserved
+$node_primary->safe_psql('postgres',
+	"SELECT pg_copy_physical_replication_slot('revalidate_slot', 'copied_slot')");
+
+$result = $node_primary->safe_psql('postgres',
+	"SELECT auto_revalidate FROM pg_replication_slots WHERE slot_name = 'copied_slot'");
+is($result, "t", 'copied slot preserved auto_revalidate=true');
+
+# Create a standby using the copied slot
+my $node_standby3 = PostgreSQL::Test::Cluster->new('standby_3');
+$node_standby3->init_from_backup(
+	$node_primary, $backup_name,
+	has_streaming => 1,
+	has_restoring => 1);
+$node_standby3->append_conf('postgresql.conf',
+	"primary_slot_name = 'copied_slot'");
+$node_standby3->start;
+$node_primary->wait_for_catchup($node_standby3);
+$node_standby3->stop;
+
+# Invalidate the copied slot
+$logstart = -s $node_primary->logfile;
+$node_primary->advance_wal(7);
+$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
+$logstart = $node_primary->wait_for_log(
+	qr/invalidating obsolete replication slot "copied_slot"/,
+	$logstart);
+pass('copied slot was invalidated');
+
+# Reconnect -- the copied slot should auto-revalidate
+$logstart = -s $node_primary->logfile;
+$node_standby3->start;
+
+$node_primary->wait_for_log(
+	qr/physical replication slot "copied_slot" has been revalidated/,
+	$logstart);
+pass('copied slot was revalidated after standby reconnected');
+
+$node_primary->wait_for_catchup($node_standby3);
+
+$result = $node_primary->safe_psql('postgres',
+	"SELECT active, invalidation_reason IS NULL FROM pg_replication_slots WHERE slot_name = 'copied_slot'");
+is($result, "t|t", 'copied slot is active and valid after revalidation');
+
+$node_standby3->stop('immediate');
+$node_primary->safe_psql('postgres',
+	"SELECT pg_drop_replication_slot('copied_slot')");
+
+##########################################################################
+# Test 6: Revalidation after idle_timeout invalidation
+#
+# Requires injection_points to force idle_timeout without waiting real
+# wall-clock time. Skipped if the build does not support injection points.
+##########################################################################
+
+SKIP:
+{
+	# Default to '' so an unset variable does not raise an "uninitialized
+	# value" warning when compared.
+	skip "injection points not supported by this build", 3
+		unless (($ENV{enable_injection_points} // '') eq 'yes');
+
+	skip "injection_points extension not installed", 3
+		unless ($node_primary->check_extension('injection_points'));
+
+	$node_primary->safe_psql('postgres', 'CREATE EXTENSION IF NOT EXISTS injection_points;');
+
+	# Remove WAL size limit so idle_timeout is the only invalidation vector,
+	# and enable idle_replication_slot_timeout (required by CanInvalidateIdleSlot).
+	$node_primary->safe_psql('postgres',
+		"ALTER SYSTEM SET max_slot_wal_keep_size = '-1'");
+	$node_primary->safe_psql('postgres',
+		"ALTER SYSTEM SET idle_replication_slot_timeout = '1min'");
+	$node_primary->reload;
+
+	# revalidate_slot should still exist and be valid from earlier tests.
+	# Start standby, catch up, then stop it so the slot becomes idle.
+	$node_standby->start;
+	$node_primary->wait_for_catchup($node_standby);
+	$node_standby->stop;
+
+	# Attach the injection point that forces idle_timeout invalidation
+	$node_primary->safe_psql('postgres',
+		"SELECT injection_points_attach('slot-timeout-inval', 'error');");
+
+	# Checkpoint triggers the invalidation check
+	$logstart = -s $node_primary->logfile;
+	$node_primary->safe_psql('postgres', "CHECKPOINT;");
+
+	$logstart = $node_primary->wait_for_log(
+		qr/invalidating obsolete replication slot "revalidate_slot"/,
+		$logstart);
+	pass('slot invalidated due to idle_timeout (injection point)');
+
+	# Confirm the reason is idle_timeout
+	$result = $node_primary->safe_psql('postgres',
+		"SELECT invalidation_reason FROM pg_replication_slots WHERE slot_name = 'revalidate_slot'");
+	is($result, "idle_timeout", 'invalidation reason is idle_timeout');
+
+	# Detach the injection point before reconnecting so it does not
+	# interfere with subsequent checkpoint cycles.
+	$node_primary->safe_psql('postgres',
+		"SELECT injection_points_detach('slot-timeout-inval');");
+
+	# Reconnect standby -- slot should auto-revalidate
+	$logstart = -s $node_primary->logfile;
+	$node_standby->start;
+
+	$node_primary->wait_for_log(
+		qr/physical replication slot "revalidate_slot" has been revalidated/,
+		$logstart);
+	pass('slot revalidated after idle_timeout invalidation');
+
+	$node_standby->stop;
+}
+
+# Cleanup
+$node_primary->stop;
+
+done_testing();
--
2.50.1 (Apple Git-155)
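
A note on the log checks in the test: each step re-reads the primary's log size with -s and passes it to wait_for_log, so that "invalidating obsolete replication slot" messages left over from an earlier cycle are not matched again. Below is a minimal standalone sketch of that offset-based scan. The helper log_contains_since and the temporary file are hypothetical stand-ins for illustration only; they are not part of the patch or of PostgreSQL::Test::Cluster.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (illustration only): return 1 if $regex matches
# anything written to $path at or after byte position $offset, else 0.
sub log_contains_since
{
	my ($path, $regex, $offset) = @_;
	open(my $fh, '<', $path) or die "could not open $path: $!";
	seek($fh, $offset, 0) or die "could not seek in $path: $!";
	local $/;    # slurp the remainder of the file
	my $tail = <$fh>;
	close($fh);
	return (defined $tail && $tail =~ $regex) ? 1 : 0;
}

# Stand-in for a server log file.
my ($out, $logfile) = tempfile(UNLINK => 1);
$out->autoflush(1);

# An earlier test cycle already logged an invalidation.
print $out qq{LOG:  invalidating obsolete replication slot "revalidate_slot"\n};

# Record the current log size, as the test does with "-s $node->logfile".
my $logstart = -s $logfile;

my $pattern = qr/invalidating obsolete replication slot "revalidate_slot"/;

# Scanning from the recorded offset skips the stale message ...
my $stale_hit = log_contains_since($logfile, $pattern, $logstart);

# ... and matches only once a new message has been appended.
print $out qq{LOG:  invalidating obsolete replication slot "revalidate_slot"\n};
my $fresh_hit = log_contains_since($logfile, $pattern, $logstart);

print "stale match: $stale_hit, fresh match: $fresh_hit\n";
```

In this sketch the first scan should find nothing and the second should match, which is why the test resets $logstart before every invalidation or reconnect step rather than reusing an offset from a previous cycle.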
