Hi all,

I don't think we really need allow_overwrite. It is not possible to create logical slots with failover=true on a standby, so we can safely rely on failover being true to conclude that this node was a primary at some point and that the slot is supposed to be synced. Please see the attached patch.
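To make the resulting behaviour easier to review, here is a minimal standalone sketch of the decision that synchronize_one_slot() makes when a local slot with the same name already exists. It is not code from the patch: the enum, its values, and classify_local_slot() are hypothetical names used only for illustration, and the two booleans stand in for slot->data.synced and slot->data.failover.

```c
#include <stdbool.h>

/* Hypothetical outcome names; they do not exist in PostgreSQL. */
typedef enum
{
	SLOT_CONTINUE_SYNC,		/* slot is already marked synced: keep syncing */
	SLOT_ADOPT_AND_SYNC,	/* left over from when this node was a primary:
							 * set synced = true, then keep syncing */
	SLOT_NAME_CONFLICT		/* user-created slot on the standby: raise ERROR */
} SlotAction;

/*
 * Illustrative model of the check. 'synced' and 'failover' stand in for
 * the flags read from the local slot under its spinlock.
 */
static SlotAction
classify_local_slot(bool synced, bool failover)
{
	if (synced)
		return SLOT_CONTINUE_SYNC;

	/*
	 * Assumption taken from the reasoning above: a slot with
	 * failover = true cannot have been created on a standby, so it must
	 * date from the time this node was a primary.
	 */
	if (failover)
		return SLOT_ADOPT_AND_SYNC;

	return SLOT_NAME_CONFLICT;
}
```

Only the synced=false, failover=false case still raises the "same name slot already exists on the standby" error; the synced=false, failover=true case is the one that previously errored out after a switchover.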

Regards,
--
Alexander Kukushkin

From 469324c263c5ee09ae0f6d75761e2607e08a9e67 Mon Sep 17 00:00:00 2001
From: Alexander Kukushkin <[email protected]>
Date: Tue, 28 Oct 2025 13:32:43 +0100
Subject: [PATCH] Continue slots synchronization after switchover

When the former primary is started up after clean shutdown as a standby
it was refusing to synchronize logical failover slots because synced is
false with the error:
"exiting from slot synchronization because same name slot "<name>"
already exists on the standby".

Since we don't allow creation of logical replication slots with
failover=true on standby it is safe to check that replication slot has
failover=true and in this case set synced=true and continue.

Besides that, we change filter in drop_local_obsolete_slots() function,
because 'failover' better defines purpose of the slot.
---
 src/backend/replication/logical/slotsync.c | 31 ++++++++++++++++------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index b122d99b009..55d6f2fb9e9 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -455,10 +455,11 @@ drop_local_obsolete_slots(List *remote_slot_list)
 			 * slot by the user. This new user-created slot may end up using
 			 * the same shared memory as that of 'local_slot'. Thus check if
 			 * local_slot is still the synced one before performing actual
-			 * drop.
+			 * drop. Yes, we actually check 'failover', not 'synced', because
+			 * it could have been created on primary which is now a standby.
 			 */
 			SpinLockAcquire(&local_slot->mutex);
-			synced_slot = local_slot->in_use && local_slot->data.synced;
+			synced_slot = local_slot->in_use && local_slot->data.failover;
 			SpinLockRelease(&local_slot->mutex);
 
 			if (synced_slot)
@@ -653,18 +654,32 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 	if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
 	{
 		bool		synced;
+		bool		failover;
 
 		SpinLockAcquire(&slot->mutex);
 		synced = slot->data.synced;
+		failover = slot->data.failover;
 		SpinLockRelease(&slot->mutex);
 
-		/* User-created slot with the same name exists, raise ERROR. */
 		if (!synced)
-			ereport(ERROR,
-					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-					errmsg("exiting from slot synchronization because same"
-						   " name slot \"%s\" already exists on the standby",
-						   remote_slot->name));
+		{
+			/* User-created slot with the same name exists, raise ERROR. */
+			if (!failover)
+				ereport(ERROR,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("exiting from slot synchronization because same"
+							   " name slot \"%s\" already exists on the standby",
+							   remote_slot->name));
+
+			/*
+			 * At some point we were a primary, and it was expected to have
+			 * synced = false and failover = true. In this case we want to set
+			 * synced = true and continue synchronization.
+			 */
+			SpinLockAcquire(&slot->mutex);
+			slot->data.synced = true;
+			SpinLockRelease(&slot->mutex);
+		}
 
 		/*
 		 * The slot has been synchronized before.
-- 
2.34.1
