Hi,

An issue occurred during the first switchover on a PostgreSQL 17.5 cluster. The setup is a two-node cluster managed by Patroni 4.0.5. Logical replication is configured on the same instance, using the new PostgreSQL 17 feature that makes logical replication slots failover-safe in a highly available environment. Logical slot management is currently disabled in Patroni.

The following output was captured during the switchover.

1. Run the switchover with Patroni

patronictl switchover

Current cluster topology
+ Cluster: ClusterX (7529893278186104053) ------+----+-----------+
| Member   | Host         | Role    | State     | TL | Lag in MB |
+----------+--------------+---------+-----------+----+-----------+
| node_1   | xxxxxxxxxxxx | Leader  | running   |  4 |           |
| node_2   | xxxxxxxxxxxx | Replica | streaming |  4 |         0 |
+----------+--------------+---------+-----------+----+-----------+

2. Check the slot on the new primary

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+----------------+
| slot_name           | logical_slot   |
| plugin              | pgoutput       |
| slot_type           | logical        |
| datoid              | 25605          |
| database            | db_test        |
| temporary           | f              |
| active              | t              |
| active_pid          | 3841546        |
| xmin                |                |
| catalog_xmin        | 10399          |
| restart_lsn         | 0/37002410     |
| confirmed_flush_lsn | 0/37002448     |
| wal_status          | reserved       |
| safe_wal_size       |                |
| two_phase           | f              |
| inactive_since      |                |
| conflicting         | f              |
| invalidation_reason |                |
| failover            | t              |
| synced              | t              |
+---------------------+----------------+

Logical replication is active again after the promotion.

3. Check the slot on the new standby

select * from pg_replication_slots where slot_type = 'logical';
+-[ RECORD 1 ]--------+-------------------------------+
| slot_name           | logical_slot                  |
| plugin              | pgoutput                      |
| slot_type           | logical                       |
| datoid              | 25605                         |
| database            | db_test                       |
| temporary           | f                             |
| active              | f                             |
| active_pid          |                               |
| xmin                |                               |
| catalog_xmin        | 10397                         |
| restart_lsn         | 0/3638F5F0                    |
| confirmed_flush_lsn | 0/3638F6A0                    |
| wal_status          | reserved                      |
| safe_wal_size       |                               |
| two_phase           | f                             |
| inactive_since      | 2025-08-05 10:21:03.342587+02 |
| conflicting         | f                             |
| invalidation_reason |                               |
| failover            | t                             |
| synced              | f                             |
+---------------------+-------------------------------+

The synced flag stays false on the new standby.
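
To make the slot state above easier to reason about, here is a minimal stand-alone sketch in C. It is not PostgreSQL code: the SlotRow struct, the StandbySlotState enum and classify_standby_slot() are hypothetical names introduced only for illustration. It classifies a standby slot from the name, failover and synced columns of pg_replication_slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified view of one pg_replication_slots row. */
typedef struct SlotRow
{
	const char *slot_name;
	bool		failover;	/* "failover" column */
	bool		synced;		/* "synced" column */
} SlotRow;

/* Hypothetical classification of a standby slot against a primary slot. */
typedef enum StandbySlotState
{
	SLOT_NO_CONFLICT,		/* names differ, nothing to compare */
	SLOT_SYNCED,			/* same name, synced = t on the standby */
	SLOT_FORMER_PRIMARY,	/* same name, failover = t on both, synced = f */
	SLOT_USER_CREATED		/* same name, synced = f, not a failover pair */
} StandbySlotState;

static StandbySlotState
classify_standby_slot(const SlotRow *standby, const SlotRow *primary)
{
	assert(standby != NULL && primary != NULL);

	/* Different names never collide. */
	if (strcmp(standby->slot_name, primary->slot_name) != 0)
		return SLOT_NO_CONFLICT;

	/* The standby slot is already marked as synchronized. */
	if (standby->synced)
		return SLOT_SYNCED;

	/* Same name, not synced, but failover-enabled on both nodes. */
	if (standby->failover && primary->failover)
		return SLOT_FORMER_PRIMARY;

	return SLOT_USER_CREATED;
}
```

With the values from steps 2 and 3 (same slot name, failover = t on both nodes, synced = f on the new standby), this sketch returns SLOT_FORMER_PRIMARY, which is the case observed after the switchover.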

The following error appears in the log:

2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= LOG: slot sync worker started
2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= ERROR: exiting from slot synchronization because same name slot "logical_slot" already exists on the standby

I would like to propose a way to address the issue: since the logical slot has failover enabled on both the primary and the standby, the slot sync worker could attempt to resynchronize them instead of exiting. I modified the slotsync.c module as follows:

+++ b/src/backend/replication/logical/slotsync.c
@@ -649,24 +649,46 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)

 		return false;
 	}
-
-	/* Search for the named slot */
+	// Both local and remote slot have the same name
 	if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
 	{
 		bool		synced;
+		bool		failover_status = remote_slot->failover;

 		SpinLockAcquire(&slot->mutex);
 		synced = slot->data.synced;
 		SpinLockRelease(&slot->mutex);
+
+		if (!synced){
+
+			Assert(!MyReplicationSlot);
+
+			if (failover_status){
+
+				ReplicationSlotAcquire(remote_slot->name, true, true);
+
+				// Put the synced flag to true to attempt resynchronizing failover slot on the standby
+				MyReplicationSlot->data.synced = true;
+
+				ReplicationSlotMarkDirty();

-		/* User-created slot with the same name exists, raise ERROR. */
-		if (!synced)
-			ereport(ERROR,
+				ReplicationSlotRelease();
+
+				/* Get rid of a replication slot that is no longer wanted */
+				ereport(WARNING,
+						errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+						errmsg("slot \"%s\" local slot has the same name as remote slot and they are in failover mode, try to synchronize them",
+							   remote_slot->name));
+				return false; /* Going back to the main loop after droping the failover slot */
+			}
+			else
+				/* User-created slot with the same name exists, raise ERROR. */
+				ereport(ERROR,
 					errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 					errmsg("exiting from slot synchronization because same"
-						   " name slot \"%s\" already exists on the standby",
-						   remote_slot->name));
-
+						" name slot \"%s\" already exists on the standby",
+						remote_slot->name));
+		}
 		/*
 		 * The slot has been synchronized before.
 		 *

This message follows the discussion started in this thread:
https://www.postgresql.org/message-id/CAA5-nLDvnqGtBsKu4T_s-cS%2BdGbpSLEzRwgep1XfYzGhQ4o65A%40mail.gmail.com

Help would be appreciated to move this forward.

Best regards,
Fabrice