On Fri, Nov 14, 2025 at 11:40 AM Masahiko Sawada <[email protected]> wrote:
>
> On Thu, Nov 13, 2025 at 7:16 PM shveta malik <[email protected]> wrote:
> >
> > On Thu, Nov 13, 2025 at 6:39 PM Alexander Kukushkin <[email protected]>
> > wrote:
> > >
> > >> But the system can die/crash before shutdown.
> > >
> > > You mean it will not write WAL?
> > > When a logical replication slot is created we build a snapshot and also
> > > write to WAL:
> > > postgres=# select pg_current_wal_insert_lsn(); select
> > > pg_create_logical_replication_slot('foo', 'pgoutput'); select
> > > pg_current_wal_insert_lsn();
> > >  pg_current_wal_insert_lsn
> > > ---------------------------
> > >  0/37F96F8
> > > (1 row)
> > >
> > >  pg_create_logical_replication_slot
> > > ------------------------------------
> > >  (foo,0/37F9730)
> > > (1 row)
> > >
> > >  pg_current_wal_insert_lsn
> > > ---------------------------
> > >  0/37F9730
> > > (1 row)
> > >
> > > Only after that is the slot marked as persistent.
> > >
> >
> > There can be a scenario where a replication slot is dropped and
> > recreated, and its WAL is also replicated to the standby. However,
> > before the new slot state can be synchronized via slotsync, the
> > primary crashes and the standby is promoted. Later, the user manually
> > reconfigures the old primary to follow the newly promoted standby (no
> > pg_rewind in play). In such a case, would it be a good idea to
> > overwrite the newly created slot on the old primary with the promoted
> > standby's synced (old) slot by default? Thoughts?
>
> I think it's an extremely rare, or mostly wrong, operation for users to
> have the old primary rejoin replication as the new standby without
> pg_rewind after a failover (i.e., when the old primary didn't shut down
> gracefully). I guess that pg_rewind should practically always be used
> unless the primary server shuts down gracefully (i.e., in the switchover
> case). In failover cases, pg_rewind launches the server in single-user
> mode to run crash recovery, advancing its LSN and cleaning up all
> existing replication slots after rewinding the server. So I think that
> the reported issue doesn't happen in failover cases and we can focus on
> switchover cases.
>
The point is quite fundamental: do you think we can sync to a pre-existing slot with the same name and failover marked as true after the first time the node joins a new primary? We don't provide any switchover tools/utilities, so it doesn't appear straightforward that we can perform a re-sync. If we had a switchover tool, I think it would have removed all existing slots before the old primary joins the new primary, because otherwise there is always a chance that redundant slots remain and prevent resource removal (such as old WAL). Consider a case where, after switchover, the old primary decides to join a different standby (the new primary) than the one where slot sync was previously happening. In that case, the old primary may have some slots that should be removed.

--
With Regards,
Amit Kapila.
