On Fri, May 30, 2025 at 3:00 PM Nisha Moond <nisha.moond...@gmail.com> wrote: > > Agree that we need to cover the simple pg_dump and pg_restore with the patch. > > When pg_dump and pg_restore are used outside of pg_upgrade, there's no > guarantee that the target system does not have any prepared > transactions. In such cases, restoring a subscription with both > two_phase and failover enabled could lead to the bug, so we should > avoid allowing both options via pg_restore. > > Here are a few possible solutions: > > 1) Split the CREATE SUBSCRIPTION command of such subscriptions in pg_restore : > first create the sub with two_phase: > - CREATE SUBSCRIPTION ... with (two_phase = on) > then enable failover > - ALTER SUBSCRIPTION ... with (failover=true) >
This won't work because we always dump a subscription with the 'connect' option as 'false'. As changing the subscription's failover option, needs the connection above split won't work. > However, this approach adds complexity in pg_restore and may still > fail if the slot's restart_lsn is earlier than two_phase_at. > > 2) Prevent dump of subscription in non-upgrade mode: > Raise an error/warning during pg_dump when a subscription with both > two_phase and failover is encountered (unless in binary upgrade mode). > That is, either fail the pg_dump, or skip dumping subscriptions in > such a case with a warning. > We can include a helpful hint: > "Disable failover for subscription <sub-name> before dumping and > re-enable it after restore." > > 3) Dump such subscriptions with (two_phase=on, failover=on, > create_slot=false) together. As pg_dump always sets "connect=false" > for subscription, it is up to users to reactivate the subscription > suitably. In this case, we can add some document to warn that it's the > user's responsibility to ensure the remote slot is in a safe state. > > What do you think? > > Please let me know if you have any other suggestions for handling this > in pg_dump/pg_restore. > Yet another idea is to dump the subscription with two_phase = on and failover = false. We should do this when both options are 'true' during the dump. As we are documenting that we always dump createsubscription with connect as false and let users take care (see [1] (When dumping logical replication subscriptions ...)), a similar reasoning could be given for the failover flag. The one more combination to consider is when someone takes a dump of an older version and loads it into a newer version. For example, where users dump from 17.5 and then restore in a newer version, say 17.6 (which has our fix), the restore will fail due to newer restrictions added by this patch. Do we need to do anything about it? [1] : https://www.postgresql.org/docs/devel/app-pgdump.html -- With Regards, Amit Kapila.