On Thu, Feb 24, 2022 at 12:46 AM James Coleman <jtc...@gmail.com> wrote:
>
> I've been working on adding test coverage to prove this out, but I've
> encountered the problem reported in [1].
>
> My assumption, but Andres please correct me if I'm wrong, that we
> should see issues with the following steps (given the primary,
> physical replica, and logical subscriber already created in the test):
>
> 1. Ensure both logical subscriber and physical replica are caught up
> 2. Disable logical subscription
> 3. Make a catalog change on the primary (currently renaming the
> primary key column)
> 4. Vacuum pg_class
> 5. Ensure physical replication is caught up
> 6. Stop primary and promote the replica
> 7. Write to the changed table
> 8. Update subscription to point to promoted replica
> 9. Re-enable logical subscription
>
> I'm attaching my test as an additional patch in the series for
> reference. Currently I have steps 3 and 4 commented out to show that
> the issues in [1] occur without any attempt to trigger the catalog
> xmin problem.
>
> Given this error seems pretty significant in terms of indicating
> fundamental lack of test coverage (the primary stated benefit of the
> patch is physical failover), and it currently is a blocker to testing
> more deeply.
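
For reference, the quoted steps can be written down as an ordered list of
per-node actions. This is only a rough sketch, not the attached test: the
subscription name, table name, column names, and connection string are all
hypothetical placeholders, and steps that depend on the test harness
(waiting for catch-up, stopping and promoting nodes, the actual write) are
left as plain descriptions rather than commands.

```python
# Sketch only: test_sub, test_tbl, the column names and the conninfo
# string are hypothetical placeholders, not names from the attached test.

def failover_steps(sub="test_sub", table="test_tbl",
                   old_col="id", new_col="id_renamed",
                   replica_conninfo="host=replica port=5433 dbname=postgres"):
    """Return the quoted steps as ordered (step, node, action) tuples.

    'action' is an SQL statement where one maps directly onto the step,
    and a plain-text description where the step is harness-specific.
    """
    return [
        (1, "harness", "wait for subscriber and physical replica to catch up"),
        (2, "subscriber", f"ALTER SUBSCRIPTION {sub} DISABLE"),
        (3, "primary",
         f"ALTER TABLE {table} RENAME COLUMN {old_col} TO {new_col}"),
        (4, "primary", "VACUUM pg_class"),
        (5, "harness", "wait for physical replica to catch up"),
        (6, "harness", "stop the primary and promote the replica"),
        (7, "replica", f"write a row to {table}"),
        (8, "subscriber",
         f"ALTER SUBSCRIPTION {sub} CONNECTION '{replica_conninfo}'"),
        (9, "subscriber", f"ALTER SUBSCRIPTION {sub} ENABLE"),
    ]


if __name__ == "__main__":
    for step, node, action in failover_steps():
        print(f"{step}. [{node}] {action}")
```

Keeping the node next to each action makes it explicit that steps 3 and 4
run on the old primary, while step 7 runs on the promoted replica.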

A few of my initial concerns, specified at [1], are these:

1) Instead of a new LIST_SLOT command, can't we use
READ_REPLICATION_SLOT (slight modifications need to be made to make it
support logical replication slots and to get more information from the
subscriber)?

2) How frequently is the new bg worker going to sync the slot info? How
can it ensure that the latest information exists, say, when the
subscriber is down/crashed before it picks up the latest slot
information?

4) IIUC, the proposal works only for logical replication slots, but do
you also see the need for supporting some kind of synchronization of
physical replication slots as well? IMO, we need a better and
consistent way for both types of replication slots. If the walsender
can somehow push the slot info from the primary (for physical
replication slots)/publisher (for logical replication slots) to the
standby/subscribers, this will be a more consistent and simpler
design. However, I'm not sure if this design is doable at all.

Can anyone help clarify these?

[1] https://www.postgresql.org/message-id/CALj2ACUGNGfWRtwwZwT-Y6feEP8EtOMhVTE87rdeY14mBpsRUA%40mail.gmail.com

Regards,
Bharath Rupireddy.