Hi,

On Fri, Jan 24, 2025 at 02:44:21PM -0500, Andres Freund wrote:
> Hm, maybe I'm missing something, but isn't it possible for the active slot to
> actually progress decoding past the conflict point? It's an active slot, with
> the consumer running in the background, so all that needs to happen for that
> is that logical decoding progresses past the conflict point. That requires
> there be some reference to a newer xid to be in the WAL, but there's nothing
> preventing that afaict?
>
>
> In fact, I now saw this comment:
>
> # Note that pg_current_snapshot() is used to get the horizon. It does
> # not generate a Transaction/COMMIT WAL record, decreasing the risk of
> # seeing a xl_running_xacts that would advance an active replication slot's
> # catalog_xmin. Advancing the active replication slot's catalog_xmin
> # would break some tests that expect the active slot to conflict with
> # the catalog xmin horizon.

Yeah, that comes from 46d8587b504 (where we tried to reduce, as much as
possible, the risk of an unwanted xl_running_xacts being generated).

> Which seems precisely what's happening here?

Most probably, yes.

> If that's the issue, I think we need to find a way to block logical decoding
> from making forward progress during the test.
>
> The easiest way would be to stop pg_recvlogical and emit a bunch of changes,
> so that the backend is stalled sending out data. But that'd require a hard to
> predict amount of data to be emitted, which isn't great.

What about using an injection point instead to block pg_recvlogical until we
want it to resume?

> But perhaps we could do something smarter, by starting a session on the
> primary that acquires an access exclusive lock on a relation that logical
> decoding will need to access? The tricky bit likely would be that it'd
> somehow need to *not* prevent VACUUM on the primary.

Hm, I'm not sure how we could do that.

> If we could trigger VACUUM in a transaction on the primary this would be
> easy, but we can't.

Another idea that I had ([1]) was to make use of injection points around the
places where RUNNING_XACTS is emitted. IIRC I tried to work on this, but it was
not as simple as it sounds, because we need the startup process not to be
blocked.

[1]: https://www.postgresql.org/message-id/ZmadPZlEecJNbhvI%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com