On Tue, May 12, 2026 at 4:38 PM Antonin Houska <[email protected]> wrote: > > Amit Kapila <[email protected]> wrote: > > > > > I see your point. Due to this, once the xmin regresses based on > > cluster-wide running_xact, some transaction could appear to be running > > when it should have appeared as committed. > > The problem is that xmin does not advance when it should. Attached is a test > that reproduces the problem (it includes [1], to handle injection points in > background worker), I hope the comments in the specification file are helpful. >
Yes, the comments were helpful. IIUC, the test skipped insert into repack_test because the transaction doing that insert happened before the snapbuild reached FULL_SNAPSHOT/CONSISTENT state, so its commit is not decoded. Then we also didn't update builder->xmin after reaching CONSISTENT state in the last running_xact record for MyDatabase. So, the insert is neither covered in initial copy, nor decoded, so after repack (concurrently) is finished the table is empty. I think the patch proposed will fix this specific issue but apart from other points raised for this patch few more are: (a) Post CONSISTENT state, for cleanup of db_specific snapshots, we need to separately again LOG db-specific running xacts whenever we encounter another running_xacts, (b) Other point is that because repack-concurrently always use full-snapshot, the serialization of snapshot is useless because it can't be used by others. > It's actually not exactly the problem reported in the stress test, but IMO the > core issue is the same: effects of some transactions are lost. In the stress > test, tuple deletion was lost, so the error was "could not create unique > index". Here I only demonstrate lost INSERT. > > > Assuming, the problematic case is something > > like what I described, even than the fix of skipping cluster-wide > > running xacts and instead LOG db-specific running_xacts to help > > updating builder's xmin sounds inelegant and probably inefficient. For > > example, I think such a dependency means we can never enable > > db-specific snapshots on standby. > > ok > So now the question is where do we go from here. I am not confident that the current code to achieve db-specific snapshots in logical decoding is the best possible solution both because of the drawbacks (like we won't be able to enable this on standby) and inefficiencies pointed out by me in this and previous emails in this work. -- With Regards, Amit Kapila.
