On 2020-08-18 13:28:05 -0700, Andres Freund wrote: > Hi, > > On 2020-08-18 01:21:17 -0400, Tom Lane wrote: > > Andres Freund <[email protected]> writes: > > > I'd written to Tom that I was planning to revert unless the number of > > > failures were lower than initially indicated. But that actually seems to > > > have come to pass (the failures are quicker to report because they don't > > > run the subsequent tests, of course). I'd like to let the failures > > > accumulate a bit longer, say until tomorrow Midday if I haven't figured > > > it out by then. With the hope of finding some detail to help pinpoint > > > the issue. > > > > There's certainly no obvious pattern here, so I agree with waiting for > > more data. > > FWIW, I think I have found the bug, but I'm still working to reproduce > the issue reliably enough that I can verify that the fix actually works. > > The issue is basically that 2PC PREPARE is weird, WRT procarray. The > last snapshot built with GetSnapshotData() before the PREPARE doesn't > include its own transaction in ->xip[], as normal. PrepareTransaction() > removes the "normal" entry with ProcArrayClearTransaction(), which so > far doesn't increase the xact completion count. Because the xact > completion count is not increased, snapshots can be reused as long as > they're taken before the 2PC transaction is finished. That's fine for > other backends, but for the backend doing the PrepareTransaction() it's > not, because there ->xip doesn't include the own backend. > > It's a bit tricky to reproduce exactly the issue the BF is occasionally > hitting, because the way ->xmax is computed *limits* the > damage. Combined with the use of SERIALIZABLE (preventing recomputation > of the data snapshot) that makes it somewhat hard to hit.
I pushed a fix. After a while I figured out that it's not actually that hard to test reliably. But it does require multiple sessions interacting, particularly another session needs to acquire and commit a transaction id that's later than the prepared transaction's. I think it's worth adding an isolation test. But it doesn't seem like extending prepared-transactions.spec makes too much sense, it doesn't fit in well. It's a lot easier to reproduce the issue without SERIALIZABLE, for example. Generally the file seems more about serializable than 2PC... So unless somebody disagrees I'm gonna add a new prepared-transactions-2.spec. Greetings, Andres Freund
