On Mon, Nov 28, 2022 at 3:01 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > One thing we need to be pretty careful of here is to not break the > promise of atomic commit. At topmost commit, all subxacts must > appear committed simultaneously. It's not quite clear to me whether > we need a similar guarantee in the rollback case. It seems like > we shouldn't, but maybe I'm missing something, in which case maybe > the current behavior is correct?
AFAICS, the behavior that Simon is complaining about here is an exception to the way things work in general, and therefore his complaint is well-founded. AbortSubTransaction() arranges to release resources, whereas CommitSubTransaction() arranges to have them transferred to the parent transaction. This isn't necessarily obvious from looking at those functions, but if you drill down into the AtEO(Sub)Xact functions that they call, that's what happens in nearly all cases. Given that subtransaction abort releases resources immediately, it seems pretty fair to wonder what the value is in waiting for its parent or the topmost transaction. I don't see how that can be necessary for correctness. The commit message to which Simon (3c27944fb2141) points seems to have inadvertently changed the behavior while trying to fix a bug and improve performance. I remember being a bit skeptical about that fix at the time. Back in the day, you couldn't XactLockTableWait() unless you knew that the transaction had already started. That commit tried to make it so that you could XactLockTableWait() earlier, because that's apparently something that logical replication needs to do. But that is a major redefinition of the charter of that function, and I am wondering whether it was a mistake to fold together the thing that we need in normal cases (which is to wait for a transaction we know has started and may not have finished) from the thing we need in the logical decoding case (which apparently has different requirements). Maybe we should have solved that problem by finding a way to wait for the transaction to start, and then afterwards wait for it to end. Or maybe we should have altogether different entrypoints for the two requirements. Or maybe using one function is fine but we just need it to be more correct. I'm not really sure. In short, I think Simon is right that there's a problem and right about which commit caused it, but I'm not sure what I think we ought to do about it. -- Robert Haas EDB: http://www.enterprisedb.com