On Sat, Aug 27, 2022 at 1:06 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 3:56 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Fri, Jul 29, 2022 at 12:15 PM Amit Kapila <amit.kapil...@gmail.com>
> > wrote:
> > >
> > > >
> > > > Yeah, your description makes sense to me. I've also considered how to
> > > > hit this path but I guess it is never hit. Thinking of it in another
> > > > way, first of all, at least 2 catalog modifying transactions have to
> > > > be running while writing a xl_running_xacts. The serialized snapshot
> > > > that is written when we decode the first xl_running_xact has two
> > > > transactions. Then, one of them is committed before the second
> > > > xl_running_xacts. The second serialized snapshot has only one
> > > > transaction. Then, the transaction is also committed after that. Now,
> > > > in order to execute the path, we need to start decoding from the first
> > > > serialized snapshot. However, if we start from there, we cannot decode
> > > > the full contents of the transaction that was committed later.
> > > >
> > >
> > > I think then we should change this code in the master branch patch
> > > with an additional comment on the lines of: "Either all the xacts got
> > > purged or none. It is only possible to partially remove the xids from
> > > this array if one or more of the xids are still running but not all.
> > > That can happen if we start decoding from a point (LSN where the
> > > snapshot state became consistent) where all the xacts in this were
> > > running and then at least one of those got committed and a few are
> > > still running. We will never start from such a point because we won't
> > > move the slot's restart_lsn past the point where the oldest running
> > > transaction's restart_decoding_lsn is."
> > >
> >
> > Unfortunately, this theory doesn't turn out to be true. While
> > investigating the latest buildfarm failure [1], I see that it is
> > possible that only part of the xacts in the restored catalog modifying
> > xacts list needs to be purged. See the attached where I have
> > demonstrated it via a reproducible test. It seems the point we were
> > missing was that to start from a point where two or more catalog
> > modifying were serialized, it requires another open transaction before
> > both get committed, and then we need the checkpoint or other way to
> > force running_xacts record in-between the commit of initial two
> > catalog modifying xacts. There could possibly be other ways as well
> > but the theory above wasn't correct.
> >
>
> Thank you for the analysis and the patch. I have the same conclusion.
> Since we took this approach only on the master the back branches are
> not affected.
>
> The new test scenario makes sense to me and looks better than the one
> I have. Regarding the fix, I think we should use
> TransactionIdFollowsOrEquals() instead of
> NormalTransactionIdPrecedes():
>
> +    for (off = 0; off < builder->catchange.xcnt; off++)
> +    {
> +        if (NormalTransactionIdPrecedes(builder->catchange.xip[off],
> +                                        builder->xmin))
> +            break;
> +    }
>

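For anyone following along, here is a rough standalone sketch of the purge loop with the comparison direction suggested above. It is not the patch itself. The names xid_t, xid_follows_or_equals() and purge_older_xids() are hypothetical stand-ins invented for this illustration, and the sketch assumes the xid array is sorted so that older xids come first. The loop has to stop at the first xid that is still at or after xmin; breaking on an xid that precedes xmin would stop at the wrong end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PostgreSQL's TransactionId; for illustration only. */
typedef uint32_t xid_t;

/*
 * Wraparound-aware "a >= b" for normal xids, compared modulo 2^32.
 * This mirrors the idea behind TransactionIdFollowsOrEquals(), but it
 * ignores the special (non-normal) xid values the real function handles.
 */
static int
xid_follows_or_equals(xid_t a, xid_t b)
{
    return (int32_t) (uint32_t) (a - b) >= 0;
}

/*
 * Drop every xid older than xmin from the front of xip[], which is
 * assumed to be sorted with older xids first.  Returns the number of
 * surviving entries, which end up packed at the start of the array.
 */
static size_t
purge_older_xids(xid_t *xip, size_t xcnt, xid_t xmin)
{
    size_t off;
    size_t surviving;

    assert(xip != NULL || xcnt == 0);

    /* Find the first xid that is still interesting (at or after xmin). */
    for (off = 0; off < xcnt; off++)
    {
        if (xid_follows_or_equals(xip[off], xmin))
            break;
    }

    surviving = xcnt - off;

    /* A partial purge is legal: shift the survivors to the front. */
    if (surviving > 0 && off > 0)
        memmove(xip, &xip[off], surviving * sizeof(xid_t));

    return surviving;
}
```

With an array {100, 105, 110} and xmin = 105, only the first entry is dropped and two survive, which is the "partially purged" case that the assertion had assumed could not happen.
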
Right, fixed.

-- 
With Regards,
Amit Kapila.
v2-0001-Fix-the-incorrect-assertion-introduced-in-commit-.patch