On Sun, 2022-12-18 at 14:20 -0800, Peter Geoghegan wrote:
> On Thu, Dec 15, 2022 at 10:53 AM Peter Geoghegan <p...@bowt.ie> wrote:
> > I agree that the burden of catch-up freezing is excessive here (in
> > fact I already wrote something to that effect on the wiki page).
> > The likely solution can be simple enough.
>
> Attached is v10, which fixes this issue, but using a different
> approach to the one I sketched here.
Comments on 0002:

Can you explain the following portion of the diff:

-       else if (MultiXactIdPrecedes(multi, cutoffs->MultiXactCutoff))
+       else if (MultiXactIdPrecedes(multi, cutoffs->OldestMxact))
  ...
+       /* Can't violate the MultiXactCutoff invariant, either */
+       if (!need_replace)
+           need_replace = MultiXactIdPrecedes(
+               multi, cutoffs->MultiXactCutoff);

Regarding correctness, it seems like the basic structure and invariants
are the same, and it builds on the changes already in 9e5405993c. Patch
0002 seems *mostly* about making choices within the existing framework.
That gives me more confidence. That being said, it does push harder
against the limits on both sides. If I understand correctly, that means
pages with wider distributions of xids are going to persist longer,
which could expose pre-existing bugs in new and interesting ways.

Next, the 'freeze_required' field suggests that it's more involved in
the control flow that causes freezing than it actually is. All it does
is communicate how the trackers need to be adjusted. The return value
of heap_prepare_freeze_tuple() (and underneath, the flags set by
FreezeMultiXactId()) are what actually control what happens. It would
be nice to make this more clear somehow.

The comment:

  /*
   * If we freeze xmax, make absolutely sure that it's not an XID that
   * is important. (Note, a lock-only xmax can be removed independent
   * of committedness, since a committed lock holder has released the
   * lock).
   */

caused me to go down a rabbit hole looking for edge cases where we
might want to freeze an xmax but not an xmin; e.g.
tup.xmax < OldestXmin < tup.xmin, or the related case where
tup.xmax < RecentXmin < tup.xmin. I didn't find a problem, so that's
good news.

I also tried some pgbench activity along with concurrent vacuums (and
vacuum freezes) along with periodic verify_heapam(). No problems
there. Did you already describe the testing you've done for 0001+0002
specifically?
It's not radically new logic, but it would be good to try to catch
minor state-handling errors.

--
Jeff Davis
PostgreSQL Contributor Team - AWS