On Tue, Jan 10, 2023 at 12:08 PM Peter Geoghegan <p...@bowt.ie> wrote: > Actually, FreezeMultiXactId() can fully remove an xmax that has some > member XIDs >= OldestXmin, provided FRM_NOOP processing isn't > possible, at least when no individual member is still running. Doesn't > have to involve transaction aborts at all. > > Let me go try to break it that way...
Attached patch shows how this could break. It adds an assertion that checks that the expected PD_ALL_VISIBLE/VM_ALL_VISIBLE() conditions hold at the right point. It also comments out FreezeMultiXactId()'s FRM_NOOP handling. The FRM_NOOP case is really just an optimization, and shouldn't be needed for correctness. This is amply demonstrated by running "meston test" with the patch applied, which will pass without incident. I can get the PD_ALL_VISIBLE assertion to fail by following this procedure with the patch applied: * Run a plain VACUUM to set all the pages from a table all-visible, but not all-frozen. * Set a breakpoint that will hit after all_visible_according_to_vm is set to true, for an interesting blkno. * Run VACUUM FREEZE. We need FREEZE in order to be able to hit the relevant visibilitymap_set() call site (the one that passes VISIBILITYMAP_ALL_FROZEN as its flags, without also passing VISIBILITYMAP_ALL_VISIBLE). Now all_visible_according_to_vm is set to true, but we don't have a lock/pin on the same heap page just yet. * Acquire several non-conflicting row locks on a row on the block in question, so that a new multi is allocated. * End every session whose XID is stored in our multi (commit/abort). * Within GDB, continue from before -- observe assertion failure. Obviously this scenario doesn't demonstrate the presence of a bug -- not quite. But it does prove that we rely on FRM_NOOP to not allow the VM to become corrupt, which just doesn't make any sense, and can't have been intended. At a minimum, it strongly suggests that the current approach is very fragile. -- Peter Geoghegan
0001-Debug-freeze-map-issue.patch
Description: Binary data