On Fri, May 8, 2026 at 2:00 PM Alvaro Herrera <[email protected]> wrote:
>
> Hello James,
>
> On 2026-May-08, James Locke wrote:
>
> > Attached is a POC to enable userland table compaction: A top-level COMPACT
> > command that performs the relocation directly in the server, with a
> > stripped-down heap_relocate primitive instead of full UPDATE, and a
> > built-in prune-and-truncate pass so it runs to a useful end state in one
> > command.
>
> How does this implementation handle the case of a seqscan in the middle
> of scanning the table, which has already skipped the destination page
> and not yet the page from where the table is to be removed? There needs
> to be a way to distinguish which of these to show (it must be exactly
> one), and you didn't mention this in your description.

It's the same invariant a cross-page UPDATE relies on, and heap_relocate
inherits it because the on-disk format and the WAL record are identical to
those of a regular update.

heap_relocate sets the source's xmax and the new tuple's xmin to the same
xid (the relocator's, call it R), and both writes go through one
log_heap_update WAL record. So when HeapTupleSatisfiesMVCC asks "is this
visible" for either tuple, it ends up asking the same
XidInMVCCSnapshot(R, snap) question against the seqscan's snapshot: once
for the destination's xmin and once for the source's xmax. Same xid, same
answer.

Concretely, say the tuple moves from block 200 to block 5. The seqscan
reads block 5 first and sees no live tuple there, either because the
relocation hasn't happened yet, or because it has but R is still in the
snapshot's xip list, so xmin reads as in-progress. Then COMPACT commits
cluster-wide. The seqscan reaches block 200 still using the snapshot it
took at scan start, which treats R the same way it did at block 5;
snapshots don't change mid-scan. So either both pages treated R as
committed (block 5 returned the row already, block 200 now sees the source
as dead) or both treated it as running (block 5 saw nothing, block 200
returns the source). Exactly one.

The page-level atomicity comes from log_heap_update registering both
buffers in one record and the modifications happening inside one
CRIT_SECTION with exclusive content locks on both pages; concurrent
share-locking readers can't see half-applied state.

James
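
The "same xid, same answer" argument can be checked with a toy model. The
sketch below is Python, not code from the patch, and it is far simpler
than the real HeapTupleSatisfiesMVCC (no hint bits, command ids, or
subtransactions). The names Snapshot, HeapTuple, visible and seqscan, the
xid values 50 and 100, and the block numbers are all assumptions made for
illustration.

```python
from dataclasses import dataclass
from typing import Optional

R = 100     # hypothetical xid of the relocating (COMPACT) transaction
ORIG = 50   # hypothetical xid that originally inserted the row


@dataclass(frozen=True)
class Snapshot:
    xmax: int        # first xid not yet assigned when the snapshot was taken
    xip: frozenset   # xids that were in progress when the snapshot was taken


@dataclass(frozen=True)
class HeapTuple:
    block: int
    xmin: int
    xmax: Optional[int] = None   # None: never updated or deleted


def xid_in_snapshot(xid, snap):
    """True if the snapshot must treat xid as still running."""
    return xid >= snap.xmax or xid in snap.xip


def visible(tup, snap, committed):
    """Simplified MVCC test: the inserter is done, the deleter is not."""
    def done(xid):
        return xid in committed and not xid_in_snapshot(xid, snap)

    if not done(tup.xmin):
        return False
    return tup.xmax is None or not done(tup.xmax)


# State of the two tuple versions once the relocation has been applied.
DEST = HeapTuple(block=5, xmin=R)
SOURCE = HeapTuple(block=200, xmin=ORIG, xmax=R)


def seqscan(snap, committed_at_block5, relocated_at_block5=True):
    """Return the blocks on which the scan reports the row."""
    seen = []
    # Block 5 is read first; the new version may not even exist yet.
    if relocated_at_block5 and visible(DEST, snap, committed_at_block5):
        seen.append(DEST.block)
    # By the time the scan reaches block 200, R has committed.
    if visible(SOURCE, snap, {ORIG, R}):
        seen.append(SOURCE.block)
    return seen


if __name__ == "__main__":
    during = Snapshot(xmax=R + 1, xip=frozenset({R}))   # R was running
    after = Snapshot(xmax=R + 1, xip=frozenset())       # R already committed
    print(seqscan(during, {ORIG}))    # row reported from the source only
    print(seqscan(after, {ORIG, R}))  # row reported from the destination only
```

Whatever the timing, the snapshot gives one answer for R, so the scan
reports the row from exactly one of the two blocks.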
