Hi, On 2022-02-19 19:07:39 -0800, Peter Geoghegan wrote: > On Sat, Feb 19, 2022 at 7:01 PM Andres Freund <and...@anarazel.de> wrote: > > > We can either do that, or we can throw an error concerning corruption > > > when heap_page_prune notices orphaned tuples. Neither seems > > > particularly appealing. But it definitely makes no sense to allow > > > lazy_scan_prune to spin in a futile attempt to reach agreement with > > > heap_page_prune about a DEAD tuple really being DEAD. > > > > Yea, this sucks. I think we should go for the rewrite of the > > heap_prune_chain() logic. The current approach is just never going to be > > robust. > > No, it just isn't robust enough. But it's not that hard to fix. My > patch really wasn't invasive.
I think we're in agreement there. We might think at some point about backpatching too, but I'd rather have it stew in HEAD for a bit first. > I confirmed that HeapTupleSatisfiesVacuum() and > heap_prune_satisfies_vacuum() agree that the heap-only tuple at offnum > 2 is HEAPTUPLE_DEAD -- they are in agreement, as expected (so no > reason to think that there is a new bug involved). The problem here is > indeed just that heap_prune_chain() can't "get to" the tuple, given > its current design. Right. The reason that the "adversarial" patch makes a different is solely that it changes the heap_surgery test to actually kill an item, which it doesn't intend: create temp table htab2(a int); insert into htab2 values (100); update htab2 set a = 200; vacuum htab2; -- redirected TIDs should be skipped select heap_force_kill('htab2'::regclass, ARRAY['(0, 1)']::tid[]); If the vacuum can get the cleanup lock due to the adversarial patch, the heap_force_kill() doesn't do anything, because the first item is a redirect. However if it *can't* get a cleanup lock, heap_force_kill() instead targets the root item. Triggering the endless loop. Hm. I think this might be a mild regression in 14. In < 14 we'd just skip the tuple in lazy_scan_heap(), but now we have an uninterruptible endless loop. We'd do completely bogus stuff later in < 14 though, I think we'd just leave it in place despite being older than relfrozenxid, which obviously has its own set of issues. Greetings, Andres Freund