Hi,

On 2022-02-19 17:22:33 -0800, Peter Geoghegan wrote:
> Looks like pg_surgery isn't processing HOT chains as whole units,
> which it really should (at least in the context of killing items via
> the heap_force_kill() function). Killing a root item in a HOT chain is
> just hazardous -- disconnected/orphaned heap-only tuples are liable to
> cause chaos, and should be avoided everywhere (including during
> pruning, and within pg_surgery).
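
To make the orphaning hazard concrete, here is a toy C model. It is a sketch under stated assumptions: `ToyItem`, `ToyPage`, `kill_item()` and `count_orphans()` are hypothetical names invented for illustration, not PostgreSQL's ItemIdData or heap tuple header code, and LP_REDIRECT pointers and tuple visibility are ignored. The point it shows is that heap-only tuples are reachable only by walking the chain from its root item, so killing just the root leaves the remaining members unreachable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical structures, not PostgreSQL's page layout. */
#define MAX_ITEMS 16
#define NO_NEXT 0 /* offsets are 1-based; 0 terminates a chain */

typedef struct ToyItem
{
    bool live;      /* false once the item has been "killed" */
    bool heap_only; /* heap-only tuple: no index entry points here */
    int  next;      /* next chain member (t_ctid analogue), or NO_NEXT */
} ToyItem;

typedef struct ToyPage
{
    ToyItem items[MAX_ITEMS + 1]; /* slot 0 unused; offsets 1..nitems */
    int     nitems;
} ToyPage;

/* Mark every item reachable from a live root, i.e. findable via an index. */
static void
mark_reachable(const ToyPage *page, bool reachable[MAX_ITEMS + 1])
{
    for (int off = 1; off <= page->nitems; off++)
        reachable[off] = false;

    for (int off = 1; off <= page->nitems; off++)
    {
        const ToyItem *it = &page->items[off];

        /* Only live, non-heap-only items can start a chain. */
        if (!it->live || it->heap_only)
            continue;

        /* Stop at already-visited items so a corrupt, cyclic chain ends. */
        for (int cur = off; cur != NO_NEXT && !reachable[cur];
             cur = page->items[cur].next)
        {
            if (!page->items[cur].live)
                break;
            reachable[cur] = true;
        }
    }
}

/* Count live heap-only items that no root leads to. */
static int
count_orphans(const ToyPage *page)
{
    bool reachable[MAX_ITEMS + 1];
    int  orphans = 0;

    mark_reachable(page, reachable);
    for (int off = 1; off <= page->nitems; off++)
        if (page->items[off].live && page->items[off].heap_only &&
            !reachable[off])
            orphans++;
    return orphans;
}

/* Kill a single item, leaving the rest of its chain untouched. */
static void
kill_item(ToyPage *page, int off)
{
    page->items[off].live = false;
}
```

For example, with a chain 1 -> 2 -> 3 (items 2 and 3 heap-only) plus an unrelated item 4, count_orphans() returns 0; after kill_item(&page, 1) it returns 2. The visited check in the walk also hints at the counterpoint below: any code that follows a chain on a possibly corrupt page has to defend against bogus links.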

How does that cause the endless loop? It doesn't do so on HEAD +
0001-Add-adversarial-ConditionalLockBuff[...] for me. So something needs to
have changed with your patch?

> It's likely that the hardening I already planned on adding to pruning
> [1] (as follow-up work to recent bugfix commit 18b87b201f) will
> prevent lazy_scan_prune from getting stuck like this, whatever the
> cause happens to be.

Yea, we should pick that up again. Not just for robustness or
performance, but also because it's just a lot easier to understand.

> Leaving behind disconnected/orphaned heap-only tuples is pretty much
> pointless anyway, since they'll never be accessible by index scans.
> Even after a REINDEX, since there is no root item from the heap page
> to go in the index. (A dump and restore might work better, though.)

Given that heap_surgery's raison d'etre is correcting corruption etc., I
think it makes sense for it to do as little work as possible. Iterating
through a HOT chain would be a problem if you e.g. tried to repair a page
with HOT corruption.

Greetings,

Andres Freund