On Tue, Nov 30, 2021 at 5:09 PM Peter Geoghegan <p...@bowt.ie> wrote: > Attached draft patch attempts to explain things in this area within > the nbtree README. There is a much shorter comment about it within > vacuumlazy.c. I am concerned about GiST index-only scans themselves, > of course, but I discovered this issue when thinking carefully about > the concurrency rules for VACUUM -- I think it's valuable to formalize > and justify the general rules that index access methods must follow.
I pushed a commit that described how this works for nbtree, in the README file. I think that there might be an even more subtle race condition in nbtree itself, though, during recovery. We no longer do a "pin scan" during recovery these days (see commits 9f83468b, 3e4b7d87, and 687f2cd7 for full information). I think that it might be necessary to do that, just for the benefit of index-only scans -- if it's necessary during original execution, then why not during recovery? The work to remove "pin scans" was justified by pointing out that we no longer use various kinds of snapshots during recovery, but it said nothing about index-only scans, which need the TID recycling interlock (i.e. need to hold onto a leaf page while accessing the heap in sync) even with an MVCC snapshot. It's easy to imagine how it might have been missed: nobody ever documented the general issue with index-only scans, until now. Commit 2ed5b87f recognized they were unsafe for the optimization that it added (to avoid blocking VACUUM), but never explained why they were unsafe. Going back to doing pin scans during recovery seems deeply unappealing, especially to fix a totally narrow race condition. -- Peter Geoghegan