On Fri, Apr 1, 2022 at 11:04 AM Robert Haas <robertmh...@gmail.com> wrote:
> I guess you're right, and it's actually a little bit better than that,
> because even if the data does fit into shared memory, we'll have to
> pass fewer TIDs to the worker to be removed from the heap, which might
> save a few CPU cycles. But I think both of those are very small
> benefits.
I'm not following. It seems like you're saying that the ability to
vacuum indexes on their own schedule (based on their own needs) is not
sufficiently compelling. I think it's very compelling, with enough
indexes (and maybe not very many).

The conveyor belt doesn't just save I/O from repeated scanning of the
heap. It may also save on repeatedly pruning (or just dirtying) the
same heap pages again and again, for very little benefit. Imagine an
append-only table where 1% of the inserting transactions abort. You
really want to be able to vacuum such a table constantly, so that its
pages are proactively frozen and set all-visible in the visibility map
-- it's not that different from a perfectly append-only table without
any garbage tuples. And so it would be very useful if we could delay
index vacuuming for much longer than the current 2%-of-rel_pages
heuristic seems to allow. That heuristic has to conservatively assume
that it might be a long time before the next VACUUM is launched and
gets the opportunity to reconsider index vacuuming. What if it were a
more or less independent question instead?

To put it another way, it would be great if the scheduling code for
autovacuum could make inferences about what general strategy works
best for a given table over time. To do that sensibly, the algorithm
needs more context, so that it can course-correct without paying much
of a cost for being wrong.
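To make that heuristic concrete, here's roughly the shape of the
bypass test -- a simplified sketch of what lazy_vacuum() does in
vacuumlazy.c, written from memory, so the field names and the exact
TID-memory cutoff are approximate:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

#define BYPASS_THRESHOLD_PAGES  0.02    /* i.e. 2% of rel_pages */

static bool
bypass_index_vacuuming(BlockNumber rel_pages,
                       BlockNumber lpdead_item_pages,
                       int64_t lpdead_items,
                       int64_t max_bypass_items)
{
    /* Pages with LP_DEAD items must stay under 2% of the table */
    BlockNumber threshold =
        (BlockNumber) ((double) rel_pages * BYPASS_THRESHOLD_PAGES);

    /*
     * Skip index vacuuming only when very few heap pages have LP_DEAD
     * items, and the number of TIDs we'd carry over to the next
     * VACUUM is small enough to be cheap.
     */
    return (lpdead_item_pages < threshold &&
            lpdead_items < max_bypass_items);
}

The point is that this is a one-shot, all-or-nothing decision, made
once at the end of a single VACUUM. With the dead TIDs sitting on the
conveyor belt, the same question could be asked per index, and
revisited continually.

--
Peter Geoghegan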