On Thu, Apr 14, 2022 at 4:19 PM Jim Nasby <nas...@amazon.com> wrote:
> > - percentage of non-yet-removable vs removable tuples
>
> This'd give you an idea how bad your long-running-transaction problem is.

VACUUM fundamentally works by removing those tuples that are considered dead according to an XID-based cutoff established when the operation begins. And so many very long-running VACUUM operations will see dead-but-not-removable tuples even when there are absolutely no long-running transactions (nor any other VACUUM operations). The only long-running thing involved might be our own long-running VACUUM operation.

I would like to reduce the number of non-removable dead tuples encountered by VACUUM by "locking in" heap pages that we'd like to scan up front. This would work by having VACUUM create its own local in-memory copy of the visibility map before it even starts scanning heap pages. That way VACUUM won't end up visiting heap pages just because they were concurrently modified halfway through our VACUUM (by some other transactions). We don't really need to scan these pages at all -- they have dead tuples, but not tuples that are "dead to VACUUM".

The key idea here is to remove a big unnatural downside to slowing VACUUM down. The cutoff would work almost like an MVCC snapshot that describes precisely the work that VACUUM needs to do (which pages to scan) up front. Once that's locked in, the amount of work we're required to do cannot go up as we're doing it (or it'll be less of an issue, at least).

It would also help if VACUUM didn't scan pages that it already knows don't have any dead tuples. The current SKIP_PAGES_THRESHOLD rule could easily be improved. That's almost the same problem.

--
Peter Geoghegan
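
For illustration only, here is a toy Python model of the "locking in" idea described above. It is not PostgreSQL code: the function name pages_scanned, the list-of-booleans visibility map, and the concurrent_writes dictionary are all hypothetical, and the model ignores SKIP_PAGES_THRESHOLD entirely. It only shows why a private copy of the visibility map taken up front keeps the set of pages to scan from growing while the scan is in progress.

```python
from typing import Dict, List


def pages_scanned(
    vm_all_visible: List[bool],
    concurrent_writes: Dict[int, List[int]],
    use_snapshot: bool,
) -> List[int]:
    """Return the heap pages a toy VACUUM visits.

    vm_all_visible: True means the page is all-visible and can be skipped.
    concurrent_writes: hypothetical model of concurrency. Maps the page the
        scan has just reached to the pages other transactions modify at that
        moment, which clears their all-visible bit in the live map.
    use_snapshot: if True, decide what to scan from a private copy of the
        map taken before the scan starts; otherwise consult the live map.
    """
    live_vm = list(vm_all_visible)  # do not mutate the caller's list
    # "Locking in": a private copy is unaffected by later changes to live_vm.
    vm = list(live_vm) if use_snapshot else live_vm

    visited = []
    for page in range(len(live_vm)):
        # Other transactions dirty pages while the scan is in progress.
        for dirtied in concurrent_writes.get(page, []):
            live_vm[dirtied] = False
        if not vm[page]:
            visited.append(page)
    return visited


if __name__ == "__main__":
    vm = [True, False, True, True, False, True]
    writes = {1: [3, 5]}  # pages 3 and 5 are modified while the scan is at page 1
    print(pages_scanned(vm, writes, use_snapshot=False))  # [1, 3, 4, 5]
    print(pages_scanned(vm, writes, use_snapshot=True))   # [1, 4]
```

In this model, pages 3 and 5 only gain dead tuples after the scan has begun, so those tuples are newer than the cutoff and could not be removed anyway. The snapshot variant skips them, and the amount of work stays fixed at what was known when the scan started.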