My bugfix commit 74388a1a (which was pushed back in February) added heuristics to VACUUM's reltuples calculation. This prevented VACUUM from distorting our estimate of reltuples over time, across successive VACUUM operations run against the same table. The problem was that VACUUM could scan the same single heap page again and again, while believing that it saw a random sample each time. That eventually led to a pg_class.reltuples value based on the assumption that every single heap page in the table is just like the one heap page that gets "sampled" again and again. That page was always the last heap page in the table (due to implementation details related to the work done by commit e8429082), which in practice tends to be a particularly poor representation of the table's overall tuple density.
I have discovered a gap in these heuristics: there are remaining cases where the percentage threshold doesn't prevent reltuples distortion as intended. It can still happen with tables that are small enough that a cutoff of 2% of rel_pages works out to less than a single page, yet still large enough that vacuumlazy.c will consider it worth its while to skip some pages using the visibility map. VACUUM will typically skip all but the final heap page of the relation (just as with the original problem).

Here is a test case that shows how this can still happen on HEAD (and in Postgres 15):

regression=# create table foo(bar int);
CREATE TABLE
regression=# insert into foo select i from generate_series(1, 10000) i;
INSERT 0 10000

Now run vacuum verbose against the table several times:

regression=# vacuum verbose foo;
*** SNIP ***
regression=# vacuum verbose foo;

The first vacuum shows "tuples: 0 removed, 10000 remain...", which is correct. However, each subsequent vacuum verbose revises the estimate downwards, eventually leaving pg_class.reltuples significantly underestimating tuple density (same as the first time around).

The attached patch closes the remaining gap. With the patch applied, the second and subsequent vacuum verbose operations from the test case show that reltuples is still 10000 (it won't ever change). The patch just extends an old behavior that was applied when scanned_pages == 0 to cases where scanned_pages <= 1 (unless we happened to scan all of the relation's pages, of course). It doesn't remove the original test from commit 74388a1a, which still seems like a good idea to me.
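To make the shape of the change concrete, here is a simplified sketch of the adjusted rule -- illustration only, not the patch itself, with names that roughly match vac_estimate_reltuples():

    /*
     * Sketch: a single scanned page is no better a sample than zero
     * pages, so keep the existing reltuples estimate in that case too --
     * unless the scan happened to cover every page in the relation.
     */
    if (scanned_pages <= 1 && scanned_pages < total_pages)
        return old_rel_tuples;

In other words, the old scanned_pages == 0 early return just becomes a scanned_pages <= 1 early return, with an exception for scans that covered the whole relation.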
--
Peter Geoghegan

v1-0001-Avoid-reltuples-distortion-in-very-small-tables.patch
Description: Binary data