On Mon, Dec 6, 2021 at 8:14 PM Peter Geoghegan <p...@bowt.ie> wrote:
> Suppose we believe that not-all-visible pages have 20 LP_DEAD items on
> average, and they turn out to only have 3 or 5. Theoretically we've
> done the wrong thing by launching autovacuum workers sooner -- we
> introduce bias. But we also have lower variance over time, which might
> make it worth it. I also think that it might not really matter at all.
> It's no great tragedy if we clean up and set pages all-visible in the
> visibility map a little earlier on average. It might even be a
> positive thing.
This doesn't seem convincing. Launching autovacuum too soon surely has
costs that someone might not want to pay. Clearly in the degenerate
case where we always autovacuum every table every time an autovacuum
worker is launched, we have gone insane. So arbitrarily large moves in
that direction can't be viewed as unproblematic.

> The fact that the user expresses the dead-tuple-wise threshold using
> autovacuum_vacuum_scale_factor is already somewhat arbitrary -- it is
> based on some pretty iffy assumptions. Even if we greatly overestimate
> dead tuples with the new algorithm, we're only doing so under
> circumstances that might have caused
> autovacuum_vacuum_insert_scale_factor to launch an autovacuum worker
> anyway. Just setting the visibility map bit has considerable value.

Now, on the other hand, I *most definitely* think
autovacuum_vacuum_scale_factor is hogwash. Everything I've seen
indicates that, while you do want to wait for a larger number of dead
tuples in a large table than in a small one, it's sublinear. I don't
know whether it's proportional to sqrt(table_size) or table_size^(1/3)
or lg(table_size) or table_size^(0.729837166538), but just plain old
table_size is definitely not it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
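For concreteness, here is a quick numeric sketch (Python rather than
PostgreSQL's C) of the gap being described. The linear rule follows the
documented autovacuum trigger, threshold + scale_factor * reltuples,
with the default values of 50 and 0.2; the sublinear variants are not
anything PostgreSQL implements, just the candidate shapes named above:

```python
import math

def linear_threshold(reltuples, base=50, scale=0.2):
    # PostgreSQL's current trigger: autovacuum_vacuum_threshold
    # + autovacuum_vacuum_scale_factor * reltuples (defaults 50, 0.2).
    return base + scale * reltuples

def sublinear_thresholds(reltuples):
    # Hypothetical sublinear alternatives mentioned in the discussion;
    # none of these is an actual PostgreSQL setting.
    return {
        "sqrt": math.sqrt(reltuples),
        "cbrt": reltuples ** (1 / 3),
        "lg": math.log2(reltuples),
    }

for n in (10_000, 1_000_000, 100_000_000):
    print(n, linear_threshold(n), sublinear_thresholds(n))
```

On a 100-million-row table the linear rule waits for ~20 million dead
tuples before triggering, while every sublinear shape triggers orders
of magnitude earlier, which is the intuition behind calling plain
table_size "definitely not it".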