David Gould <da...@sonic.net> writes:
> On Thu, 01 Mar 2018 18:49:20 -0500
> Tom Lane <t...@sss.pgh.pa.us> wrote:
>> The sticking point in my mind right now is, if we do that, what to do
>> with VACUUM's estimates.

> For what it's worth, I think the current estimate formula for VACUUM is
> pretty reasonable. Consider a table T with N rows and P pages clustered
> on serial key k. Assume reltuples is initially correct.

If the starting condition involves uniform tuple density throughout the table, with matching reltuples/relpages ratio, then any set of changes followed by one VACUUM will produce the right reltuples (to within roundoff error) at the end. This can be seen by recognizing that VACUUM will visit every changed page, and the existing calculation is equivalent to "assume the old tuple density is correct for the unvisited pages, and then add on the measured tuple count within the visited pages". I'm a bit inclined to reformulate and redocument the calculation that way, in hopes that people would find it more convincing.

However, things get less good if the initial state is nonuniform and we do a set of updates that line up with the nonuniformity. For example, start with a uniformly full table, and update 50% of the rows lying within the first 20% of the pages. Now those 20% are only half full of live tuples, and the table has grown by 10%, with all those added pages full. Do a VACUUM. It will process the first 20% and the new 10% of pages, and arrive at a correct reltuples count per the above argument. But now, reltuples/relpages reflects an average tuple density that's only about 90% of maximum.

Next, delete the surviving tuples in the first 20% of pages, and again VACUUM. VACUUM will examine only the first 20% of pages, and find that they're devoid of live tuples. It will then update reltuples using the 90% density figure as the estimate of what's in the remaining pages, and that's too small, so reltuples will drop to about 90% of the correct value.

Lacking an oracle (small "o"), I do not think there's much we can do about this, without resorting to very expensive measures such as scanning the whole table.
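To make the argument concrete, here is a minimal sketch of that update rule in plain Python (this is an illustrative model, not the actual C code in vacuum.c; the page and tuple counts below are made-up numbers that reproduce the scenario described above):

```python
# Hedged sketch of the reltuples update rule: assume the old tuple
# density holds for pages VACUUM did not visit, then add the measured
# live-tuple count for the pages it did visit.

def estimate_reltuples(old_pages, old_tuples, total_pages,
                       scanned_pages, scanned_tuples):
    """Estimate reltuples after a VACUUM that saw scanned_pages pages."""
    if scanned_pages >= total_pages or old_pages == 0:
        # Saw the whole table (or have no prior estimate): use the
        # direct measurement.
        return float(scanned_tuples)
    old_density = old_tuples / old_pages          # tuples per page
    unscanned = total_pages - scanned_pages
    return old_density * unscanned + scanned_tuples

# The correlated-update scenario, with 100 pages of 100 tuples each:
# 1. Update 50% of the rows in the first 20 pages; the 1000 new row
#    versions land in 10 freshly added, full pages.  VACUUM visits those
#    30 pages and counts 1000 + 1000 = 2000 live tuples.
after_first = estimate_reltuples(100, 10000, 110, 30, 2000)   # -> 10000.0

# 2. Delete the surviving tuples in the first 20 pages.  VACUUM visits
#    only those 20 pages, finds zero live tuples, and extrapolates the
#    ~91-tuples/page average density to the 90 unvisited pages.
after_second = estimate_reltuples(110, 10000, 110, 20, 0)     # -> ~8182
# The true live count is 9000, so the estimate sags to about 91% of it.
```

Step 1 lands on the exact answer because every changed page was visited; step 2 goes wrong only because the stale average density no longer describes the unvisited pages.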
(It's somewhat interesting to speculate about whether scanning the table's FSM could yield useful data, but I'm unsure that I'd trust the results much.) The best we can do is hope that correlated update patterns like this are uncommon.

Maybe this type of situation is an argument for trusting an ANALYZE-based estimate more than the VACUUM-based estimate. I remain uncomfortable with that in cases where VACUUM looked at much more of the table than ANALYZE did, though. Maybe we need some heuristic based on the number of pages actually visited by each pass?

			regards, tom lane