On Wed, Oct 12, 2011 at 2:52 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > I think it's overkill, and possibly unpleasantly unstable as well. > For the initial attack on this we should just have VACUUM and ANALYZE > count the number of all-visible blocks and store that in pg_class along > with the tuple-count statistics. There's no reason to think that this > will be any worse than the way we deal with dead tuple counts, for > instance.
So I have a theory. Assuming you're in a steady-state situation the amount of all-visible blocks will fluctuate from a high just after vacuum to a low just before the next vacuum. There are other ways a block can be marked all-visible but for the most part I would expect the fraction to go steadily down until vacuum comes along and cleans things up. So if vacuum tracked the fraction of blocks marked all-visible *before* it processed them and the fraction it marked all-visible after processing we have an upper and lower bound. If we knew how long it's been since vacuum we could interpolate between those, or we could just take the mean, or we could take the lower bound as a conservative estimate. > What I suggest as a first cut for that is: simply derate the visibility > fraction as the fraction > of the table expected to be scanned gets smaller. I think there's a statistically more rigorous way of accomplishing the same thing. If you treat the pages we estimate we're going to read as a random sample of the population of pages then your expected value is the fraction of the overall population that is all-visible but your 95th percentile confidence interval will be, uh, a simple formula we can compute but I don't recall off-hand. This gets back to a discussion long-ago of what estimates the planner should be using. It currently uses all expected values but in many cases it would be valuable if the planner knew what the standard deviation of those estimates was. It might sometimes be better to pick a plan that's slightly worse on average but is less likely to be much worse. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers