On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> What bothers me considerably more is the issue about how specific
>>> queries might see an all-visible fraction that's very substantially
>>> different from the table's overall ratio,
>
>> - Suppose VACUUM processes the table and makes it all-visible.  Then,
>> somebody comes along and updates one tuple on every page, making them
>> all not-all-visible, but not triggering VACUUM because we're nowhere
>> close to the 20% threshold.  Now COUNT(*) will think it should use an
>> index-only scan, but really... not so much.  In fact, even if it's
>> only a tuple that has been updated on 25% of the pages, we're probably
>> in trouble.
>
> Yeah, but that would be a pretty unlucky pattern, and in any case the
> fix for it is going to be to make autovacuum more aggressive.
Hmm, maybe.

>> - Suppose the table has a million rows and we're going to read 100 of
>> them, or 0.01%.  Now it might appear that a covering index has a
>> negligible advantage over a non-covering index, but in fact I think we
>> still want to err on the side of trying to use the covering index.
>
> Given that fact pattern we still will, I think.  We'll still prefer an
> indexscan over a seqscan, for sure.  In any case, if you believe the
> assumption that those 100 rows are more likely to be recently-dirtied
> than the average row, I'm not sure why you think we should be trying to
> force an assumption that index-only will succeed here.

I'm not concerned about an index scan vs. a sequential scan here.  I'm
concerned about it being impossible for the DBA to get an index-only
scan when s/he wants it very badly.

The current (stupid) formula handles this case just about perfectly: it
will prefer a smaller index over a larger one, except when a covering
index is available, in which case it will prefer the smallest covering
index.  That sounds exactly right to me.  We get that behavior because
the 10% of heap fetches that we're assuming we'll get to skip is larger
than the penalty for using a bigger index.  If we take out 10% and
replace it with all_visible_percentage * fraction_of_tuples_fetched,
then that 10% is going to drop to some infinitesimally small value on
single-row fetches from giant tables.  But that's exactly one of the
cases for which people want index-only scans in the first place.

It's no better to be overly pessimistic here than it is to be overly
optimistic.  If the table is 90% all-visible, the probability of our
finding an all-visible row is probably not 90%.  But it's probably not
0.01% or 0.0001% either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
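[Editor's note: to make the last point concrete, here is a small standalone
sketch, not planner code, comparing the two heuristics discussed above for
the fraction of heap fetches an index-only scan is assumed to avoid.  The
variable names and the 90%/one-million/one-row numbers are illustrative
assumptions taken from the scenario in the message, not from PostgreSQL
source.]

    /*
     * Sketch: flat 10% skip assumption vs. the proposed
     * all_visible_percentage * fraction_of_tuples_fetched rule,
     * for a single-row fetch from a million-row, 90%-all-visible table.
     */
    #include <stdio.h>

    int
    main(void)
    {
        double  table_tuples = 1000000.0;   /* a million-row table */
        double  tuples_fetched = 1.0;       /* a single-row lookup */
        double  all_visible_frac = 0.90;    /* 90% of the table is all-visible */

        /* Current rule: assume a flat 10% of heap fetches can be skipped. */
        double  skip_frac_current = 0.10;

        /* Proposed rule: all_visible_fraction * fraction_of_tuples_fetched. */
        double  skip_frac_proposed =
            all_visible_frac * (tuples_fetched / table_tuples);

        printf("flat 10%% rule: skip fraction = %g\n", skip_frac_current);
        printf("proposed rule: skip fraction = %g\n", skip_frac_proposed);

        /*
         * Prints 0.1 vs 9e-07: under the proposed rule the covering index's
         * estimated advantage all but vanishes for single-row fetches from a
         * big table, even though 90% of it is all-visible -- which is the
         * over-pessimism the message above is objecting to.
         */
        return 0;
    }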