On Wed, Oct 12, 2011 at 10:37 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> On Wed, Oct 12, 2011 at 9:52 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> What bothers me considerably more is the issue about how specific
>>> queries might see an all-visible fraction that's very substantially
>>> different from the table's overall ratio,
>
>> - Suppose VACUUM processes the table and makes it all-visible.  Then,
>> somebody comes along and updates one tuple on every page, making them
>> all not-all-visible, but not triggering VACUUM because we're nowhere
>> close to the 20% threshold.  Now COUNT(*) will think it should use an
>> index-only scan, but really... not so much.  In fact, even if it's
>> only a tuple that has been updated on 25% of the pages, we're probably
>> in trouble.
>
> Yeah, but that would be a pretty unlucky pattern, and in any case the
> fix for it is going to be to make autovacuum more aggressive.
Hmm, maybe.

>> - Suppose the table has a million rows and we're going to read 100 of
>> them, or 0.01%.  Now it might appear that a covering index has a
>> negligible advantage over a non-covering index, but in fact I think we
>> still want to err on the side of trying to use the covering index.
>
> Given that fact pattern we still will, I think.  We'll still prefer an
> indexscan over a seqscan, for sure.  In any case, if you believe the
> assumption that those 100 rows are more likely to be recently-dirtied
> than the average row, I'm not sure why you think we should be trying to
> force an assumption that index-only will succeed here.

I'm not concerned about an index scan vs. a sequential scan here.  I'm
concerned about it being impossible for the DBA to get an index-only
scan when s/he wants it very badly.

The current (stupid) formula handles this case just about perfectly: it
will prefer a smaller index over a larger one, except when a covering
index is available, in which case it will prefer the smallest covering
index.  That sounds exactly right to me.  We get that behavior because
the 10% of heap fetches that we're assuming we'll get to skip is larger
than the penalty for using a bigger index.  If we take out 10% and
replace it with all_visible_percentage * fraction_of_tuples_fetched,
then that 10% is going to drop to some infinitesimally small value on
single-row fetches from giant tables.  But that's exactly one of the
cases for which people want index-only scans in the first place.

It's no better to be overly pessimistic here than it is to be overly
optimistic.  If the table is 90% all-visible, the probability of our
finding an all-visible row is probably not 90%.  But it's probably not
0.01% or 0.0001% either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
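[Editor's note: to make the last point concrete, here is a small standalone
sketch, not planner code, comparing the two heuristics discussed above for
the fraction of heap fetches an index-only scan is assumed to avoid.  The
variable names and the 90%/one-million/one-row numbers are illustrative
assumptions taken from the scenario in the message, not from PostgreSQL
source.]

    /*
     * Sketch: flat 10% skip assumption vs. the proposed
     * all_visible_percentage * fraction_of_tuples_fetched rule,
     * for a single-row fetch from a million-row, 90%-all-visible table.
     */
    #include <stdio.h>

    int
    main(void)
    {
        double  table_tuples = 1000000.0;   /* a million-row table */
        double  tuples_fetched = 1.0;       /* a single-row lookup */
        double  all_visible_frac = 0.90;    /* 90% of the table is all-visible */

        /* Current rule: assume a flat 10% of heap fetches can be skipped. */
        double  skip_frac_current = 0.10;

        /* Proposed rule: all_visible_fraction * fraction_of_tuples_fetched. */
        double  skip_frac_proposed =
            all_visible_frac * (tuples_fetched / table_tuples);

        printf("flat 10%% rule: skip fraction = %g\n", skip_frac_current);
        printf("proposed rule: skip fraction = %g\n", skip_frac_proposed);

        /*
         * Prints 0.1 vs 9e-07: under the proposed rule the covering index's
         * estimated advantage all but vanishes for single-row fetches from a
         * big table, even though 90% of it is all-visible -- which is the
         * over-pessimism the message above is objecting to.
         */
        return 0;
    }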