On Wed, Oct 12, 2011 at 11:59 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Robert Haas <robertmh...@gmail.com> writes:
>> I'm not concerned about an index scan vs. a sequential scan here.  I'm
>> concerned about it being impossible for the DBA to get an index-only
>> scan when s/he wants it very badly.  The current (stupid) formula
>> handles this case just about perfectly - it will prefer a smaller
>> index over a larger one, except when a covering index is available, in
>> which case it will prefer the smallest covering index.  That sounds
>> exactly right to me.  We get that behavior because the 10% of heap
>> fetches that we're assuming we'll get to skip is larger than the
>> penalty for using a bigger index.  If we take out 10% and replace it
>> by all_visible_percentage * fraction_of_tuples_fetched, then that 10%
>> is going to drop to some infinitesimally small value on single-row
>> fetches from giant tables.  But that's exactly one of the cases for
>> which people want index-only scans in the first place.  It's no better
>> to be overly pessimistic here than it is to be overly optimistic.  If
>> the table is 90% all-visible, the probability of our finding an
>> all-visible row is probably not 90%.  But it's probably not 0.01% or
>> 0.0001% either.
>
> I think you're overstating the size of the problem.  Given that fact
> pattern, the thing will choose an indexscan anyway, because it's the
> cheapest alternative even under traditional costing rules.  And it will
> choose to use an index-only scan if the index is covering.  It doesn't
> matter what the exact cost estimate is.

Maybe I'm not being clear.  The case I'm worried about is when you
have a table with an id column (an integer) and a name column (text).
You have a unique index on (id), and an index on (id, name).  You then
do SELECT name FROM foo WHERE id = ?.  I understand it's going to pick
*an* index-scan, but which index is it going to pick?  The unique
index, because it's smaller, or the other one, because it's covering?
I think if the table is large your proposal will lead to ignoring the
covering index in favor of the smaller one, and I submit that's not
what we want, because, on the average, the index-only approach is
going to succeed often enough to be worth the small penalty of the
slightly larger index.
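A toy numeric sketch of that worry (all constants here are invented for illustration; this is not the real planner cost model): with a flat 10% heap-fetch discount the covering index beats the smaller one, but once the discount is scaled by all_visible_fraction * fraction_of_tuples_fetched, a single-row fetch from a giant table gets essentially no discount at all.

```python
# Hypothetical costing sketch: compare a small non-covering index
# against a slightly larger covering index for a one-row lookup.
# Numbers are made up; only the shape of the comparison matters.

RANDOM_PAGE_COST = 4.0      # cost of the (possibly skipped) heap fetch

def scan_cost(index_size_penalty, heap_skip_probability):
    """Extra cost of a bigger index, plus the expected heap-fetch cost."""
    return index_size_penalty + RANDOM_PAGE_COST * (1.0 - heap_skip_probability)

SMALL_PENALTY = 0.0         # unique index on (id)
COVERING_PENALTY = 0.05     # index on (id, name): slightly bigger

all_visible_fraction = 0.9
fraction_fetched = 1 / 100_000_000      # one row out of a huge table

# Current (flat) rule: a covering index is assumed to skip 10% of heap
# fetches, which outweighs its small size penalty.
flat_small = scan_cost(SMALL_PENALTY, 0.0)
flat_covering = scan_cost(COVERING_PENALTY, 0.10)

# Proposed rule: the skip probability collapses to ~9e-9, so the size
# penalty dominates and the planner picks the non-covering index.
scaled_covering = scan_cost(COVERING_PENALTY,
                            all_visible_fraction * fraction_fetched)
```

Under these (invented) numbers, `flat_covering` comes out below `flat_small`, while `scaled_covering` comes out above it, i.e. the covering index, and with it the index-only scan, loses exactly in the single-row case.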

> The place where the decision is actually somewhat hard, IMO, is where
> you're pulling a small part of the table but significantly more than one
> row, and the traditional best choice would be a bitmap scan.  Now we
> have to guess whether possibly avoiding heap fetches is better than
> batching the fetches, and it doesn't seem open-and-shut to me.

Yes, I agree.

I was actually wondering if there is some way we could make index-only
scans work for bitmap index scans.  Something like this: If the index
is covering, then as we retrieve each tuple, we check whether the page
is all-visible.  If so, we return the data from the index tuple.  If
not, we save the TID for later.  Then, when we get done scanning the
index, we go back and fetch all the pages containing saved TIDs in
ascending block number order.  The trouble is that I think you could
get in some trouble if you use a TID bitmap here, because if you
lossify the bitmap then you have to make sure you can't return a tuple
that you already handled with the index-only approach (imagine that
the visibility map bit gets cleared partway through the scan).  All in
all this seems pretty complicated...
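The two-phase idea above could be sketched roughly like this (a simplification with hypothetical helpers `is_all_visible` and `heap_fetch` standing in for the visibility-map check and the heap access; none of these are PostgreSQL internals):

```python
# Hybrid index-only / deferred-heap-fetch scan, as sketched above:
# phase 1 streams index tuples and answers from the index whenever the
# heap page is all-visible, remembering the other TIDs; phase 2 fetches
# the remembered TIDs in ascending block order, bitmap-heap-scan style.

def hybrid_index_only_scan(index_tuples, is_all_visible, heap_fetch):
    deferred = []                      # exact TIDs, deliberately not a lossy bitmap
    # Phase 1: walk the index in index order.
    for tid, index_datum in index_tuples:
        block, _offset = tid
        if is_all_visible(block):
            yield index_datum          # every tuple on the page is visible
        else:
            deferred.append(tid)
    # Phase 2: revisit saved TIDs sorted by block, so heap access is
    # roughly sequential.
    for tid in sorted(deferred):
        row = heap_fetch(tid)          # per-tuple visibility check happens here
        if row is not None:
            yield row
```

Keeping the deferred TIDs exact (rather than lossifying them into a page-level bitmap) is what avoids the hazard described above: even if a visibility-map bit is cleared partway through the scan, a tuple already returned from the index in phase 1 can never be re-returned in phase 2.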

> But having said that, I see some credibility in Aidan's suggestion that
> pages that actually have to be fetched from disk are disproportionately
> likely to be all-visible.  That would counteract the history-table
> effect of correlation between current reads and recent changes,
> probably not perfectly, but to a considerable extent.  So right at the
> moment I'm inclined to just apply the most-recently-measured visibility
> fraction at face value.  We shouldn't complicate matters more than that
> until we have more field experience to tell us what really happens.

Fetches from disk aren't the only problem; thrashing shared_buffers is
pretty expensive, too.  But it's an interesting point.  I guess we
could give it a try and see what happens.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers