So, what do we need in order to find our way to index-only scans?

1. The visibility map needs to be crash-safe.  The basic idea of
index-only scans is that, instead of checking the heap to find out
whether each tuple is visible, we first check the visibility map.  If
the visibility map bit is set, then we know all tuples on the page are
visible to all transactions, and therefore the tuple of interest is
visible to our transaction.  Assuming that a significant number of
visibility map bits are set, this should enable us to avoid a fair
amount of I/O, especially on large tables, because the visibility map
is roughly 8000 times smaller than the heap, and therefore far more
practical to keep in cache.  However, before we can rely on the
visibility map for this purpose, we need to fix the problems that can
leave bits set inappropriately in the face of an inconveniently-timed
crash.  I've been working on a patch for this on and off for a few
months now; my latest version is in need of review[1].

2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
visibility map crash-safe in 9.2, people are going to want to use
pg_upgrade to migrate from older versions, bringing their
possibly-not-quite-correct visibility map forks along with them.  How
should we handle that?  We could (2A) arrange to have pg_upgrade nuke
all visibility forks when upgrading from a release where the
visibility map is not crash-safe to one where it is; (2B) keep a piece
of state somewhere indicating, for each relation, whether or not the
visibility map can be trusted, set it to false only if pg_upgrade
brings the relation over from and older version, and set it to true
after a successful vacuum that skips no intervening pages; or (2C)
advise the user to do a VACUUM FULL on each of their tables
pre-upgrade, and if they don't, treat wrong answers as their own
fault.  (I doubt anyone will advocate for this option, but for the
sake of completeness...).  (2A) seems like the simplest solution,
especially because it also avoids the overhead of checking the "is the
visibility map bit reliable?" flag every time we want to plan a query.

3. Statistics.  I believe that in order to accurately estimate the
cost of an index-only scan, we're going to need to know the fraction
of tuples that are on pages whose visibility map bits are set.  I
believe it should be fairly straightforward to have ANALYZE collect
this information; and I'm inclined to do that as a separate patch.  It
seems like it would also be nice to know what fraction of tuples are
on pages that don't have the visibility map set but where, in fact,
all tuples on the page are visible to all transactions, so it would be
legal to set the bit.  A large discrepancy between these two
percentages might be a good reason to auto-vacuum the table (perhaps
using a "really lazy vacuum"[2]).  I'm not sure if this can be added
cheaply, though, and in any case, any change to the set of criteria
that will trigger an auto-vacuum is probably a can of worms.

4. Planner and executor changes.  In contrast to Heikki's original
implementation, I'm inclined to not to try to split the Index Scan
node into index scan and heap fetch.  Since there are many choices for
where to put the heap fetch node (any level of the join tree between
the index scan and the root), this seems likely to result in a
combinatorial explosion of paths[3], and I'm not real sure that the
payback will be adequate.  Furthermore, the idea of allowing user code
to see tuples that will only later be determined not to have been
visible to that MVCC snapshot in the first place seems pretty scary
from a security perspective, though certainly there are possible
benefits[4].  Instead, I'm inclined to just have the planner evaluate
whether the necessary columns can be extracted purely from the index.
If not, we proceed as now.  If so, we can use the "index only"
approach of using the visibility map to decide which heap fetches can
be skipped.  It's not clear to me whether we need to compare the cost
of the standard approach with the cost of the "index only" approach:
in theory, if there aren't any visibility map bits anyway, the "index
only" approach could be slower.  But I'm not sure whether that problem
is significant or common enough to be worth expending a lot of code
on.  Either way, the number of actual paths doesn't need to increase,
because in this design, even if we apply a costing model, one approach
will dominate the other.  Heikki also suggested considering index
scans in cases where we don't now[4, again] but I'm inclined to leave
this, too, for a later optimization, again because balancing the
increase in paths against the possible performance benefits of using
indexes in more situations seems finicky.  In short, for a first cut
at this, I just want to look at this as a way to get cheaper index
scans, and leave everything else to future work.

Any thoughts welcome.  Incidentally, if anyone else feels like working
on this, feel free to let me know and I'm happy to step away, from all
of it or from whatever part someone else wants to tackle.  I'm mostly
working on this because it's something that I think we really need to
get done, more than having a burning desire to be the one who does it.

Robert Haas
The Enterprise PostgreSQL Company


Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to