Laments at:

I tried this on my MacBook Pro this morning, using pgbench -i -s 500
to create a database about 7.5GB in size, and then using "SELECT
sum(1) FROM pgbench_accounts" as a test query, on a build WITHOUT
--enable-cassert.  This machine has 4GB of memory, and I set
shared_buffers = 400MB.  (No, I'm not sure whether that's the optimal
setting for shared_buffers for this machine.)

With enable_indexonlyscan = false, times for this query are: 96799.747
ms, 89108.987 ms, 114433.664 ms.
With enable_indexonlyscan = true, times were: 16779.325 ms, 16537.726
ms, 16703.229 ms.

That's about six times faster.  It's worth noting that the
pgbench_accounts table has relatively narrow rows.  On a table with
wider rows (but not so wide that they get toasted and become narrow
again), the benefit might be more.  On the other hand, this is also a
table that's just been vacuumed, and you've got the happy case where
the table (6404 MB) does not fit in memory but but the index (1071 MB)
does.  What happens on a smaller test case?  Here are the results at
scale factor = 100:

enable_indexonlyscan = false times: 1774.945 ms, 1784.985 ms, 1836.099 ms
enable_indexonlyscan = true times: 1450.445 ms, 1452.407 ms, 1452.426 ms

That's about a 23% speedup.  At this scale factor, everything fits
into memory, but the index by itself (214 MB) fits into memory while
the table (1281 MB) does not.  Let's crank the scale factor down some
more.  Here are the results at scale_factor = 20:

enable_indexonlyscan = false times: 352.213 ms, 353.988 ms, 350.859 ms
enable_indexonlyscan = true times: 300.623 ms, 301.355 ms, 302.346 ms

Now the entire database fits into shared_buffers, but we're still
seeing a 17% speedup.  But this turns out to misleading.  The
ring-buffer logic is actually preventing shared_buffers from getting
all of the heap blocks in cache quickly.  If I run the query over and
over again until the whole table actually makes it into shared
buffers, the sequential scan gets much faster:

enable_indexonlyscan = false times after lots and lots of prewarming:
215.487 ms, 219.006 ms, 218.490 ms

That's a bit disappointing - it's now more than a third faster to do
the sequential scan, even though the sequential scan has to touch six
times as many blocks (at scale factor 20, index is 43 MB, table is 256
MB) all of which are in cache.  Of course, touching that many fewer
blocks does have some advantages if there is concurrent activity on
the system, but it still seems unfortunate that the ratio of runtime
to blocks touched is more than 8x higher for the index-only case.

Robert Haas
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (
To make changes to your subscription:

Reply via email to