Given a freshly created table (not vacuumed), and a query that uses an index-only scan, for example:
CREATE TABLE foo(a int PRIMARY KEY); INSERT INTO foo SELECT * FROM generate_series(1,1000000); ANALYSE foo; EXPLAIN ANALYSE SELECT count(*) FROM foo WHERE a <= 10000; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=322.86..322.87 rows=1 width=0) (actual time=23.646..23.646 rows=1 loops=1) -> Index Only Scan using foo_pkey on foo (cost=0.00..300.42 rows=8975 width=0) (actual time=0.027..22.291 rows=10000 loops=1) Index Cond: (a <= 10000) Heap Fetches: 10000 Total runtime: 23.673 ms (5 rows) the actual performance is much worse than the equivalent index scan, as used in 9.1 and earlier: SET enable_indexonlyscan = off; EXPLAIN ANALYSE SELECT count(*) FROM foo WHERE a <= 10000; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=322.86..322.87 rows=1 width=0) (actual time=3.219..3.219 rows=1 loops=1) -> Index Scan using foo_pkey on foo (cost=0.00..300.42 rows=8975 width=0) (actual time=0.014..2.302 rows=10000 loops=1) Index Cond: (a <= 10000) Total runtime: 3.240 ms (4 rows) Obviously this is the worst-case for an index-only scan, since there is no visibility map, and it has to fetch each tuple from the heap, but ideally this should perform around the same as an ordinary index scan, since it's doing pretty much the same work. Digging around, it looks like the additional cost is coming from visibilitymap_test(), which is calling smgrexists() for each tuple, to see if the visibility map file has been created. So it's doing a file access check for each row, while the visibility map doesn't exist. I'm not really familiar with this code, but a possible fix seems to be to send an invalidation message in vm_extend() when it creates or extends the visibility map, so that vm_readbuf() only has to re-check the visibility map file if smgr_vm_nblocks has been invalidated. With this change (attached) the worst-case index-only scan time becomes around the same as the index scan time. Regards, Dean
index-only-scan.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers