Hi, I noticed that during an index-only scan the filter check is executed after the visibility of a tuple is confirmed. This means a lot of unnecessary heap fetches, if the filter condition is highly selective.
In the application I'm maintaining, the visibility map is mostly dirty. Index-only scans deteriorate into normal index-scans, even when there is no row matching the filter conditions. Attached is a patch, which changes the order of the checks. I.e. the visibility of the row is confirmed only after the qual condition is evaluated to true. To see the performance benefits, consider the following example: create table test_table (id integer primary key, a integer, b text, c text, filler text); insert into test_table (id, a, b, c, filler) select s, 1, md5(s::text), md5(s::text), repeat('a', 100) from generate_series(10000000, 19999999) s; insert into test_table (id, a, b, c, filler) select s, 2, md5(s::text), md5(s::text), repeat('a', 100) from generate_series(20000000, 29999999) s; insert into test_table (id, a, b, c, filler) select s, 3, md5(s::text), md5(s::text), repeat('a', 100) from generate_series(30000000, 39999999) s; create index test_index on test_table (a, b, c); explain analyze select a, b, c from test_table where a = 2 and b > c order by a, b, c limit 1; Before the changes: "Limit (cost=0.69..11.80 rows=1 width=68) (actual time=8136433.536..8136433.536 rows=0 loops=1)" " -> Index Only Scan using test_index on test_table (cost=0.69..555456.69 rows=50000 width=68) (actual time=8136433.533..8136433.533 rows=0 loops=1)" " Index Cond: (a = 2)" " Filter: (b > c)" " Rows Removed by Filter: 10000000" " Heap Fetches: 10000000" "Planning time: 2.054 ms" "Execution time: 8136433.564 ms" After the changes: "Limit (cost=0.69..11.80 rows=1 width=68) (actual time=3060.548..3060.548 rows=0 loops=1)" " -> Index Only Scan using test_index on test_table (cost=0.69..555456.69 rows=50000 width=68) (actual time=3060.544..3060.544 rows=0 loops=1)" " Index Cond: (a = 2)" " Filter: (b > c)" " Rows Removed by Filter: 10000000" " Heap Fetches: 0" "Planning time: 11.684 ms" "Execution time: 3060.615 ms" The example is extreme, but I would say it doesn't differ much from what I see in my production environment. About the naming in the solution: There's already a concept of "index recheck" which seems to be a perfect name for the functionality i'm introducing. However, after looking into the code, I don't think it can be reused, because the current rechecks are evaluated before quals and ExecScanFetch contains a very specific code. Please let me know what you think about the change. -- Mateusz Stefek
indexonlyfilter.patch
Description: Binary data
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers