Hi I have a "feeling" that the internal query-plans (or alternative) query-plans when executing GIN-searches are not being exhausted as much as they generally are in PG.
More specifically a query like: select id from table where fts @@ to_tsquery('english','verycommon & veryrare'); Can do an search on the "very rare" and postfilter it on the very-common keyword. I had problems trying to force the query-planner into executing a query that forced that behavior, but here is my attempt: 2011-11-30 11:33:41.010 testdb=# explain analyze select id from testdb.testtable where id in (select id from testdb.testtable where fts @@ pptsquery('veryrare')) and fts @@ pptsquery('verycommon') order by id desc limit 300; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=14203.33..14204.08 rows=300 width=4) (actual time=62.561..62.567 rows=14 loops=1) -> Sort (cost=14203.33..14206.91 rows=1430 width=4) (actual time=62.561..62.566 rows=14 loops=1) Sort Key: testdb.testtable.id Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=4799.17..14137.35 rows=1430 width=4) (actual time=11.225..62.530 rows=14 loops=1) -> HashAggregate (cost=4799.17..4813.47 rows=1430 width=4) (actual time=6.792..7.941 rows=2409 loops=1) -> Bitmap Heap Scan on testtable (cost=1943.29..4795.59 rows=1430 width=4) (actual time=0.962..5.174 rows=2409 loops=1) Recheck Cond: (fts @@ '''veryrare'''::tsquery) -> Bitmap Index Scan on testtable_gin_idx (cost=0.00..1942.93 rows=1430 width=0) (actual time=0.635..0.635 rows=2419 loops=1) Index Cond: (fts @@ '''veryrare'''::tsquery) -> Index Scan using testtable_pkey on testtable (cost=0.00..6.51 rows=1 width=4) (actual time=0.022..0.022 rows=0 loops=2409) Index Cond: (testdb.testtable.id = testdb.testtable.id) Filter: (testdb.testtable.fts @@ '''verycommon'''::tsquery) Total runtime: 62.679 ms (14 rows) Time: 125.899 ms 2011-11-30 11:40:59.673 testdb=# explain analyze select id from testdb.testtable where fts @@ pptsquery('verycommon veryrare') order by id desc limit 300; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=2522.12..2522.87 rows=300 width=4) (actual time=1282.967..1282.972 rows=14 loops=1) -> Sort (cost=2522.12..2523.88 rows=704 width=4) (actual time=1282.965..1282.968 rows=14 loops=1) Sort Key: id Sort Method: quicksort Memory: 25kB -> Bitmap Heap Scan on testtable (cost=1081.67..2489.63 rows=704 width=4) (actual time=1282.902..1282.948 rows=14 loops=1) Recheck Cond: (fts @@ '''verycommon'' & ''veryrare'''::tsquery) -> Bitmap Index Scan on testtable_gin_idx (cost=0.00..1081.49 rows=704 width=0) (actual time=1282.880..1282.880 rows=17 loops=1) Index Cond: (fts @@ '''verycommon'' & ''veryrare'''::tsquery) Total runtime: 1283.274 ms (9 rows) Time: 1300.587 ms 2011-11-30 11:41:13.217 testdb=# This may of-course not always be the optimal query-plan, but in this situation the alternative plan is roughly 20-times better. (both queries cached). The in-clause is not "the best way", a regular filter on the results of the veryrare-term would be natural, but the query-planner cleverly collapses that to be the exact same thing. This can be stressed by just adding the same "common" keyword several time to the GIN-search, where the query, even producing the same results gets slower and slower: 2011-11-30 11:51:10.239 testdb=# select count(id) from testdb.testtable where id in (select id from testdb.testtable where fts @@ pptsquery('veryrare')) and fts @@ pptsquery('verycommon verycommon verycommon'); count ------- 14 (1 row) Time: 90.389 ms 2011-11-30 11:51:16.777 testdb=# select count(id) from testdb.testtable where fts @@ to_tsquery('english','veryrare & verycommon & verycommon & verycommon'); count ------- 14 (1 row) Time: 2207.125 ms PG does a lot of query-rewriting and testing different query-plans, there seems to be room for improvements here. I'll craft a test-dataset where it can be reproduced. Then someone with internal PG knowledge can tell me if it just isn't implemented or I might have some configuration option that prevents my system from doing it correctly. -- Jesper -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers