Greetings, I had a discussion on IRC today with RhodiumToad regarding optimizing a specific query. We didn't manage to figure out how to get postgres to hit a GIST index.
The bigger picture is that I am trying to do some bioinformatics and thought that postgres would be a great way of getting the 1000 genomes project "in the palm of my hand" instead of submitting C++ programs to a cluster, waiting for the results, etc. Anyway, there is a multicolumn index that we created: CREATE INDEX qcregions_chrregion_index ON qcregions USING gist(chr, region) WHERE type = 'include' - chr is an int between 1 and 24 (inclusive) - region is an int4range - type is an enum with two values, 'include' and 'exclude'. Here are three relevant explain outputs: Example 1: EXPLAIN SELECT * FROM (values (12,5000), (13,5001) ) v(c,r) WHERE EXISTS (SELECT region FROM qcregions WHERE type = 'include' and region @> v.r and chr = v.c); QUERY PLAN ------------------------------------------------------------------------------------------------------- Nested Loop Semi Join (cost=0.41..8.82 rows=1 width=8) -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=8) -> Index Scan using qcregions_chrregion_index on qcregions (cost=0.41..3464.26 rows=874 width=17) Index Cond: ((chr = "*VALUES*".column1) AND (region @> "*VALUES*".column2)) (4 rows) Time: 1.284 ms Example 2: -- set enable_setbitmapscan = true EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf WHERE EXISTS (SELECT region FROM qcregions WHERE qcregions.chr = vcf.chr AND qcregions.type = 'include' AND vcf.pos <@ qcregions.region); QUERY PLAN ---------------------------------------------------------------------------------------------------------- Nested Loop Semi Join (cost=4862.57..18775.78 rows=1 width=64) -> Limit (cost=0.00..0.04 rows=2 width=64) -> Seq Scan on vcf (cost=0.00..1894654.40 rows=84801840 width=64) -> Bitmap Heap Scan on qcregions (cost=4862.57..7873.60 rows=874 width=17) Recheck Cond: ((vcf.pos <@ region) AND (type = 'include'::qcregiontype) AND (chr = vcf.chr)) -> BitmapAnd (cost=4862.57..4862.57 rows=874 width=0) -> Bitmap Index Scan on qcregions_chrregion_index (cost=0.00..977.76 rows=20980 width=0) Index Cond: (vcf.pos <@ region) -> Bitmap Index Scan on qcregions_chr_index (cost=0.00..3884.12 rows=215158 width=0) Index Cond: (chr = vcf.chr) (10 rows) Time: 0.708 ms Example 3 (same as example 2 but with enable_bitmapscan = false). -- set enable_bitmapscan = false EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf WHERE EXISTS (SELECT region FROM qcregions WHERE qcregions.chr = vcf.chr AND qcregions.type = 'include' AND vcf.pos <@ qcregions.region); QUERY PLAN -------------------------------------------------------------------------------------------------- Nested Loop Semi Join (cost=0.43..38691.26 rows=1 width=64) -> Limit (cost=0.00..0.04 rows=2 width=64) -> Seq Scan on vcf (cost=0.00..1894654.40 rows=84801840 width=64) -> Index Scan using qcregions_chr_index on qcregions (cost=0.43..12891.38 rows=874 width=17) Index Cond: (chr = vcf.chr) Filter: ((type = 'include'::qcregiontype) AND (vcf.pos <@ region)) (6 rows) Time: 1.214 ms I am running psql (PostgreSQL) 9.4.4 on a laptop with 4 cores and 16 GB RAM. Looking forward to hearing your thoughts, Gideon.