Re: [HACKERS] can't coax query planner into using all columns of a gist index
That did it! I certainly should have been able to figure that out on my own. Thanks for the help! Unfortunately, I'm still looking at rather slow queries across my entire dataset. I might wind up having to find another solution. Gideon. On Wed, Aug 12, 2015 at 6:29 PM Tom Lane t...@sss.pgh.pa.us wrote: Gideon Dresdner gide...@gmail.com writes: I've created a small dump of my database that recreates the problem. I hope that this will help recreate the problem. It is attached. I'd be happy to hear if there is an easier way of doing this. Ah. Now that I see the database schema, the problem is here: regression=# \d vcf ... chr | smallint | ... So chr is smallint in one table and integer in the other. That means the parser translates qcregions.chr = vcf.chr using the int42eq operator instead of int4eq --- and nobody's ever taught btree_gist about crosstype operators. So the clause simply isn't considered indexable with this index. If you change the query to qcregions.chr = vcf.chr::int then all is well. Personally I'd just change vcf.chr to integer --- it's not even saving you any space, with that table schema, because the next column has to be int-aligned anyway. regards, tom lane
Re: [HACKERS] can't coax query planner into using all columns of a gist index
What's a good way for me to create a self-contained test case. AFAIU the only way to make these test cases more self-contained would be to inline the second table and its index. How do you create an index to an inlined table of values? Or perhaps I could send over a dump of a subset of the data? Yes, I am fairly sure that I am running 9.4.4: $ psql --version psql (PostgreSQL) 9.4.4 # select version(); version --- PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 5.1.0, 64-bit (1 row) Thanks for the help, Gideon. On Tue, Aug 11, 2015 at 10:23 PM Tom Lane t...@sss.pgh.pa.us wrote: Gideon Dresdner gide...@gmail.com writes: I had a discussion on IRC today with RhodiumToad regarding optimizing a specific query. We didn't manage to figure out how to get postgres to hit a GIST index. FWIW, I couldn't reproduce the described behavior. Can you provide a self-contained test case? Are you sure your server is 9.4.4? regards, tom lane
Re: [HACKERS] can't coax query planner into using all columns of a gist index
Gideon Dresdner gide...@gmail.com writes: I've created a small dump of my database that recreates the problem. I hope that this will help recreate the problem. It is attached. I'd be happy to hear if there is an easier way of doing this. Ah. Now that I see the database schema, the problem is here: regression=# \d vcf ... chr | smallint | ... So chr is smallint in one table and integer in the other. That means the parser translates qcregions.chr = vcf.chr using the int42eq operator instead of int4eq --- and nobody's ever taught btree_gist about crosstype operators. So the clause simply isn't considered indexable with this index. If you change the query to qcregions.chr = vcf.chr::int then all is well. Personally I'd just change vcf.chr to integer --- it's not even saving you any space, with that table schema, because the next column has to be int-aligned anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] can't coax query planner into using all columns of a gist index
I've created a small dump of my database that recreates the problem. I hope that this will help recreate the problem. It is attached. I'd be happy to hear if there is an easier way of doing this. To rebuild the database: - create a database - run from the commandline `$ psql database-name 1000genomes-schema.sql` - run this within a psql REPL ` #\copy public.qcregions FROM '/tmp/1000genomes-qcregions.tsv' DELIMITER ' ' CSV;` (where the delimiter is a tab) - similarly run this within a psql REPL, `#\copy public.vcf FROM '/tmp/1000genomes-vcf.tsv' DELIMITER ' ' CSV;` To see that the GIST index is not being hit, try running the following query: EXPLAIN SELECT * FROM vcf WHERE EXISTS (SELECT region FROM qcregions WHERE qcregions.chr = vcf.chr AND vcf.pos @ qcregions.region); The actual query I am trying to run is: EXPLAIN SELECT * FROM vcf WHERE EXISTS (SELECT region FROM qcregions WHERE qcregions.chr = vcf.chr AND qcregions.type = 'include' AND vcf.pos @ qcregions.region); Let me know what else I can try, Gideon. On Wed, Aug 12, 2015 at 11:07 AM Gideon Dresdner gide...@gmail.com wrote: What's a good way for me to create a self-contained test case. AFAIU the only way to make these test cases more self-contained would be to inline the second table and its index. How do you create an index to an inlined table of values? Or perhaps I could send over a dump of a subset of the data? Yes, I am fairly sure that I am running 9.4.4: $ psql --version psql (PostgreSQL) 9.4.4 # select version(); version --- PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 5.1.0, 64-bit (1 row) Thanks for the help, Gideon. On Tue, Aug 11, 2015 at 10:23 PM Tom Lane t...@sss.pgh.pa.us wrote: Gideon Dresdner gide...@gmail.com writes: I had a discussion on IRC today with RhodiumToad regarding optimizing a specific query. We didn't manage to figure out how to get postgres to hit a GIST index. FWIW, I couldn't reproduce the described behavior. Can you provide a self-contained test case? Are you sure your server is 9.4.4? regards, tom lane 1000genomes-dump.tar.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] can't coax query planner into using all columns of a gist index
Greetings, I had a discussion on IRC today with RhodiumToad regarding optimizing a specific query. We didn't manage to figure out how to get postgres to hit a GIST index. The bigger picture is that I am trying to do some bioinformatics and thought that postgres would be a great way of getting the 1000 genomes project in the palm of my hand instead of submitting C++ programs to a cluster, waiting for the results, etc. Anyway, there is a multicolumn index that we created: CREATE INDEX qcregions_chrregion_index ON qcregions USING gist(chr, region) WHERE type = 'include' - chr is an int between 1 and 24 (inclusive) - region is an int4range - type is an enum with two values, 'include' and 'exclude'. Here are three relevant explain outputs: Example 1: EXPLAIN SELECT * FROM (values (12,5000), (13,5001) ) v(c,r) WHERE EXISTS (SELECT region FROM qcregions WHERE type = 'include' and region @ v.r and chr = v.c); QUERY PLAN --- Nested Loop Semi Join (cost=0.41..8.82 rows=1 width=8) - Values Scan on *VALUES* (cost=0.00..0.03 rows=2 width=8) - Index Scan using qcregions_chrregion_index on qcregions (cost=0.41..3464.26 rows=874 width=17) Index Cond: ((chr = *VALUES*.column1) AND (region @ *VALUES*.column2)) (4 rows) Time: 1.284 ms Example 2: -- set enable_setbitmapscan = true EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf WHERE EXISTS (SELECT region FROM qcregions WHERE qcregions.chr = vcf.chr AND qcregions.type = 'include' AND vcf.pos @ qcregions.region); QUERY PLAN -- Nested Loop Semi Join (cost=4862.57..18775.78 rows=1 width=64) - Limit (cost=0.00..0.04 rows=2 width=64) - Seq Scan on vcf (cost=0.00..1894654.40 rows=84801840 width=64) - Bitmap Heap Scan on qcregions (cost=4862.57..7873.60 rows=874 width=17) Recheck Cond: ((vcf.pos @ region) AND (type = 'include'::qcregiontype) AND (chr = vcf.chr)) - BitmapAnd (cost=4862.57..4862.57 rows=874 width=0) - Bitmap Index Scan on qcregions_chrregion_index (cost=0.00..977.76 rows=20980 width=0) Index Cond: (vcf.pos @ region) - Bitmap Index Scan on qcregions_chr_index (cost=0.00..3884.12 rows=215158 width=0) Index Cond: (chr = vcf.chr) (10 rows) Time: 0.708 ms Example 3 (same as example 2 but with enable_bitmapscan = false). -- set enable_bitmapscan = false EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf WHERE EXISTS (SELECT region FROM qcregions WHERE qcregions.chr = vcf.chr AND qcregions.type = 'include' AND vcf.pos @ qcregions.region); QUERY PLAN -- Nested Loop Semi Join (cost=0.43..38691.26 rows=1 width=64) - Limit (cost=0.00..0.04 rows=2 width=64) - Seq Scan on vcf (cost=0.00..1894654.40 rows=84801840 width=64) - Index Scan using qcregions_chr_index on qcregions (cost=0.43..12891.38 rows=874 width=17) Index Cond: (chr = vcf.chr) Filter: ((type = 'include'::qcregiontype) AND (vcf.pos @ region)) (6 rows) Time: 1.214 ms I am running psql (PostgreSQL) 9.4.4 on a laptop with 4 cores and 16 GB RAM. Looking forward to hearing your thoughts, Gideon.
Re: [HACKERS] can't coax query planner into using all columns of a gist index
Gideon Dresdner gide...@gmail.com writes: I had a discussion on IRC today with RhodiumToad regarding optimizing a specific query. We didn't manage to figure out how to get postgres to hit a GIST index. FWIW, I couldn't reproduce the described behavior. Can you provide a self-contained test case? Are you sure your server is 9.4.4? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers