Re: [HACKERS] can't coax query planner into using all columns of a gist index

2015-08-13 Thread Gideon Dresdner
That did it! I certainly should have been able to figure that out on my
own. Thanks for the help!

Unfortunately, I'm still looking at rather slow queries across my entire
dataset. I might wind up having to find another solution.

Gideon.

On Wed, Aug 12, 2015 at 6:29 PM Tom Lane t...@sss.pgh.pa.us wrote:

 Gideon Dresdner gide...@gmail.com writes:
  I've created a small dump of my database that recreates the problem. I
 hope
  that this will help recreate the problem. It is attached. I'd be happy to
  hear if there is an easier way of doing this.

 Ah.  Now that I see the database schema, the problem is here:

 regression=# \d vcf
 ...
  chr   | smallint |
 ...

 So chr is smallint in one table and integer in the other.  That means
 the parser translates qcregions.chr = vcf.chr using the int42eq operator
 instead of int4eq --- and nobody's ever taught btree_gist about crosstype
 operators.  So the clause simply isn't considered indexable with this
 index.  If you change the query to qcregions.chr = vcf.chr::int then
 all is well.

 Personally I'd just change vcf.chr to integer --- it's not even saving you
 any space, with that table schema, because the next column has to be
 int-aligned anyway.

 regards, tom lane



Re: [HACKERS] can't coax query planner into using all columns of a gist index

2015-08-12 Thread Gideon Dresdner
What's a good way for me to create a self-contained test case. AFAIU the
only way to make these test cases more self-contained would be to inline
the second table and its index. How do you create an index to an inlined
table of values?

Or perhaps I could send over a dump of a subset of the data?

Yes, I am fairly sure that I am running 9.4.4:

$ psql --version
psql (PostgreSQL) 9.4.4

# select version();
  version

---
 PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 5.1.0,
64-bit
(1 row)

Thanks for the help,
Gideon.

On Tue, Aug 11, 2015 at 10:23 PM Tom Lane t...@sss.pgh.pa.us wrote:

 Gideon Dresdner gide...@gmail.com writes:
  I had a discussion on IRC today with RhodiumToad regarding optimizing a
  specific query. We didn't manage to figure out how to get postgres to
 hit a
  GIST index.

 FWIW, I couldn't reproduce the described behavior.  Can you provide a
 self-contained test case?  Are you sure your server is 9.4.4?

 regards, tom lane



Re: [HACKERS] can't coax query planner into using all columns of a gist index

2015-08-12 Thread Tom Lane
Gideon Dresdner gide...@gmail.com writes:
 I've created a small dump of my database that recreates the problem. I hope
 that this will help recreate the problem. It is attached. I'd be happy to
 hear if there is an easier way of doing this.

Ah.  Now that I see the database schema, the problem is here:

regression=# \d vcf
...
 chr   | smallint | 
...

So chr is smallint in one table and integer in the other.  That means
the parser translates qcregions.chr = vcf.chr using the int42eq operator
instead of int4eq --- and nobody's ever taught btree_gist about crosstype
operators.  So the clause simply isn't considered indexable with this
index.  If you change the query to qcregions.chr = vcf.chr::int then
all is well.

Personally I'd just change vcf.chr to integer --- it's not even saving you
any space, with that table schema, because the next column has to be
int-aligned anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] can't coax query planner into using all columns of a gist index

2015-08-12 Thread Gideon Dresdner
I've created a small dump of my database that recreates the problem. I hope
that this will help recreate the problem. It is attached. I'd be happy to
hear if there is an easier way of doing this.

To rebuild the database:
- create a database
- run from the commandline `$ psql database-name  1000genomes-schema.sql`
- run this within a psql REPL  ` #\copy public.qcregions FROM
'/tmp/1000genomes-qcregions.tsv' DELIMITER '  ' CSV;` (where the
delimiter is a tab)
- similarly run this within a psql REPL,  `#\copy public.vcf FROM
'/tmp/1000genomes-vcf.tsv' DELIMITER '  ' CSV;`

To see that the GIST index is not being hit, try running the following
query:

EXPLAIN SELECT * FROM vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
  AND vcf.pos @ qcregions.region);

The actual query I am trying to run is:

EXPLAIN SELECT * FROM vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
  AND qcregions.type = 'include'
  AND vcf.pos @ qcregions.region);

Let me know what else I can try,
Gideon.

On Wed, Aug 12, 2015 at 11:07 AM Gideon Dresdner gide...@gmail.com wrote:

 What's a good way for me to create a self-contained test case. AFAIU the
 only way to make these test cases more self-contained would be to inline
 the second table and its index. How do you create an index to an inlined
 table of values?

 Or perhaps I could send over a dump of a subset of the data?

 Yes, I am fairly sure that I am running 9.4.4:

 $ psql --version
 psql (PostgreSQL) 9.4.4

 # select version();
   version


 ---
  PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
 5.1.0, 64-bit
 (1 row)

 Thanks for the help,
 Gideon.

 On Tue, Aug 11, 2015 at 10:23 PM Tom Lane t...@sss.pgh.pa.us wrote:

 Gideon Dresdner gide...@gmail.com writes:
  I had a discussion on IRC today with RhodiumToad regarding optimizing a
  specific query. We didn't manage to figure out how to get postgres to
 hit a
  GIST index.

 FWIW, I couldn't reproduce the described behavior.  Can you provide a
 self-contained test case?  Are you sure your server is 9.4.4?

 regards, tom lane




1000genomes-dump.tar.gz
Description: application/gzip

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] can't coax query planner into using all columns of a gist index

2015-08-11 Thread Gideon Dresdner
Greetings,

I had a discussion on IRC today with RhodiumToad regarding optimizing a
specific query. We didn't manage to figure out how to get postgres to hit a
GIST index.

The bigger picture is that I am trying to do some bioinformatics and
thought that postgres would be a great way of getting the 1000 genomes
project in the palm of my hand instead of submitting C++ programs to a
cluster, waiting for the results, etc.

Anyway, there is a multicolumn index that we created:

CREATE INDEX qcregions_chrregion_index ON qcregions USING gist(chr, region)
WHERE type = 'include'

- chr is an int between 1 and 24 (inclusive)
- region is an int4range
- type is an enum with two values, 'include' and 'exclude'.

Here are three relevant explain outputs:

Example 1:

EXPLAIN SELECT * FROM (values (12,5000), (13,5001) ) v(c,r)
WHERE EXISTS (SELECT region FROM qcregions
WHERE type = 'include' and region @ v.r and chr = v.c);
  QUERY PLAN

---
 Nested Loop Semi Join  (cost=0.41..8.82 rows=1 width=8)
   -  Values Scan on *VALUES*  (cost=0.00..0.03 rows=2 width=8)
   -  Index Scan using qcregions_chrregion_index on qcregions
 (cost=0.41..3464.26 rows=874 width=17)
 Index Cond: ((chr = *VALUES*.column1) AND (region @
*VALUES*.column2))
(4 rows)

Time: 1.284 ms

Example 2:

-- set enable_setbitmapscan = true
EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
  AND qcregions.type = 'include'
  AND vcf.pos @ qcregions.region);
QUERY PLAN

--
 Nested Loop Semi Join  (cost=4862.57..18775.78 rows=1 width=64)
   -  Limit  (cost=0.00..0.04 rows=2 width=64)
 -  Seq Scan on vcf  (cost=0.00..1894654.40 rows=84801840 width=64)
   -  Bitmap Heap Scan on qcregions  (cost=4862.57..7873.60 rows=874
width=17)
 Recheck Cond: ((vcf.pos @ region) AND (type =
'include'::qcregiontype) AND (chr = vcf.chr))
 -  BitmapAnd  (cost=4862.57..4862.57 rows=874 width=0)
   -  Bitmap Index Scan on qcregions_chrregion_index
 (cost=0.00..977.76 rows=20980 width=0)
 Index Cond: (vcf.pos @ region)
   -  Bitmap Index Scan on qcregions_chr_index
 (cost=0.00..3884.12 rows=215158 width=0)
 Index Cond: (chr = vcf.chr)
(10 rows)

Time: 0.708 ms

Example 3 (same as example 2 but with enable_bitmapscan = false).

-- set enable_bitmapscan = false
 EXPLAIN SELECT * FROM (select * from vcf limit 2) AS vcf
WHERE EXISTS (SELECT region FROM qcregions
WHERE qcregions.chr = vcf.chr
  AND qcregions.type = 'include'
  AND vcf.pos @ qcregions.region);
QUERY PLAN

--
 Nested Loop Semi Join  (cost=0.43..38691.26 rows=1 width=64)
   -  Limit  (cost=0.00..0.04 rows=2 width=64)
 -  Seq Scan on vcf  (cost=0.00..1894654.40 rows=84801840 width=64)
   -  Index Scan using qcregions_chr_index on qcregions
 (cost=0.43..12891.38 rows=874 width=17)
 Index Cond: (chr = vcf.chr)
 Filter: ((type = 'include'::qcregiontype) AND (vcf.pos @ region))
(6 rows)

Time: 1.214 ms

I am running psql (PostgreSQL) 9.4.4 on a laptop with 4 cores and 16 GB RAM.

Looking forward to hearing your thoughts,
Gideon.


Re: [HACKERS] can't coax query planner into using all columns of a gist index

2015-08-11 Thread Tom Lane
Gideon Dresdner gide...@gmail.com writes:
 I had a discussion on IRC today with RhodiumToad regarding optimizing a
 specific query. We didn't manage to figure out how to get postgres to hit a
 GIST index.

FWIW, I couldn't reproduce the described behavior.  Can you provide a
self-contained test case?  Are you sure your server is 9.4.4?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers