Re: [PERFORM] A question on the query planner
Tom Lane wrote: Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Define no longer works well. Well it seems to completely bar the use of a straight merge join between two index scans: Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah, there's something busted here. Will get back to you ... LOL, but I am not sure why. :-) -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] A question on the query planner
On Mon, 2003-12-01 at 16:44, Jared Carr wrote: I am currently working on optimizing some fairly time consuming queries on a decently large dataset. The Following is the query in question. SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year FROM quotes AS q, zips AS z, cars AS c WHERE z.zip = q.zip AND c.car_id = q.car_id AND z.state != 'AA' AND z.state != 'AE' AND z.state != 'AP' AND z.state = 'WA' ORDER BY date_time; This wont completely solve your problem, but z.state = 'WA' would seem to be mutually exclusive of the != AA|AE|AP. While it's not much, it is extra overhead there doesn't seem to be any need for... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] A question on the query planner
Jared Carr [EMAIL PROTECTED] writes: Furthermore noticed that in the following query plan it is doing the sequential scan on quotes first, and then doing the sequential on zips. IMHO this should be the other way around, since the result set for zips is considerably smaller especially give that we are using a where clause to limit the number of items returned from zips, so it would seem that it would be faster to scan zips then join onto quotes, but perhaps it needs to do the sequential scan on both regardless. - Hash Join (cost=1088.19..56382.58 rows=4058 width=62) (actual time=86.111..1834.682 rows=10193 loops=1) Hash Cond: ((outer.zip)::text = (inner.zip)::text) - Seq Scan on quotes q (cost=0.00..10664.25 rows=336525 width=27) (actual time=0.098..658.905 rows=336963 loops=1) - Hash (cost=1086.90..1086.90 rows=516 width=52) (actual time=85.798..85.798 rows=0 loops=1) - Seq Scan on zips z (cost=0.00..1086.90 rows=516 width=52) (actual time=79.532..84.151 rows=718 loops=1) Filter: ((state)::text = 'WA'::text) You're misreading it. Hash join is done by reading in one table into a hash table, then reading the other table looking up entries in the hash table. The zips are being read into the hash table which is appropriate if it's the smaller table. Of course still there is the holy grail of getting it to actually use the indexes. :P Merge Cond: (outer.?column7? = inner.?column5?) Well it looks like you have something strange going on. What data type is car_id in each table? -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] A question on the query planner
Greg Stark wrote: Merge Cond: (outer.?column7? = inner.?column5?) Well it looks like you have something strange going on. What data type is car_id in each table? car_id is a varchar(10) in both tables. ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] A question on the query planner
Jared Carr [EMAIL PROTECTED] writes: Greg Stark wrote: Merge Cond: (outer.?column7? = inner.?column5?) Well it looks like you have something strange going on. What data type is car_id in each table? car_id is a varchar(10) in both tables. Well for some reason it's being cast to a text to do the merge. What version of postgres is this btw? The analyzes look like 7.4? -- greg ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] A question on the query planner
Greg Stark wrote: Jared Carr [EMAIL PROTECTED] writes: Greg Stark wrote: Merge Cond: (outer.?column7? = inner.?column5?) Well it looks like you have something strange going on. What data type is car_id in each table? car_id is a varchar(10) in both tables. Well for some reason it's being cast to a text to do the merge. What version of postgres is this btw? The analyzes look like 7.4? Yes, this is 7.4. ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] A question on the query planner
Jared Carr [EMAIL PROTECTED] writes: Greg Stark wrote: Well it looks like you have something strange going on. What data type is car_id in each table? car_id is a varchar(10) in both tables. Huh. The following shows something strange. It seems joining on two varchars no longer works well. Instead the optimizer has to convert both columns to text. I know some inter-type comparisons were removed a while ago, but I would not have thought that would effect varchar-varchar comparisons. I think this is pretty bad. test=# create table a (x varchar primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index a_pkey for table a CREATE TABLE test=# create table b (x varchar primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index b_pkey for table b CREATE TABLE test=# select * from a,b where a.x=b.x; x | x ---+--- (0 rows) test=# explain select * from a,b where a.x=b.x; QUERY PLAN -- Merge Join (cost=139.66..159.67 rows=1001 width=64) Merge Cond: (outer.?column2? = inner.?column2?) - Sort (cost=69.83..72.33 rows=1000 width=32) Sort Key: (a.x)::text - Seq Scan on a (cost=0.00..20.00 rows=1000 width=32) - Sort (cost=69.83..72.33 rows=1000 width=32) Sort Key: (b.x)::text - Seq Scan on b (cost=0.00..20.00 rows=1000 width=32) (8 rows) test=# create table a2 (x text primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index a2_pkey for table a2 CREATE TABLE test=# create table b2 (x text primary key); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index b2_pkey for table b2 CREATE TABLE test=# explain select * from a2,b2 where a2.x=b2.x; QUERY PLAN --- Hash Join (cost=22.50..57.51 rows=1001 width=64) Hash Cond: (outer.x = inner.x) - Seq Scan on a2 (cost=0.00..20.00 rows=1000 width=32) - Hash (cost=20.00..20.00 rows=1000 width=32) - Seq Scan on b2 (cost=0.00..20.00 rows=1000 width=32) (5 rows) -- greg ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] A question on the query planner
Greg Stark [EMAIL PROTECTED] writes: Huh. The following shows something strange. Worse, with enable_hashjoin off it's even more obvious something's broken: test=# set enable_hashjoin = off; SET test=# explain select * from a,b where a.x=b.x; QUERY PLAN -- Merge Join (cost=139.66..159.67 rows=1001 width=64) Merge Cond: (outer.?column2? = inner.?column2?) - Sort (cost=69.83..72.33 rows=1000 width=32) Sort Key: (a.x)::text - Seq Scan on a (cost=0.00..20.00 rows=1000 width=32) - Sort (cost=69.83..72.33 rows=1000 width=32) Sort Key: (b.x)::text - Seq Scan on b (cost=0.00..20.00 rows=1000 width=32) (8 rows) test=# explain select * from a2,b2 where a2.x=b2.x; QUERY PLAN - Merge Join (cost=0.00..63.04 rows=1001 width=64) Merge Cond: (outer.x = inner.x) - Index Scan using a2_pkey on a2 (cost=0.00..24.00 rows=1000 width=32) - Index Scan using b2_pkey on b2 (cost=0.00..24.00 rows=1000 width=32) (4 rows) -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] A question on the query planner
Greg Stark [EMAIL PROTECTED] writes: Huh. The following shows something strange. It seems joining on two varchars no longer works well. Instead the optimizer has to convert both columns to text. Define no longer works well. varchar doesn't have its own comparison operators anymore, but AFAIK that makes no difference. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] A question on the query planner
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: Define no longer works well. Well it seems to completely bar the use of a straight merge join between two index scans: Hmmm ... [squints] ... it's not supposed to do that ... [digs] ... yeah, there's something busted here. Will get back to you ... regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] A question on the query planner
Jared Carr [EMAIL PROTECTED] writes: I am currently working on optimizing some fairly time consuming queries on a decently large dataset. It doesn't look that large from here ;-). I'd suggest experimenting with reducing random_page_cost, since at least for your test query it sure looks like everything is in RAM. In theory random_page_cost = 1.0 is the correct setting for all-in-RAM cases. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend