I am currently working on optimizing some fairly time-consuming queries on a decently large
dataset.


The following is the query in question:

SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
   FROM quotes AS q, zips AS z, cars AS c
       WHERE
           z.zip = q.zip AND
           c.car_id = q.car_id AND
           z.state != 'AA' AND
           z.state != 'AE' AND
           z.state != 'AP' AND
           z.state = 'WA'
    ORDER BY date_time;
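
For readability, here is the same query in explicit JOIN syntax (a sketch; it should behave identically, and note that the state <> 'AA'/'AE'/'AP' predicates are redundant once state = 'WA' is required):

```sql
SELECT z.lat, z.lon, z.city, z.state, q.date_time, c.make, c.model, c.year
    FROM quotes AS q
    JOIN zips AS z ON z.zip = q.zip
    JOIN cars AS c ON c.car_id = q.car_id
    WHERE z.state = 'WA'
    ORDER BY q.date_time;
```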

The tables are as follows:

                              Table "public.cars"
    Column     |         Type          |               Modifiers
---------------+-----------------------+----------------------------------------
 car_id        | character varying(10) | not null default ''::character varying
 nags_glass_id | character varying(7)  | not null default ''::character varying
 make          | character varying(30) | not null default ''::character varying
 model         | character varying(30) | not null default ''::character varying
 year          | character varying(4)  | not null default ''::character varying
 style         | character varying(30) | not null default ''::character varying
 price         | double precision      | not null default (0)::double precision
Indexes:
    "cars_pkey" primary key, btree (car_id)
    "cars_car_id_btree_index" btree (car_id)
    "make_cars_index" btree (make)
    "model_cars_index" btree (model)
    "year_cars_index" btree ("year")


                                          Table "public.quotes"
      Column       |            Type             |                              Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 quote_id          | bigint                      | not null default nextval('quotes_quote_id_seq'::text)
 visitor_id        | bigint                      | not null default (0)::bigint
 date_time         | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
 car_id            | character varying(10)       | not null default ''::character varying
 email             | text                        | not null default ''::text
 zip               | character varying(5)        | not null default ''::character varying
 current_referrer  | text                        | not null default ''::text
 original_referrer | text                        | not null default ''::text
Indexes:
    "quotes_pkey" primary key, btree (quote_id)
    "car_id_quotes_index" btree (car_id)
    "visitor_id_quotes_index" btree (visitor_id)
    "zip_quotes_index" btree (zip)


                          Table "public.zips"
 Column |         Type          |                     Modifiers
--------+-----------------------+---------------------------------------------------
 zip_id | bigint                | not null default nextval('zips_zip_id_seq'::text)
 zip    | character varying(5)  | not null default ''::character varying
 city   | character varying(28) | not null default ''::character varying
 state  | character varying(2)  | not null default ''::character varying
 lat    | character varying(10) | not null default ''::character varying
 lon    | character varying(10) | not null default ''::character varying
Indexes:
    "zips_pkey" primary key, btree (zip_id)
    "zip_zips_index" btree (zip)
    "zips_state_btree_index" btree (state)


With the default setting of 10 for default_statistics_target, the above query runs as follows
(output from EXPLAIN ANALYZE):

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=58064.16..58074.20 rows=4015 width=80) (actual time=2415.060..2421.421 rows=4539 loops=1)
   Sort Key: q.date_time
   ->  Merge Join  (cost=57728.02..57823.84 rows=4015 width=80) (actual time=2254.056..2345.013 rows=4539 loops=1)
         Merge Cond: ("outer"."?column7?" = "inner"."?column5?")
         ->  Sort  (cost=56880.61..56890.65 rows=4015 width=62) (actual time=2054.353..2062.189 rows=4693 loops=1)
               Sort Key: (q.car_id)::text
               ->  Hash Join  (cost=1403.91..56640.29 rows=4015 width=62) (actual time=8.479..1757.126 rows=10151 loops=1)
                     Hash Cond: (("outer".zip)::text = ("inner".zip)::text)
                     ->  Seq Scan on quotes q  (cost=0.00..10657.42 rows=336142 width=27) (actual time=0.062..657.015 rows=336166 loops=1)
                     ->  Hash  (cost=1402.63..1402.63 rows=511 width=52) (actual time=8.273..8.273 rows=0 loops=1)
                           ->  Index Scan using zips_state_btree_index on zips z  (cost=0.00..1402.63 rows=511 width=52) (actual time=0.215..6.877 rows=718 loops=1)
                                 Index Cond: ((state)::text = 'WA'::text)
                                 Filter: (((state)::text <> 'AA'::text) AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
         ->  Sort  (cost=847.41..870.91 rows=9401 width=37) (actual time=199.172..216.354 rows=11922 loops=1)
               Sort Key: (c.car_id)::text
               ->  Seq Scan on cars c  (cost=0.00..227.01 rows=9401 width=37) (actual time=0.104..43.523 rows=9401 loops=1)
 Total runtime: 2427.937 ms


If I set enable_seqscan = off, I get the following:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=122108.52..122118.62 rows=4039 width=80) (actual time=701.002..707.442 rows=4541 loops=1)
   Sort Key: q.date_time
   ->  Nested Loop  (cost=0.00..121866.59 rows=4039 width=80) (actual time=0.648..624.134 rows=4541 loops=1)
         ->  Nested Loop  (cost=0.00..102256.36 rows=4039 width=62) (actual time=0.374..381.440 rows=10153 loops=1)
               ->  Index Scan using zips_state_btree_index on zips z  (cost=0.00..1413.31 rows=514 width=52) (actual time=0.042..9.043 rows=718 loops=1)
                     Index Cond: ((state)::text = 'WA'::text)
                     Filter: (((state)::text <> 'AA'::text) AND ((state)::text <> 'AE'::text) AND ((state)::text <> 'AP'::text))
               ->  Index Scan using zip_quotes_index on quotes q  (cost=0.00..195.59 rows=48 width=27) (actual time=0.039..0.426 rows=14 loops=718)
                     Index Cond: (("outer".zip)::text = (q.zip)::text)
         ->  Index Scan using cars_car_id_btree_index on cars c  (cost=0.00..4.84 rows=1 width=37) (actual time=0.015..0.017 rows=0 loops=10153)
               Index Cond: ((c.car_id)::text = ("outer".car_id)::text)
 Total runtime: 711.375 ms


I can also get a similar plan if I disable both hash joins and merge joins instead.
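
For reference, the planner settings mentioned above can be toggled per session like this (a sketch using standard PostgreSQL run-time parameters):

```sql
SET enable_seqscan = off;    -- steer the planner away from sequential scans
SET enable_hashjoin = off;   -- disable hash joins
SET enable_mergejoin = off;  -- disable merge joins
-- ... run the query, then restore the defaults:
RESET enable_seqscan;
RESET enable_hashjoin;
RESET enable_mergejoin;
```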

Furthermore, I can get some additional speedup without turning off sequential scans: if I
set default_statistics_target = 1000, the runtime drops to around 1200 ms; on the other
hand, with default_statistics_target = 100 the runtime is around 12000 ms.
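
The statistics-target experiment looks like this (a sketch; new statistics only take effect after ANALYZE, and the per-column ALTER TABLE form is an alternative that avoids changing the global default):

```sql
SET default_statistics_target = 1000;  -- affects subsequent ANALYZE runs in this session
ANALYZE quotes;
ANALYZE zips;
ANALYZE cars;

-- Alternatively, raise the target only for the columns used in the join/filter:
ALTER TABLE quotes ALTER COLUMN zip SET STATISTICS 1000;
ALTER TABLE zips   ALTER COLUMN state SET STATISTICS 1000;
ANALYZE quotes;
ANALYZE zips;
```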


So, my question is: is there any way to get the query planner to recognize the performance
gain available from the existing indexes, without my having to turn off sequential scans
every time before running this query?


Thanks for the help.

Jared
