[PERFORM] performance change from 8.3.1 to later releases
vc2 on vc2.short_desc_75 = vc1.short_desc_75 and vc2.th_sku != vc1.th_sku where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku is not null group by vc1.th_sku ) and vc.th_sku not in (select vc_th_Sku from traderhank.xc_products where vc_th_sku is not null) Seq Scan on vendor_catalog vc (cost=140413.05..91527264.28 rows=36838 width=309) Filter: ((NOT (hashed SubPlan 2)) AND (NOT (SubPlan 1))) SubPlan 2 - Seq Scan on xc_products (cost=0.00..1716.99 rows=5132 width=8) Filter: (vc_th_sku IS NOT NULL) SubPlan 1 - Materialize (cost=138683.23..139734.64 rows=75541 width=8) - Group (cost=134997.43..138311.69 rows=75541 width=8) - Sort (cost=134997.43..136654.56 rows=662853 width=8) Sort Key: vc1.th_sku - Merge Join (cost=39600.73..52775.08 rows=662853 width=8) Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text) Join Filter: ((vc2.th_sku)::text (vc1.th_sku)::text) - Sort (cost=19800.37..20062.75 rows=104954 width=27) Sort Key: vc1.short_desc_75 - Seq Scan on vendor_catalog vc1 (cost=0.00..8534.52 rows=104954 width=27) Filter: ((cutoff_date IS NULL) AND (th_sku IS NOT NULL)) - Materialize (cost=19800.37..21112.29 rows=104954 width=27) - Sort (cost=19800.37..20062.75 rows=104954 width=27) Sort Key: vc2.short_desc_75 - Seq Scan on vendor_catalog vc2 (cost=0.00..8534.52 rows=104954 width=27) Filter: (cutoff_date IS NULL) I've also tried changing the code to not exists, but that query never comes back on any version I have available: explain --analyze select vc.* from traderhank.vendor_catalog vc where not exists ( select 1 from traderhank.vendor_catalog vc1 join traderhank.vendor_catalog vc2 on vc2.short_desc_75 = vc1.short_desc_75 and vc2.th_sku != vc1.th_sku where vc1.cutoff_date is null and vc2.cutoff_date is null and vc1.th_sku = vc.th_sku group by vc1.th_sku ) and not exists (select 1 from traderhank.xc_products where vc_th_sku is not null and vc_th_sku = vc.th_sku) Nested Loop Anti Join (cost=63650.74..93617.53 rows=1 width=309) Join Filter: ((xc_products.vc_th_sku)::text = (vc.th_sku)::text) - Hash Anti Join (cost=63650.74..91836.39 rows=1 width=309) Hash Cond: ((vc.th_sku)::text = (vc1.th_sku)::text) - Seq Scan on vendor_catalog vc (cost=0.00..8534.52 rows=147352 width=309) - Hash (cost=52775.08..52775.08 rows=662853 width=8) - Merge Join (cost=39600.73..52775.08 rows=662853 width=8) Merge Cond: ((vc1.short_desc_75)::text = (vc2.short_desc_75)::text) Join Filter: ((vc2.th_sku)::text (vc1.th_sku)::text) - Sort (cost=19800.37..20062.75 rows=104954 width=27) Sort Key: vc1.short_desc_75 - Seq Scan on vendor_catalog vc1 (cost=0.00..8534.52 rows=104954 width=27) Filter: (cutoff_date IS NULL) - Materialize (cost=19800.37..21112.29 rows=104954 width=27) - Sort (cost=19800.37..20062.75 rows=104954 width=27) Sort Key: vc2.short_desc_75 - Seq Scan on vendor_catalog vc2 (cost=0.00..8534.52 rows=104954 width=27) Filter: (cutoff_date IS NULL) - Seq Scan on xc_products (cost=0.00..1716.99 rows=5132 width=8) Filter: (xc_products.vc_th_sku IS NOT NULL) So, my question is, do I need to re-write all of my in() and not in () queries to left joins, is this something that might get resolved in another release in the future? Thanks for any help. Roger Ging -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Followup - expression (functional) index use in joins
Turning enable_hashjoin off made the query run as it had on v7.3. We have worked around this by changing the index from a function call to a direct index on a new column with the results of the function maintained by a trigger. Would there be performance issues from leaving enable_hashjoin off, or do you recomend enabling it, and working around function calls in indices? See results below. ppl=# explain analyse select title from music.program p ppl-# join music.logfile l on ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no) ppl-# where l.air_date = '01/30/2001' ppl-# and l.station = 'KABC'; QUERY PLAN - Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=500.905..1473.748 rows=242 loops=1) Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text) - Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=98.371..532.184 rows=173998 loops=1) - Hash (cost=69.84..69.84 rows=17 width=9) (actual time=65.817..65.817 rows=0 loops=1) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=24.499..65.730 rows=32 loops=1) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone)) Total runtime: 1474.067 ms (7 rows) ppl=# set enable_mergejoin = false; SET ppl=# explain analyse select title from music.program p ppl-# join music.logfile l on ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no) ppl-# where l.air_date = '01/30/2001' ppl-# and l.station = 'KABC'; QUERY PLAN --- Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=444.834..1428.815 rows=242 loops=1) Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text) - Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=105.977..542.870 rows=173998 loops=1) - Hash (cost=69.84..69.84 rows=17 width=9) (actual time=1.197..1.197 rows=0 loops=1) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.574..1.151 rows=32 loops=1) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone)) Total runtime: 1429.111 ms (7 rows) ppl=# set enable_hashjoin = false; SET ppl=# explain analyse select title from music.program p ppl-# join music.logfile l on ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no) ppl-# where l.air_date = '01/30/2001' ppl-# and l.station = 'KABC'; QUERY PLAN -- Nested Loop (cost=0.00..58104.34 rows=2322 width=28) (actual time=0.480..5.357 rows=242 loops=1) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.176..0.754 rows=32 loops=1) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone)) - Index Scan using idx_program_mri_id_no_program on program p (cost=0.00..3400.74 rows=870 width=40) (actual time=0.041..0.127 rows=8 loops=32) Index Cond: (("outer".program_id)::text = (music.fn_mri_id_no_program(p.mri_id_no))::text) Total runtime: 5.637 ms (6 rows) Tom Lane wrote: Roger Ging [EMAIL PROTECTED] writes: Ran vacuum analyse on both program and logfile tables. Estimates are more in line with reality now, And they are what now? You really can't expect to get useful help here when you're being so miserly with the details ... FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting enable_mergejoin to off (might have to also set enable_hashjoin to off, if it then tries for a hash join). 7.3 could not even consider those join types in this example, while 7.4 can. The interesting question from my perspective is why the planner is guessing wrong about the relative costs of the plans. EXPLAIN ANALYZE results with each type of join forced would be useful to look at. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html Tom Lane wrote: Roger Ging [EMAIL PROTECTED] writes: Ran vacuum analyse on both program and logfile tables. Estimates are more in line with reality now, And they are what now? You really can't expect to get useful help here when you're being so miserly with the details ... FWIW, I suspect you could force 7.4
Re: [PERFORM] Followup - expression (functional) index use in joins
Tom, Turning enable_hashjoin off made the query run as it had on v7.3. We have worked around this by changing the index from a function call to a direct index on a new column with the results of the function maintained by a trigger. Would there be performance issues from leaving enable_hashjoin off, or do you recomend enabling it, and working around function calls in indices? See results below. I was not sure if I was supposed to reply-all, or just to the list. Sorry if the protocol is incorrect. ppl=# explain analyse select title from music.program p ppl-# join music.logfile l on ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no) ppl-# where l.air_date = '01/30/2001' ppl-# and l.station = 'KABC'; QUERY PLAN - Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=500.905..1473.748 rows=242 loops=1) Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text) - Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=98.371..532.184 rows=173998 loops=1) - Hash (cost=69.84..69.84 rows=17 width=9) (actual time=65.817..65.817 rows=0 loops=1) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=24.499..65.730 rows=32 loops=1) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone)) Total runtime: 1474.067 ms (7 rows) ppl=# set enable_mergejoin = false; SET ppl=# explain analyse select title from music.program p ppl-# join music.logfile l on ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no) ppl-# where l.air_date = '01/30/2001' ppl-# and l.station = 'KABC'; QUERY PLAN --- Hash Join (cost=69.89..19157.06 rows=2322 width=28) (actual time=444.834..1428.815 rows=242 loops=1) Hash Cond: ((music.fn_mri_id_no_program("outer".mri_id_no))::text = ("inner".program_id)::text) - Seq Scan on program p (cost=0.00..16888.98 rows=173998 width=40) (actual time=105.977..542.870 rows=173998 loops=1) - Hash (cost=69.84..69.84 rows=17 width=9) (actual time=1.197..1.197 rows=0 loops=1) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.574..1.151 rows=32 loops=1) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone)) Total runtime: 1429.111 ms (7 rows) ppl=# set enable_hashjoin = false; SET ppl=# explain analyse select title from music.program p ppl-# join music.logfile l on ppl-# l.program_id = music.fn_mri_id_no_program(p.mri_id_no) ppl-# where l.air_date = '01/30/2001' ppl-# and l.station = 'KABC'; QUERY PLAN -- Nested Loop (cost=0.00..58104.34 rows=2322 width=28) (actual time=0.480..5.357 rows=242 loops=1) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..69.84 rows=17 width=9) (actual time=0.176..0.754 rows=32 loops=1) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2001-01-30 00:00:00'::timestamp without time zone)) - Index Scan using idx_program_mri_id_no_program on program p (cost=0.00..3400.74 rows=870 width=40) (actual time=0.041..0.127 rows=8 loops=32) Index Cond: (("outer".program_id)::text = (music.fn_mri_id_no_program(p.mri_id_no))::text) Total runtime: 5.637 ms (6 rows) Tom Lane wrote: Roger Ging [EMAIL PROTECTED] writes: Ran vacuum analyse on both program and logfile tables. Estimates are more in line with reality now, And they are what now? You really can't expect to get useful help here when you're being so miserly with the details ... FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting enable_mergejoin to off (might have to also set enable_hashjoin to off, if it then tries for a hash join). 7.3 could not even consider those join types in this example, while 7.4 can. The interesting question from my perspective is why the planner is guessing wrong about the relative costs of the plans. EXPLAIN ANALYZE results with each type of join forced would be useful to look at. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] expression (functional) index use in joins
I just installed v7.4 and restored a database from v7.3.4. I have an index based on a function that the planner is using on the old version, but doing seq scans on left joins in the new version. I have run analyze on the table post restore. the query returns in less than 1 second on version 7.3.4 and takes over 10 seconds on version 7.4. Any help will be appreciated. Roger Ging Query: SELECT L.row_id FROM music.logfile L LEFT JOIN music.program P ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id WHERE L.station = UPPER('kabc')::VARCHAR AND L.air_date = '04/12/2002'::TIMESTAMP AND P.cutoff_date IS NULL ORDER BY L.chron_start,L.chron_end; planner results on 7.4: Sort (cost=17595.99..17608.23 rows=4894 width=12) Sort Key: l.chron_start, l.chron_end - Merge Left Join (cost=17135.92..17296.07 rows=4894 width=12) Merge Cond: (outer.?column5? = inner.?column3?) Filter: (inner.cutoff_date IS NULL) - Sort (cost=1681.69..1682.73 rows=414 width=21) Sort Key: (l.program_id)::text - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..1663.70 rows=414 width=21) Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone)) - Sort (cost=15454.22..15465.06 rows=4335 width=20) Sort Key: (music.fn_mri_id_no_program(p.mri_id_no))::text - Seq Scan on program p (cost=0.00..15192.35 rows=4335 width=20) planner results on 7.3.4: Sort (cost=55765.51..55768.33 rows=1127 width=41) Sort Key: l.chron_start, l.chron_end - Nested Loop (cost=0.00..55708.36 rows=1127 width=41) Filter: (inner.cutoff_date IS NULL) - Index Scan using idx_logfile_station_air_date on logfile l (cost=0.00..71.34 rows=17 width=21) Index Cond: ((station = 'KABC'::character varying) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone)) - Index Scan using idx_program_mri_id_no_program on program p (cost=0.00..3209.16 rows=870 width=20) Index Cond: (music.fn_mri_id_no_program(p.mri_id_no) = outer.program_id) table Program details: Column |Type | Modifiers +-+--- record_id | integer | title | character varying(40) | mri_id_no | character varying(8)| ascap_cat | character varying(1)| ascap_mult | numeric(5,3)| ascap_prod | character varying(10) | npa_ind| character varying(3)| non_inc_in | character varying(1)| as_pr_su | character varying(1)| as_1st_run | character varying(1)| as_cue_st | character varying(1)| bmi_cat| character varying(2)| bmi_mult | numeric(6,2)| bmi_prod | character varying(7)| year | integer | prog_type | character varying(1)| total_ep | integer | last_epis | character varying(3)| syndicator | character varying(6)| station| character varying(4)| syn_loc| character varying(1)| spdb_ver | character varying(4)| as_filed | character varying(4)| bmidb_ver | character varying(4)| cutoff_date| timestamp without time zone | effective_date | timestamp without time zone | program_id | character varying(5)| Indexes: idx_program_mri_id_no btree (mri_id_no) idx_program_mri_id_no_program btree (music.fn_mri_id_no_program(mri_id_no)) idx_program_program_id btree (program_id) program_mri_id_no btree (mri_id_no) program_oid btree (oid) ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match