[PERFORM] Analyze makes queries slow...

2003-08-14 Thread Stef
Hi all,

I posted this problem on the sql list, and was referred to this list in stead.
I have attached an sql statement that normally runs under 3 minutes.
That is, until I vacuum analyze the database (or just the tables in the query),
then the same query runs longer than 12 hours, and I have to kill it.

However 90% of queries are faster after analyzing on this database,
there are two or three, including this one that takes for ever.

I have tried to reverse engineer the explain plan from before analyzing,
to come up with an sql statement, using proper joins, to force the planner
to do the original join, but although I came close, I never got the same 
result as the original query.

I suspect that this might be caused by some of the crazy indexes that 
were built on some of these tables, but I can't really do much about that,
unless I can come up with a very good reason to nuke them.

I also attached the create table statements for all the tables, as well
as a row count of each.

Can somebody help me with guidelines or something similar, 
to understand exactly what is happening in the explain plan.

TIA
Stefan

 Aggregate  (cost=52.00..61.64 rows=32 width=241)
   -  Group  (cost=52.00..57.62 rows=321 width=241)
 -  Sort  (cost=52.00..52.80 rows=321 width=241)
   Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, 
ss.supplier_price_curr_cost, sk.sku_price_curr_cost
   -  Merge Join  (cost=36.38..38.62 rows=321 width=241)
 Merge Cond: ((outer.group_cde = inner.group_cde) AND 
(outer.brn_code = inner.brn_code))
 Join Filter: ((outer.price_tmpl_hdr_cde = 
inner.price_tmpl_hdr_cde) AND (outer.price_tmpl_hdr_reg = 
inner.price_tmpl_hdr_reg))
 -  Nested Loop  (cost=0.00..1407212.08 rows=63 width=179)
   Join Filter: ((inner.sku_mst_cde = outer.sku) AND 
(inner.group_cde = outer.group_code))
   -  Merge Join  (cost=0.00..1405644.89 rows=315 width=135)
 Merge Cond: (outer.group_code = inner.group_code)
 Join Filter: (outer.sku = inner.sku)
 -  Nested Loop  (cost=0.00..4826563.70 rows=8694 
width=108)
   -  Index Scan using master_fpp_values_idx2 on 
master_fpp_values m  (cost=0.00..3766902.34 rows=215650 width=54)
 Filter: (fpp_code = '200307'::text)
   -  Index Scan using pk_supplier_price on 
supplier_price ss  (cost=0.00..4.90 rows=1 width=54)
 Index Cond: ((ss.group_cde = 
outer.group_code) AND (ss.sku_mst_cde = outer.sku) AND (ss.supplier_cde = 
outer.supplier_code))
 -  Index Scan using master_sku_descr_idx3 on 
master_sku_descr s  (cost=0.00..2535.04 rows=10758 width=27)
   Filter: (control_code = '0'::text)
   -  Index Scan using idx_sku_price on sku_price sk  
(cost=0.00..4.96 rows=1 width=44)
 Index Cond: ((sk.group_cde = outer.group_cde) AND 
(sk.sku_mst_cde = outer.sku_mst_cde) AND (sk.price_tmpl_hdr_cde = 
outer.price_tmpl_hdr_cde) AND (sk.price_tmpl_hdr_reg = outer.price_tmpl_hdr_reg))
 -  Sort  (cost=36.38..36.87 rows=198 width=62)
   Sort Key: p.group_cde, p.branch_cde
   -  Hash Join  (cost=18.46..28.82 rows=198 width=62)
 Hash Cond: (outer.brn_code = inner.branch_cde)
 -  Merge Join  (cost=13.94..20.34 rows=198 width=33)
   Merge Cond: (outer.country_code = 
inner.from_ctry)
   -  Index Scan using master_branch_descr_idx4 
on master_branch_descr b  (cost=0.00..33.12 rows=198 width=15)
   -  Sort  (cost=13.94..13.95 rows=4 width=18)
 Sort Key: f.from_ctry
 -  Index Scan using forex_idx1 on forex 
f  (cost=0.00..13.90 rows=4 width=18)
   Index Cond: ((to_ctry = 
'ZAF'::text) AND (fpp_code = '200307'::text))
 -  Hash  (cost=4.02..4.02 rows=202 width=29)
   -  Seq Scan on price_tmpl_det p  
(cost=0.00..4.02 rows=202 width=29)
(34 rows)

 Aggregate  (cost=163.58..163.61 rows=1 width=699)
   -  Group  (cost=163.58..163.60 rows=1 width=699)
 -  Sort  (cost=163.58..163.58 rows=1 width=699)
   Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, 
ss.supplier_price_curr_cost, sk.sku_price_curr_cost
   -  Nested Loop  (cost=115.56..163.57 rows=1 width=699)
 Join Filter: ((outer.sku = inner.sku) 

Re: [PERFORM] Analyze makes queries slow...

2003-08-14 Thread Jacek Rembisz
On Mon, Aug 11, 2003 at 03:58:41PM +0200, Stef wrote:

 I have attached an sql statement that normally runs under 3 minutes.
 That is, until I vacuum analyze the database (or just the tables in the query),
 then the same query runs longer than 12 hours, and I have to kill it.

Hmm, I have noticed similar problem with a query with order by ... limit 
clause.Although it runs only 10 times slower after analyze :)

The query joins one big table (20 000 rows) with several small tables
(200-4000 rows) than order by primary key of big table limit 20

Without this order by ... limit clause the query is 5 times faster after
analyze.

Looking into explain analyze outputs:
1. Before vacuum analyze  a planer chooses nested loop, the  deepest is:
  -  Nested Loop  (cost=0.00..116866.54 rows=19286 width=96) (actual time=0.14..1.39 
rows=21 loops=1)
  -  Index Scan Backward using big_table_pkey on big_table k  (cost=0.00..1461.15 
rows=19286 width=52) (actual time=0.07..0.47 rows=21 loops=1)
  -  Index Scan using 4000rows_table_pkey on 4000rows_table zs  (cost=0.00..5.97 
rows=1 width=44) (actual time=0.02..0.02 rows=0 loops=21)

2. After analyze uses hashjoins

When I remove this order by limit clause the query after analyze takes 
the same time and the query before analyze is much more slower.

I won't blame the planer. How he could learn that he should first 
take those 20 rows and than perform joins? There is a where clause
with complex exists(subquery) condition regarding one of big_table fields,
but removing this condition does not change the query plan.

Pure joining without any additional conditions and only primary key of big 
table in select clause runs 4 times slower then whole query before 
vacuuum analyze :)

Does in all the planer take in the consideration the limit clause?

Probably I'm missing something. I don't know much about the planer.

Finaly I have redesigned the query.

Regards,
Jacek


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Analyze makes queries slow...

2003-08-14 Thread Tom Lane
Stef [EMAIL PROTECTED] writes:
 = Could we see the results of EXPLAIN ANALYZE, rather than just EXPLAIN,
 = for the un-analyzed case? 

 Attached the output of this.

Hmm... not immediately obvious where it's going wrong.  Could you try
this (after ANALYZE):

set enable_mergejoin to off;
explain analyze   ... query ...

If it finishes in a reasonable amount of time, send the explain output.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly