Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-17 Thread Kris Jurka


On Tue, 16 Nov 2004, Simon Riggs wrote:

 The join condition has so many ANDed predicates that we assume that this
 will reduce the selectivity considerably. It does not, and so you pay
 the cost dearly later on.
 

Yes, that makes a lot of sense.  Without some incredibly good cross-column
statistics there is no way it could expect all of the rows to match.  
Thanks for the analysis.

Kris Jurka

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-16 Thread Kris Jurka


On Tue, 16 Nov 2004, F. Senault wrote:

 Let me guess...  You've never run analyze on your tables ?
 

No, I have.  I mentioned that I did in my email, but you can also tell by
the exactly correct guesses for some other plan steps:

-  Seq Scan on period  (cost=0.00..90.88 rows=3288 width=54) (actual 
time=0.118..12.126 rows=3288 loops=1)

Kris Jurka

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] mis-estimation on data-warehouse aggregate creation

2004-11-16 Thread Simon Riggs
On Tue, 2004-11-16 at 09:10, Kris Jurka wrote:
  By rewriting the JOIN 
 conditions to LEFT JOIN we force the planner to recognize that there will 
 be a match for every row in the sales table:
 

You realise that returns a different answer (or at least it potentially
does, depending upon your data?

  -  Hash Join  (cost=4.70..194.23 rows=1 width=12) 
 (actual time=2.675..74.693 rows=3288 loops=1)
Hash Cond: ((outer.monthnumber = 
 inner.monthnumber) AND (outer.monthname = inner.monthname) AND 
 (outer.year = inner.year) AND (outer.monthyear = inner.monthyear) 
 AND (outer.quarter = inner.quarter) AND (outer.quarteryear = 
 inner.quarteryear))
-  Seq Scan on period  (cost=0.00..90.88 
 rows=3288 width=54) (actual time=0.118..12.126 rows=3288 loops=1)
-  Hash  (cost=3.08..3.08 rows=108 width=58) 
 (actual time=1.658..1.658 rows=0 loops=1)
  -  Seq Scan on shd_month  (cost=0.00..3.08 
 rows=108 width=58) (actual time=0.081..0.947 rows=108 loops=1)

ISTM your trouble starts here ^^^
estimate=1, but rows=3288 

The join condition has so many ANDed predicates that we assume that this
will reduce the selectivity considerably. It does not, and so you pay
the cost dearly later on.

In both plans, the trouble starts at this point.

If you pre-build tables that have only a single join column between the
full.oldids and shrunken.renumberedids then this will most likely work
correctly, since the planner will be able to correctly estimate the join
selectivity. i.e. put product.id onto shd_productline ahead of time, so
you can avoid the complex join.

Setting join_collapse_limit lower doesn't look like it would help, since
the plan already shows joining the sub-queries together first.

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly