Re: [PERFORM] Propagating outer join conditions

2006-12-03 Thread Jonathan Blitz
How about trying:

Select *
From
(Select * from t28 where t28.0='spec')  t28a
Left out join (t1 JOIN t11 ON
 (t11.o = 'http://example.org' AND t11.s = t1.o)) ON t28a.s = t1.s

In this way, I think, the where clause on t28 would be performed before the
join rather than after.

Jonathan Blitz


 -Original Message-
 From: Aaron Birkland [mailto:[EMAIL PROTECTED]
 Sent: Sunday, December 03, 2006 5:12 PM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Propagating outer join conditions
 
 The following left outer join plan puzzles me:
 
 EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN t11 ON
 (t11.o = 'http://example.org' AND t11.s = t1.o)) ON t28.s = t1.s
 WHERE t28.o = 'spec';
 
 t28, t1, and t11 all have indexed columns named 's' and 'o' that contain
'text';
 
  Nested Loop Left Join  (cost=794249.26..3289704.61 rows=1 width=301)
 (actual time=581293.390..581293.492 rows=1 loops=1)
Join Filter: (t28.s = t1.s)
-  Index Scan using t28_o on t28  (cost=0.00..9.22 rows=1
 width=89) (actual time=0.073..0.077 rows=1 loops=1)
  Index Cond: (o = 'spec'::text)
-  Merge Join  (cost=794249.26..3267020.66 rows=1813979 width=212)
 (actual time=230365.522..577078.266 rows=1894969 loops=1)
  Merge Cond: (t1.o = t11.s)
  -  Index Scan using t1_o on t1  (cost=0.00..2390242.10
 rows=5696 width=109) (actual time=0.209..162586.801 rows=3925
 loops=1)
  -  Sort  (cost=794249.26..798784.21 rows=1813979 width=103)
 (actual time=230365.175..237409.474 rows=1894969 loops=1)
Sort Key: t11.s
-  Bitmap Heap Scan on t11  (cost=78450.82..605679.55
 rows=1813979 width=103) (actual time=3252.103..22782.271 rows=1894969
 loops=1)
  Recheck Cond: (o = 'http://example.org'::text)
  -  Bitmap Index Scan on t11_o
 (cost=0.00..78450.82 rows=1813979 width=0) (actual
 time=2445.422..2445.422 rows=1894969 loops=1)
Index Cond: (o = 'http://example.org'::text)
 
 
 It seems to me that this plan is not very desirable, since the outer
 part of the nested loop left join (the merge join node) is very
 expensive. Is is possible to generate a plan that looks like this:
 
  Nested Loop Left Join  (cost=???)
-  Index Scan using t28_o on t28  (cost=0.00..9.11 rows=1 width=89)
  Index Cond: (o = 'spec'::text)
-  Nested Loop  (cost=???)
  -  Index Scan using t1_s on t1  (cost=???)
Index Cond: (s = t28.s)
  -  Bitmap Heap Scan on t11  (cost=???)
Recheck Cond: (t11.s = t1.o)
Filter: (o = 'http://example.org'::text)
-  Bitmap Index Scan on t11_s  (cost=??? )
  Index Cond: (t11.s = t1.o)
 
 I *think* this plan is equivalent to the above if I'm assuming the
 behaviour of the 'nested loop left join' node correctly.  So far, I
 have been tweaking the statistics, cost estimates, and
 enabling.disabling certain plans to see if I can get it to propagate
 the join condition t1.s = t28.s to the outer node of the left join..
 but so far, I cannot.  So, my questions are:
 
 1) Is my 'desired' query plan logic correct
 2) Can the executor execute a plan such as my 'desired' plan
 3) If (1) and (2) are 'yes', then how may I get the planner to
 generate such a plan, or do I just need to look harder into tweaking
 the statistics and cost estimates
 
   -Aaron
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 
 
 --
 No virus found in this incoming message.
 Checked by AVG Free Edition.
 Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/2/2006
 
 
 --
 No virus found in this incoming message.
 Checked by AVG Free Edition.
 Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/02/2006
 

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.15.6/565 - Release Date: 12/02/2006
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Propagating outer join conditions

2006-12-03 Thread Aaron Birkland

First, I forgot to mention - this is 8.2 RC1 I was trying on

The suggested change produces an identical 'bad' query plan.  The main
issue (I think) is that the query node that processes t1 JOIN t11 ON
..' is not aware of the join condition 't28.s = t1.s'.. even though
the value of t28.s (as determined by the inner index scan where t28.o
= 'spec') could(should?) theoretically be known to it. If it did, then
I imagine it would realize that a nested loop join starting with t1.s
= t28.s (which is very selective) would be much cheaper than doing the
big merge join.

 -Aaron

On 12/3/06, Jonathan Blitz [EMAIL PROTECTED] wrote:

How about trying:

Select *
From
(Select * from t28 where t28.0='spec')  t28a
Left out join (t1 JOIN t11 ON
 (t11.o = 'http://example.org' AND t11.s = t1.o)) ON t28a.s = t1.s

In this way, I think, the where clause on t28 would be performed before the
join rather than after.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Propagating outer join conditions

2006-12-03 Thread Tom Lane
Aaron Birkland [EMAIL PROTECTED] writes:
 ... Is is possible to generate a plan that looks like this:

  Nested Loop Left Join  (cost=???)
-  Index Scan using t28_o on t28  (cost=0.00..9.11 rows=1 width=89)
  Index Cond: (o = 'spec'::text)
-  Nested Loop  (cost=???)
  -  Index Scan using t1_s on t1  (cost=???)
Index Cond: (s = t28.s)
  -  Bitmap Heap Scan on t11  (cost=???)
Recheck Cond: (t11.s = t1.o)
Filter: (o = 'http://example.org'::text)
-  Bitmap Index Scan on t11_s  (cost=??? )
  Index Cond: (t11.s = t1.o)

No.  It'd take massive work in both the planner and executor to support
something like that --- they are designed around the assumption that
nestloop with inner index scan is exactly that, just one indexscan on
the inside of the loop.  (As of 8.2 there's some klugery to handle an
Append of indexscans too, but it won't generalize readily.)

It might be something to work on for the future, but I can see a couple
of risks to trying to support this type of plan:
* exponential growth in the number of plans the planner has to consider;
* greatly increased runtime penalty if the planner has underestimated
  the number of rows produced by the outer side of the nestloop.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings