Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-21 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 
 Introduce JOIN_SEMI and JOIN_ANTI join types, the former replacing
 JOIN_IN.  Unify the InClauseInfo and OuterJoinInfo infrastructure
into
 SpecialJoinInfo.  Convert IN, EXISTS, and NOT EXISTS clauses at
top
 level of WHERE into semi and anti joins respectively.
 
It just struck me that this may cause additional joins to count
against the join_collapse_limit.  If so, that could cause some
borderline queries to optimize poorly if the limit isn't raised.  Is
that a reasonable concern?  Possibly something to document in the
release notes?
 
-Kevin
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-21 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote: 
 Introduce JOIN_SEMI and JOIN_ANTI join types, the former replacing
 JOIN_IN.
 
 It just struck me that this may cause additional joins to count
 against the join_collapse_limit.  If so, that could cause some
 borderline queries to optimize poorly if the limit isn't raised.  Is
 that a reasonable concern?  Possibly something to document in the
 release notes?

I don't think we should change anything there.  The collapse limit
settings are mainly driven by not wanting to get into an exponential
growth of planning time, and the way in which join situations are
created doesn't really affect the appropriate value one way or the
other.

In a case where this did happen, you'd just have exchanged one
suboptimal planning situation for another, so I'm unconvinced that it'd
make matters any worse compared to prior releases.  (That does seem like
a point worth testing, though, if you want to.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Simon Riggs

On Wed, 2008-08-13 at 23:12 -0400, Tom Lane wrote:

 We're just trying to provide better performance for certain common SQL
 idioms.

Sounds good, but can you explain how this will help? Not questioning it,
just after more information about it.

I'm half way through join removal patch, so this work might extend the
join elimination to semi/anti joins also (hopefully), or it might
(hopefully not) prevent the join elimination altogether. I'll let you
know how I get on.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Wed, 2008-08-13 at 23:12 -0400, Tom Lane wrote:
 We're just trying to provide better performance for certain common SQL
 idioms.

 Sounds good, but can you explain how this will help?

1. Allowing optimization of EXISTS/NOT EXISTS as general-purpose joins.
Up to now, the best plan you could hope for was the equivalent of a
nestloop with inner indexscan, with the EXISTS subquery on the inside.
While that's not necessarily bad, it could be pretty bad for a large
outer table.  Now we'll have the option to consider merge and hash
joins too.

2. Allowing the planner to get better estimates of the result size of
these special join types.  In the past we've had to kluge that, and
the results weren't always good.  Part of what I'm doing (the unfinished
part ;-)) is to make more information about join context available to
selectivity estimation functions, which is something we've known we
needed for awhile.  I can't yet *prove* that I can get better estimates
with the added info, but if not, that just means I need to rethink what
to pass down exactly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Simon Riggs

On Thu, 2008-08-14 at 10:04 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Wed, 2008-08-13 at 23:12 -0400, Tom Lane wrote:
  We're just trying to provide better performance for certain common SQL
  idioms.
 
  Sounds good, but can you explain how this will help?
 
 1. Allowing optimization of EXISTS/NOT EXISTS as general-purpose joins.
 Up to now, the best plan you could hope for was the equivalent of a
 nestloop with inner indexscan, with the EXISTS subquery on the inside.
 While that's not necessarily bad, it could be pretty bad for a large
 outer table.  Now we'll have the option to consider merge and hash
 joins too.
 
 2. Allowing the planner to get better estimates of the result size of
 these special join types.  In the past we've had to kluge that, and
 the results weren't always good.  Part of what I'm doing (the unfinished
 part ;-)) is to make more information about join context available to
 selectivity estimation functions, which is something we've known we
 needed for awhile.  I can't yet *prove* that I can get better estimates
 with the added info, but if not, that just means I need to rethink what
 to pass down exactly.

OK, that sounds good. Are you also working on transforming NOT IN into
different form? Or is that the same thing as (1)?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 OK, that sounds good. Are you also working on transforming NOT IN into
 different form? Or is that the same thing as (1)?

I'm not currently thinking about NOT IN.  It could be transformed to
an antijoin if we could prove that no nulls are involved, but that
seems less than trivial as I noted earlier.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 I can't yet *prove* that I can get better estimates
 with the added info, but if not, that just means I need to rethink
what
 to pass down exactly.
 
I'll see if I can do some testing here to confirm plan improvements
and check estimate accuracy.  This is only on the trunk, not any
stable branches?  I assume that effort should wait until you have a
candidate for the estimate improvements?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote: 
 I can't yet *prove* that I can get better estimates
 with the added info, but if not, that just means I need to rethink what
 to pass down exactly.
 
 I'll see if I can do some testing here to confirm plan improvements
 and check estimate accuracy.  This is only on the trunk, not any
 stable branches?  I assume that effort should wait until you have a
 candidate for the estimate improvements?

Yeah.  If you feel like it you can test what I just committed, but it's
definitely not where I expect to end up in another few days.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-13 Thread David E. Wheeler

On Aug 13, 2008, at 17:31, Tom Lane wrote:


What's done:

Introduce JOIN_SEMI and JOIN_ANTI join types, the former replacing
JOIN_IN.  Unify the InClauseInfo and OuterJoinInfo infrastructure into
SpecialJoinInfo.  Convert IN, EXISTS, and NOT EXISTS clauses at top
level of WHERE into semi and anti joins respectively.  Recognize
LEFT JOIN with a suitable IS NULL filter condition as an anti join.
This all compiles and passes the regression tests.


Wow. That sound awesome, Tom. Stupid question: Do these join types  
have some sort of correspondence to the SQL standard? Or would they be  
specific to PostgreSQL? Or is this just something that's under the  
hood an not actually a change to the syntax of SQL joins?



What's not done:

nodeMergejoin.c doesn't yet handle JOIN_ANTI.  (This is just a SMOP,
but it's a lot more complicated than the nestloop or hash logic, and
I figured nestloop and hash were enough for testing the planner.)


I guess that means you plan to do it once there has been significant  
testing with nestloop and hash and when the selectivity stuff is done?


Best,

David
(Who is in over his head, but decides to stick his toe in the water  
anyway.)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-13 Thread Tom Lane
David E. Wheeler [EMAIL PROTECTED] writes:
 On Aug 13, 2008, at 17:31, Tom Lane wrote:
 Introduce JOIN_SEMI and JOIN_ANTI join types,

 Wow. That sound awesome, Tom. Stupid question: Do these join types  
 have some sort of correspondence to the SQL standard?

Semi and anti joins are pretty standard concepts in relational theory,
but they have no direct mapping in the SQL join syntax.  You can write
them with certain well-known locutions, though:
IN and EXISTS, with certain restrictions, represent semi join
NOT EXISTS, with certain restrictions, represents anti join
LEFT JOIN with an incompatible higher IS NULL test represents anti 
join

Basically what this patch is about is teaching the planner that these
constructs are best understood via the relational-theory concepts.
We'd been doing it in a pretty ad-hoc way before, and run into a lot
of problems that we've had to kluge around.  I think that this approach
provides a structure that will actually work well.

 Or is this just something that's under the  
 hood an not actually a change to the syntax of SQL joins?

Right, there's no user visible feature or syntax change here.  We're
just trying to provide better performance for certain common SQL idioms.

 What's not done:
 
 nodeMergejoin.c doesn't yet handle JOIN_ANTI.  (This is just a SMOP,

 I guess that means you plan to do it once there has been significant  
 testing with nestloop and hash and when the selectivity stuff is done?

Actually, I got it done an hour or so ago --- it turned out to be easier
than I thought.  It just didn't seem like part of the critical path for
the patch, so I'd been willing to let it go till later.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-13 Thread David E. Wheeler

On Aug 13, 2008, at 20:12, Tom Lane wrote:


Wow. That sound awesome, Tom. Stupid question: Do these join types
have some sort of correspondence to the SQL standard?


Semi and anti joins are pretty standard concepts in relational theory,
but they have no direct mapping in the SQL join syntax.  You can write
them with certain well-known locutions, though:
IN and EXISTS, with certain restrictions, represent semi join
NOT EXISTS, with certain restrictions, represents anti join
	LEFT JOIN with an incompatible higher IS NULL test represents  
anti join


Basically what this patch is about is teaching the planner that these
constructs are best understood via the relational-theory concepts.
We'd been doing it in a pretty ad-hoc way before, and run into a lot
of problems that we've had to kluge around.  I think that this  
approach

provides a structure that will actually work well.


Great. Thanks for the explanation, Tom, as always.


Or is this just something that's under the
hood an not actually a change to the syntax of SQL joins?


Right, there's no user visible feature or syntax change here.  We're
just trying to provide better performance for certain common SQL  
idioms.


Good, it makes a lot of sense.




What's not done:

nodeMergejoin.c doesn't yet handle JOIN_ANTI.  (This is just a SMOP,



I guess that means you plan to do it once there has been significant
testing with nestloop and hash and when the selectivity stuff is  
done?


Actually, I got it done an hour or so ago --- it turned out to be  
easier
than I thought.  It just didn't seem like part of the critical path  
for

the patch, so I'd been willing to let it go till later.


I love it when things work that way. :-)

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers