On Sun, Jun 8, 2014 at 5:36 AM, David Rowley <dgrowle...@gmail.com> wrote:
> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT > EXISTS queries and leaves NOT IN alone. The reason for this is because the > values returned by a subquery in the IN clause could have NULLs. > > A simple example of this (without a subquery) is: > > select 1 where 3 not in (1, 2, null); returns 0 rows because 3 <> NULL is > unknown. > > The attached patch allows an ANTI-join plan to be generated in cases like: > > CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL); > CREATE TABLE b (id INT NOT NULL); > > SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b); > > To generate a plan like: > QUERY PLAN > ----------------------------------------------------------------- > Hash Anti Join (cost=64.00..137.13 rows=1070 width=8) > Hash Cond: (a.b_id = b.id) > -> Seq Scan on a (cost=0.00..31.40 rows=2140 width=8) > -> Hash (cost=34.00..34.00 rows=2400 width=4) > -> Seq Scan on b (cost=0.00..34.00 rows=2400 width=4) > I think this will be great, I've run into this problem often from applications I have no control over. I thought a more complete, but probably much harder, solution would be to add some metadata to the hash anti-join infrastructure that tells it "If you find any nulls in the outer scan, stop running without returning any rows". I think that should work because the outer rel already has to run completely before any rows can be returned. But what I can't figure out is, would that change obviate the need for your change? Once we can correctly deal with nulls in a NOT IN list through a hash anti join, is there a cost estimation advantage to being able to prove that the that null can't occur? (And of course if you have code that works, while I have vague notions of what might be, then my notion probably does not block your code.) Cheers, Jeff