On Sun, Jun 8, 2014 at 5:36 AM, David Rowley <dgrowle...@gmail.com> wrote:

> Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
> EXISTS queries and leaves NOT IN alone. The reason for this is because the
> values returned by a subquery in the IN clause could have NULLs.
>
> A simple example of this (without a subquery) is:
>
> select 1 where 3 not in (1, 2, null); returns 0 rows because 3 <> NULL is
> unknown.
>
> The attached patch allows an ANTI-join plan to be generated in cases like:
>
> CREATE TABLE a (id INT PRIMARY KEY, b_id INT NOT NULL);
> CREATE TABLE b (id INT NOT NULL);
>
> SELECT * FROM a WHERE b_id NOT IN(SELECT id FROM b);
>
> To generate a plan like:
>                            QUERY PLAN
> -----------------------------------------------------------------
>  Hash Anti Join  (cost=64.00..137.13 rows=1070 width=8)
>    Hash Cond: (a.b_id = b.id)
>    ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=8)
>    ->  Hash  (cost=34.00..34.00 rows=2400 width=4)
>          ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
>


I think this will be great, I've run into this problem often from
applications I have no control over.  I thought a more complete, but
probably much harder, solution would be to add some metadata to the hash
anti-join infrastructure that tells it "If you find any nulls in the outer
scan, stop running without returning any rows".  I think that should work
because the outer rel already has to run completely before any rows can be
returned.

But what I can't figure out is, would that change obviate the need for your
change?  Once we can correctly deal with nulls in a NOT IN list through a
hash anti join, is there a cost estimation advantage to being able to prove
that the that null can't occur?  (And of course if you have code that
works, while I have vague notions of what might be, then my notion probably
does not block your code.)

Cheers,

Jeff

Reply via email to