The current transformation would not add "or s.a is NULL" in the example 
provided since it is non-nullable. You will be comparing these two cases in 
terms of the transformation:

select count(*) from big b where not exists(select 1 from small s
where s.a = b.a);
Time: 51.416 ms

select count(*) from big b where a not in (select a from s);
Time: 890.088 ms

 But if s.a is nullable, yes, you have proved my previous statement is false... 
I should have used almost always.
However, if s.a is nullable, we would do this transformation:
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);

It's possible to stop the nested loop join early during execution once we find 
an inner Null entry because every outer tuple is
going to evaluate to true on the join condition.

Zheng

On 3/1/19, 6:17 PM, "David Rowley" <david.row...@2ndquadrant.com> wrote:

    On Sat, 2 Mar 2019 at 12:13, Tom Lane <t...@sss.pgh.pa.us> wrote:
    >
    > "Li, Zheng" <zhe...@amazon.com> writes:
    > > Although adding "or var is NULL" to the anti join condition forces the 
planner to choose nested loop anti join, it is always faster compared to the 
original plan.
    >
    > TBH, I am *really* skeptical of sweeping claims like that.  The existing
    > code will typically produce a hashed-subplan plan, which ought not be
    > that awful as long as the subquery result doesn't blow out memory.
    > It certainly is going to beat a naive nested loop.
    
    It's pretty easy to show the claim is false using master and NOT EXISTS.
    
    create table small(a int not null);
    create table big (a int not null);
    insert into small select generate_Series(1,1000);
    insert into big select x%1000+1 from generate_Series(1,1000000) x;
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a);
    Time: 178.575 ms
    
    select count(*) from big b where not exists(select 1 from small s
    where s.a = b.a or s.a is null);
    Time: 38049.969 ms (00:38.050)
    
    
    -- 
     David Rowley                   http://www.2ndQuadrant.com/
     PostgreSQL Development, 24x7 Support, Training & Services
    

Reply via email to