Ah ok that makes sense. I am curious if there is actually a performance benefit 
to doing that. In postgresql as per the execution plan you provided the Merge 
Join joins both sets after the have been sorted. If they are sorted already 
then the NULLs will all be grouped at the beginning or end. (Can’t remember 
what the ordering is) Postgresql will just skip all the records with probably 
the same effort as removing them and then merging. The only performance 
improvement I could potentially see is if there is a lot of NULLS in one set 
then the cost to sort them may be large enough to recoup by ignoring them 
before the merge sort.

I hope someone more familiar with the internals can chime in as I would like to 
learn more if there is a real benefit here or a better idea why postgres does 
not do it.

-----------------
Phillip Couto



> On Jan 19, 2017, at 08:04, Clailson <clailson....@gmail.com> wrote:
> 
> Hi Phillip.
> 
>> The only optimization I could see is if the a.a column has NOT NULL defined 
>> while b.b does not have NOT NULL defined.
> a.a is the primary key on table a and b.b is the foreign key on table b.
> 
>  Tabela "public.a" 
> +--------+---------+---------------+
> | Coluna |  Tipo   | Modificadores |
> +--------+---------+---------------+
> | a      | integer | não nulo      |
> | b      | integer |               |
> +--------+---------+---------------+
> Índices:
>     "a_pkey" PRIMARY KEY, btree (a)
> Referenciada por:
>     TABLE "b" CONSTRAINT "b_b_fkey" FOREIGN KEY (b) REFERENCES a(a)
> 
>          Tabela "public.b" 
> +--------+---------+---------------+
> | Coluna |  Tipo   | Modificadores |
> +--------+---------+---------------+
> | a      | integer | não nulo      |
> | b      | integer |               |
> +--------+---------+---------------+
> Índices:
>     "b_pkey" PRIMARY KEY, btree (a)
> Restrições de chave estrangeira:
>     "b_b_fkey" FOREIGN KEY (b) REFERENCES a(a)
>> 
>> Not sure if it is all that common. Curious what if you put b.b IS NOT NULL 
>> in the WHERE statement?
> 
> It's the question. In the company I work with, one of my clients asked me: 
> "Why PostgreSQL does not remove rows with null in column b (table b), before 
> joining, since these rows have no corresponding in table a?" I gave the 
> suggestion to put the IS NOT NULL in the WHERE statement, but HE can't modify 
> the query in the application. 
> 
> I did the tests with Oracle and it uses a predicate in the query plan, 
> removing the lines where b.b is null. In Oracle, it´s the same plan, with and 
> without IS NOT NULL in the WHERE statement.
>  -- 
> Clailson Soares Dinízio de Almeida
> 
> On 19/01/2017 09:34, Phillip Couto wrote:
>> NULL is still a value that may be paired with a NULL in a.a
>> 
>> The only optimization I could see is if the a.a column has NOT NULL defined 
>> while b.b does not have NOT NULL defined.
>> 
>> Not sure if it is all that common. Curious what if you put b.b IS NOT NULL 
>> in the WHERE statement?
>> 
>> -----------------
>> Phillip Couto
>> 
>> 
>> 
>>> On Jan 19, 2017, at 05:08, Clailson <clailson....@gmail.com 
>>> <mailto:clailson....@gmail.com>> wrote:
>>> 
>>> Hi,
>>> 
>>> Is there something in the roadmap to optimize the inner join?
>>> 
>>> I've this situation above. Table b has 400 rows with null in the column b.
>>> 
>>> explain analyze select * from a inner join b on (b.b = a.a);
>>> "Merge Join  (cost=0.55..65.30 rows=599 width=16) (actual time=0.030..1.173 
>>> rows=599 loops=1)" 
>>> "  Merge Cond: (a.a = b.b)" 
>>> "  ->  Index Scan using a_pkey on a  (cost=0.28..35.27 rows=1000 width=8) 
>>> (actual time=0.014..0.364 rows=1000 loops=1)" 
>>> "  ->  Index Scan using in01 on b  (cost=0.28..33.27 rows=1000 width=8) 
>>> (actual time=0.012..0.249 rows=600 loops=1)" 
>>> "Total runtime: 1.248 ms" 
>>> 
>>> My question is: Why the planner isn't removing the null rows during the 
>>> scan of table b?
>>> -- 
>>> Clailson Soares Dinízio de Almeida
>> 
> 


Reply via email to