Re: [PERFORM] Optimization inner join

Phillip Couto Thu, 19 Jan 2017 06:16:19 -0800

The picture is becoming clearer now. So to recap the issue is in the plan 
selection not utilizing the null_frac statistic properly to skip what seems to 
be in your case 99% of the rows which are NULL for the field the join is 
happening on and would be discarded anyways.


For completeness do you mind posting what versions of postgres you have tested 
this on?

-----------------
Phillip Couto



> On Jan 19, 2017, at 08:23, Gustavo Rezende Montesino 
> <[email protected]> wrote:
> 
> Hello,
> 
> Em 19/01/2017 11:04, Clailson escreveu:
>> Hi Phillip.
>> 
>> 
>>> 
>>> Not sure if it is all that common. Curious what if you put b.b IS NOT NULL 
>>> in the WHERE statement?
>> 
>> It's the question. In the company I work with, one of my clients asked me: 
>> "Why PostgreSQL does not remove rows with null in column b (table b), before 
>> joining, since these rows have no corresponding in table a?" I gave the 
>> suggestion to put the IS NOT NULL in the WHERE statement, but HE can't 
>> modify the query in the application. 
>> 
>> I did the tests with Oracle and it uses a predicate in the query plan, 
>> removing the lines where b.b is null. In Oracle, it´s the same plan, with 
>> and without IS NOT NULL in the WHERE statement.
> 
> Being the client in question, I would like to make a little remark: What we 
> thought could be optimized here at first is on the row estimate of the index 
> scan; which could take null_frac into account. To put things into 
> perspective, our similar case in production has a table with 6 million lines 
> where only 9.5k aren´t null for the join field, an the over-estimation is 
> throwing away good plans (like ~150ms execution time) in favor of pretty bad 
> ones (~80s execution time).
> 
> We´ve asked application people to put the where not null workaround, which 
> works great, and are waiting on an answer, but I believe getting better 
> estimates without that would be great if possible.
> 
>> 
>> On 19/01/2017 09:34, Phillip Couto wrote:
>>> NULL is still a value that may be paired with a NULL in a.a
> 
> Is that so? I would believe you would never get a match, as NULL <> NULL
> 
>>>> On Jan 19, 2017, at 05:08, Clailson < 
>>>> <mailto:[email protected]>[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Is there something in the roadmap to optimize the inner join?
>>>> 
>>>> I've this situation above. Table b has 400 rows with null in the column b.
>>>> 
>>>> explain analyze select * from a inner join b on (b.b = a.a);
>>>> "Merge Join  (cost=0.55..65.30 rows=599 width=16) (actual 
>>>> time=0.030..1.173 rows=599 loops=1)" 
>>>> "  Merge Cond: (a.a = b.b)" 
>>>> "  ->  Index Scan using a_pkey on a  (cost=0.28..35.27 rows=1000 width=8) 
>>>> (actual time=0.014..0.364 rows=1000 loops=1)" 
>>>> "  ->  Index Scan using in01 on b  (cost=0.28..33.27 rows=1000 width=8) 
>>>> (actual time=0.012..0.249 rows=600 loops=1)" 
>>>> "Total runtime: 1.248 ms" 
>>>> 
>>>> My question is: Why the planner isn't removing the null rows during the 
>>>> scan of table b?
>>>> -- 
>>>> Clailson Soares Dinízio de Almeida
>>> 
>> 
> 
> 
> Regards,
> 
> Gustavo R. Montesino
> Tribunal Regional do Trabalho da 2a Região
> Secretaria de Tecnologia da Informação e Comunicação
> Coordenadoria de Infraestrutura de TIC
> Seção de Administração de Banco de Dados
> Av. Marquês de São Vicente, 121 - Bl. A - Sala 404
> Telefone: (11) 3150-2082
> 
>

Re: [PERFORM] Optimization inner join

Reply via email to