On Wed, Mar 1, 2017 at 5:24 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 2:12 PM, David G. Johnston < > david.g.johns...@gmail.com> wrote: > >> On Wed, Mar 1, 2017 at 3:00 PM, Stefan Andreatta <s.andrea...@synedra.com >> > wrote: >> >>> plain analyze >>> select tmp_san_1.id >>> from tmp_san_1 >>> left join tmp_san_2 on tmp_san_1.text = tmp_san_2.text >>> where tmp_san_2.id is null; >>> >>> Does it help if you check for "tmp_san_2.text is null"? >> >> >> > Yes. And if you swap it so that the left join is on the integer while IS > NULL is on the text, that also gets poorly estimated. Also, if you make > both column of both tables be integers, same thing--you get bad estimates > when the join condition refers to one column and the where refers to the > other. I don't know why the estimate is poor, but it is not related to the > types of the columns, but rather the identities of them. > > I suspect it has to with the lack of a NOT NULL constraint on either column causing the planner to disregard the potential to implement a LEFT JOIN using ANTI-JOIN semantics - or, also possible - the form itself is invalid regardless of the presence or absence of contraints. IIUC, while a true anti-join syntax doesn't exist the canonical form for one uses NOT EXISTS - which would force the author to use only the correct column pair. David J.