On Wed, Sep 27, 2017 at 5:45 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Marko Tiikkaja <ma...@joh.to> writes:
> > I wonder if the nested loop shouldn't have some kind of a cap on its own
> > estimate if it's wildly off of what you'd get by multiplying the child
> > nodes' estimates with each other?
>
> Nonstarter I'm afraid.  The join relation's size estimate is determined
> long before we get to a point where we could multiply the sizes of these
> particular child paths to arrive at the conclusion that it should be
> something different than what we decided originally.


Ah hah.  Thanks for the explanation, that makes sense.


> Adjusting the size
> of the nestloop result at that point would merely give it an unfair
> advantage over other paths for the same join relation.  (I think it would
> also break some assumptions about paths for the same relation all giving
> the same number of rows, unless parameterized.)
>

With the previous paragraph in mind, I would agree; it's not a very good
idea.


> Looking at it another way, the main thing that the combination of hashagg
> outer path + indexscan inner path knows that eqjoinsel_semi didn't account
> for is that there's a unique index on foo.id.  But that info is available
> to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
> that nd1 is equal to the outer relation size.  So the mistake that it's
> making is to throw up its hands and use an 0.5 selectivity estimate just
> because it has no info about the inner relation.  I think if we'd pushed
> through the nd2/nd1 calculation after setting nd2 = size of inner rel,
> we'd end up with an estimate matching the product of these path sizes.
> (Caution: inadequate caffeine absorbed yet, this might be all wrong.)
>

This sounds very reasonable to me.
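
To spell out how I'm reading that suggestion, here's a back-of-the-envelope
sketch (standalone C, not the actual eqjoinsel_semi code in selfuncs.c; the
names nd1/nd2/inner_rows and the 1e6-vs-10 row counts are just made up to
fix ideas): if the inner side has no ndistinct estimate, substituting the
inner rel's row count for nd2 and pushing through the usual nd2/nd1 ratio
gives a tiny selectivity instead of the flat 0.5 fallback.

#include <stdio.h>

static double
semi_join_selectivity(double nd1, double nd2, double inner_rows,
                      int have_inner_stats)
{
    if (!have_inner_stats)
        nd2 = inner_rows;   /* proposed: use inner rel size as stand-in */

    if (nd2 > nd1)
        nd2 = nd1;          /* can't match more distinct values than exist */

    /* fraction of outer distinct values expected to find a match */
    return (nd1 > 0) ? nd2 / nd1 : 0.5;
}

int
main(void)
{
    /* outer side known unique (nd1 = outer rows = 1e6),
       inner side only 10 rows and no stats */
    printf("flat fallback:        %.6f\n", 0.5);
    printf("with nd2 = inner rows: %.6f\n",
           semi_join_selectivity(1000000.0, 0.0, 10.0, 0));
    return 0;
}

That lands at 0.00001 rather than 0.5 for the example above, which is the
kind of estimate the hashagg-plus-indexscan path combination was already
implying.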


.m
