On Wed, Sep 27, 2017 at 5:45 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Marko Tiikkaja <ma...@joh.to> writes:
> > I wonder if the nested loop shouldn't have some kind of a cap on its own
> > estimate if it's wildly off of what you'd get by multiplying the child
> > nodes' estimates with each other?
>
> Nonstarter I'm afraid.  The join relation's size estimate is determined
> long before we get to a point where we could multiply the sizes of these
> particular child paths to arrive at the conclusion that it should be
> something different than what we decided originally.

Ah hah.  Thanks for the explanation, that makes sense.

> Adjusting the size
> of the nestloop result at that point would merely give it an unfair
> advantage over other paths for the same join relation.  (I think it would
> also break some assumptions about paths for the same relation all giving
> the same number of rows, unless parameterized.)

With the previous paragraph in mind, I would agree; it's not a very good idea.

> Looking at it another way, the main thing that the combination of hashagg
> outer path + indexscan inner path knows that eqjoinsel_semi didn't account
> for is that there's a unique index on foo.id.  But that info is available
> to eqjoinsel_semi, in the sense that it's been given a nondefault estimate
> that nd1 is equal to the outer relation size.  So the mistake that it's
> making is to throw up its hands and use an 0.5 selectivity estimate just
> because it has no info about the inner relation.  I think if we'd pushed
> through the nd2/nd1 calculation after setting nd2 = size of inner rel,
> we'd end up with an estimate matching the product of these path sizes.
> (Caution: inadequate caffeine absorbed yet, this might be all wrong.)

This sounds very reasonable to me.


.m
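
P.S. To make the arithmetic concrete, here is a rough sketch of why the 0.5
fallback is so far off versus pushing through nd2/nd1 with nd2 set to the
inner rel size.  This is not the actual eqjoinsel_semi code, and the row
counts are invented purely for illustration:

    /* Illustrative only: compares the 0.5 "no stats for inner rel"
     * fallback against a semijoin selectivity of nd2/nd1, assuming
     * nd2 = number of rows on the (distinct) inner side.  Numbers
     * are made up, not taken from the plan in this thread. */
    #include <stdio.h>

    int main(void)
    {
        double outer_rows = 1000000.0;   /* foo; unique index on foo.id => nd1 ~ outer_rows */
        double nd1 = outer_rows;
        double inner_rows = 10.0;        /* hashagg'd inner side */
        double nd2 = inner_rows;         /* assume the inner values are distinct */

        double selec_fallback = 0.5;
        double selec_pushed   = (nd2 < nd1) ? nd2 / nd1 : 1.0;

        printf("0.5 fallback estimate: %.0f rows\n", outer_rows * selec_fallback); /* 500000 */
        printf("nd2/nd1 estimate:      %.0f rows\n", outer_rows * selec_pushed);   /*     10 */
        return 0;
    }

With these made-up numbers, nd2/nd1 gives about 10 output rows, which is
also what the hashagg-outer + unique-indexscan-inner nestloop paths would
multiply out to, whereas the 0.5 fallback says 500,000.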