On Tue, Jun 15, 2021 at 12:31 PM Robert Haas <robertmh...@gmail.com> wrote:
> Yes, I think it is. Reading the paper really helped me crystallize my
> thoughts about this, because when I've studied the problems myself, I
> came, as you postulate here, to the conclusion that there's a lot of
> stuff the planner does where there is risk and uncertainty, and thus
> that a general framework would be necessary to deal with it.
It is an example (perhaps the only example in the optimizer) of an
oasis of certainty in an ocean of uncertainty. As uncertain as
everything is, we can seemingly make strong, robust statements about
the relative merits of each strategy *in general*, just in this
particular instance. It's simply not reasonable to make such a
reckless choice, no matter what your general risk tolerance is.

Goetz Graefe is interviewed here, and goes into his philosophy on
robustness -- it seems really interesting to me:

https://sigmodrecord.org/publications/sigmodRecord/2009/pdfs/05_Profiles_Graefe.pdf

> In defense of that approach, note that this is a
> case where we know both that the Nested Loop is risky and that Hash
> Join is a similar alternative with probably similar cost. I am not
> sure there are any other cases where we can say quite so generally
> both that a certain thing is risky and what we could do instead.

I tend to think of a hash join as a nested loop join with an inner
index scan, where you build the index yourself, dynamically. That
might be why I find it easy to make this mental leap.

In theory you could do this by giving the nestloop join runtime smarts
-- make it turn into a hash join adaptively, like Graefe's G-Join
design. That way you could do it in a theoretically pure way. I don't
think that that's actually necessary just to deal with this case -- it
probably really is as simple as it seems. I point this out because
perhaps it's useful to have that theoretical anchoring.

--
Peter Geoghegan
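P.S. The mental model above -- a hash join as a nested loop whose inner "index" is a hash table built on the fly -- can be sketched in a few lines of Python. This is a toy illustration only, not PostgreSQL's executor code; the relation and key names are made up:

```python
# Toy sketch: hash join as "a nested loop with a self-built index".
# Build phase scans the inner side once to construct a hash table
# (the dynamically built "index"); probe phase then loops over the
# outer side, probing that table the way a nestloop's inner index
# scan would probe a pre-existing btree.

def hash_join(outer, inner, outer_key, inner_key):
    # Build phase: hash the inner side on its join key.
    table = {}
    for row in inner:
        table.setdefault(row[inner_key], []).append(row)
    # Probe phase: for each outer row, look up matching inner rows.
    for orow in outer:
        for irow in table.get(orow[outer_key], []):
            yield {**orow, **irow}

# Hypothetical example relations:
orders = [{"order_id": 1, "cust": 10}, {"order_id": 2, "cust": 20}]
customers = [{"cust": 10, "name": "alice"}, {"cust": 20, "name": "bob"}]
print(list(hash_join(orders, customers, "cust", "cust")))
```

The point of the sketch is that the probe loop is structurally identical to a nestloop's outer scan; only the source of the inner lookups differs, which is why the two strategies are such close substitutes.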