On Thu, 21 May 2020 at 00:56, Simon Riggs <si...@2ndquadrant.com> wrote:
> I thought the main reason to do this was the case when the nested loop
> subplan was significantly underestimated and we realize during execution
> that we should have built a hash table. So including this based on cost
> alone seems to miss a trick.
Isn't that mostly because the planner tends to choose a non-parameterized nested loop when it thinks the outer side of the join has just 1 row? If so, I'd say that's a separate problem, as Result Cache only deals with parameterized nested loops.

Perhaps the problem you mention could be fixed by adding some "uncertainty degree" to the selectivity estimate functions and having them return that along with the selectivity. We'd likely not want to choose an unparameterized nested loop when the uncertainty level is high. Multiplying the selectivities of different estimates could raise the uncertainty level by an order of magnitude.

For plans where the planner chooses to use a non-parameterized nested loop due to having just 1 row on the outer side of the loop, it's taking a huge risk. Putting that 1 row on the inner side of a hash join instead would barely cost anything extra during execution. Hashing 1 row is pretty cheap, and performing a lookup on that hashed row is not much more expensive than evaluating the qual of the nested loop; it really just requires the additional hash function calls.

Having the uncertainty degree I mentioned above would allow us to have the planner do that only when the uncertainty degree indicates it's not worth the risk.
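To make the idea a bit more concrete, here's a rough standalone sketch of what I have in mind. None of these types, names, or thresholds exist in PostgreSQL today; they're invented purely for illustration, and the 0.5 cutoff is an arbitrary placeholder:

    /*
     * Standalone sketch of the "uncertainty degree" idea described above.
     * Not PostgreSQL code; everything here is hypothetical.
     */
    #include <stdio.h>
    #include <stdbool.h>

    /* A selectivity estimate paired with a degree of uncertainty in [0, 1]. */
    typedef struct SelEstimate
    {
        double selectivity;  /* fraction of rows expected to match */
        double uncertainty;  /* 0 = trusted statistic, 1 = pure guess */
    } SelEstimate;

    /*
     * Combine two clause estimates.  The selectivities multiply as they
     * do today; the uncertainty compounds, so stacking several estimates
     * leaves us progressively less confident in the product.
     */
    static SelEstimate
    sel_combine(SelEstimate a, SelEstimate b)
    {
        SelEstimate result;

        result.selectivity = a.selectivity * b.selectivity;
        /* 1 - (1-u_a)(1-u_b): uncertainty only grows when combining */
        result.uncertainty = 1.0 -
            (1.0 - a.uncertainty) * (1.0 - b.uncertainty);
        return result;
    }

    /*
     * Choose between an unparameterized nested loop and a hash join.
     * When the outer-side row estimate is uncertain, prefer the hash
     * join even if the nested loop looks marginally cheaper: hashing
     * one row costs almost nothing, while a nested loop over an
     * underestimated outer side can blow up badly.
     */
    static bool
    prefer_hash_join(double nestloop_cost, double hashjoin_cost,
                     SelEstimate outer_rows_est)
    {
        if (outer_rows_est.uncertainty > 0.5)
            return true;    /* too risky to bet on the 1-row estimate */
        return hashjoin_cost < nestloop_cost;
    }

    int
    main(void)
    {
        /* One estimate from a real statistic, one little better than a
         * default guess. */
        SelEstimate from_stats = {0.01, 0.1};
        SelEstimate default_guess = {0.005, 0.9};
        SelEstimate combined = sel_combine(from_stats, default_guess);

        printf("combined selectivity = %g, uncertainty = %g\n",
               combined.selectivity, combined.uncertainty);

        /* Nested loop looks cheaper, but the estimate is shaky: hash. */
        printf("prefer hash join: %s\n",
               prefer_hash_join(100.0, 120.0, combined) ? "yes" : "no");
        return 0;
    }

David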