On Thu, 21 May 2020 at 00:56, Simon Riggs <si...@2ndquadrant.com> wrote:
> I thought the main reason to do this was the case when the nested loop
> subplan was significantly underestimated and we realize during execution
> that we should have built a hash table. So including this based on cost
> alone seems to miss a trick.
Isn't that mostly because the planner tends to choose a non-parameterized nested loop when it thinks the outer side of the join has just 1 row? If so, I'd say that's a separate problem, as Result Cache only deals with parameterized nested loops.

Perhaps the problem you mention could be fixed by adding some "uncertainty degree" to the selectivity estimate functions and having them return that along with the selectivity. We'd likely not want to choose an unparameterized nested loop when the uncertainty level is high. Multiplying the selectivities of different estimates could raise the uncertainty level by an order of magnitude.

For plans where the planner chooses to use a non-parameterized nested loop due to having just 1 row on the outer side of the loop, it's taking a huge risk. Putting that 1 row on the inner side of a hash join instead would barely cost anything extra during execution. Hashing 1 row is pretty cheap, and performing a lookup on that hashed row is not much more expensive than evaluating the qual of the nested loop; it really just requires the additional hash function calls.

Having the uncertainty degree I mentioned above would allow us to have the planner do that only when the uncertainty degree indicates it's not worth the risk.
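To make the idea a bit more concrete, here's a rough standalone sketch of what I have in mind. None of these types, names, or thresholds exist in PostgreSQL today; they're invented purely for illustration, and the 0.5 cutoff is an arbitrary placeholder:

    /*
     * Standalone sketch of the "uncertainty degree" idea described above.
     * Not PostgreSQL code; everything here is hypothetical.
     */
    #include <stdio.h>
    #include <stdbool.h>

    /* A selectivity estimate paired with a degree of uncertainty in [0, 1]. */
    typedef struct SelEstimate
    {
        double selectivity;  /* fraction of rows expected to match */
        double uncertainty;  /* 0 = trusted statistic, 1 = pure guess */
    } SelEstimate;

    /*
     * Combine two clause estimates.  The selectivities multiply as they
     * do today; the uncertainty compounds, so stacking several estimates
     * leaves us progressively less confident in the product.
     */
    static SelEstimate
    sel_combine(SelEstimate a, SelEstimate b)
    {
        SelEstimate result;

        result.selectivity = a.selectivity * b.selectivity;
        /* 1 - (1-u_a)(1-u_b): uncertainty only grows when combining */
        result.uncertainty = 1.0 -
            (1.0 - a.uncertainty) * (1.0 - b.uncertainty);
        return result;
    }

    /*
     * Choose between an unparameterized nested loop and a hash join.
     * When the outer-side row estimate is uncertain, prefer the hash
     * join even if the nested loop looks marginally cheaper: hashing
     * one row costs almost nothing, while a nested loop over an
     * underestimated outer side can blow up badly.
     */
    static bool
    prefer_hash_join(double nestloop_cost, double hashjoin_cost,
                     SelEstimate outer_rows_est)
    {
        if (outer_rows_est.uncertainty > 0.5)
            return true;    /* too risky to bet on the 1-row estimate */
        return hashjoin_cost < nestloop_cost;
    }

    int
    main(void)
    {
        /* One estimate from a real statistic, one little better than a
         * default guess. */
        SelEstimate from_stats = {0.01, 0.1};
        SelEstimate default_guess = {0.005, 0.9};
        SelEstimate combined = sel_combine(from_stats, default_guess);

        printf("combined selectivity = %g, uncertainty = %g\n",
               combined.selectivity, combined.uncertainty);

        /* Nested loop looks cheaper, but the estimate is shaky: hash. */
        printf("prefer hash join: %s\n",
               prefer_hash_join(100.0, 120.0, combined) ? "yes" : "no");
        return 0;
    }

David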