Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Justin Pryzby Sun, 21 Feb 2021 17:21:52 -0800

On Tue, Feb 16, 2021 at 11:15:51PM +1300, David Rowley wrote:
> To summarise here, the planner performance gets a fair bit worse with
> the patched code. With master, summing the average planning time over
> each of the queries resulted in a total planning time of 765.7 ms.
> After patching, that went up to 1097.5 ms.  I was pretty disappointed
> about that.


I have a couple ideas;

 - default enable_resultcache=off seems okay.  In plenty of cases, planning
   time is unimportant.  This is the "low bar" - if we can do better and enable
   it by default, that's great.

 - Maybe this should be integrated into nestloop rather than being a separate
   plan node.  That means that it could be dynamically enabled during
   execution, maybe after a few loops or after checking that there's at least
   some minimal number of repeated keys and cache hits.  cost_nestloop would
   consider whether to use a result cache or not, and explain would show the
   cache stats as a part of nested loop.  In this case, I propose there'd still
   be a GUC to disable it.

 - Maybe cost_resultcache() can be split into initial_cost and final_cost
   parts, same as for nestloop ?  I'm not sure how it'd work, since
   initial_cost is supposed to return a lower bound, and resultcache tries to
   make things cheaper.  initial_cost would just add some operator/tuple costs
   to make sure that resultcache of a unique scan is more expensive than
   nestloop alone.  estimate_num_groups is at least O(n) WRT
   rcpath->param_exprs, so maybe you charge 100*list_length(param_exprs) *
   cpu_operator_cost in initial_cost and then call estimate_num_groups in
   final_cost.  We'd be estimating the cost of estimating the cost...

 - Maybe an initial implementation of this would only add a result cache if the
   best plan was already going to use a nested loop, even though a cached
   nested loop might be cheaper than other plans.  This would avoid most
   planner costs, and give improved performance at execution time, but leaves
   something "on the table" for the future.

> +cost_resultcache_rescan(PlannerInfo *root, ResultCachePath *rcpath,
> +                     Cost *rescan_startup_cost, Cost *rescan_total_cost)
> +{
> +     double          tuples = rcpath->subpath->rows;
> +     double          calls = rcpath->calls;
...
> +     /* estimate on the distinct number of parameter values */
> +     ndistinct = estimate_num_groups(root, rcpath->param_exprs, calls, NULL,
> +                                     &estinfo);

Shouldn't this pass "tuples" and not "calls" ?

-- 
Justin

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Reply via email to