So the costing was fairly trivial; we simply do something like

     comparison_cost = 2.0 * cpu_operator_cost;

     sort_cost = comparison_cost * tuples * LOG2(tuples);

which essentially ignores that there might be multiple columns, or that
the columns may have sort operators with different costs.
Agreed. And the distribution of keys.
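
For concreteness, a minimal standalone sketch of that costing (a
sketch only: cpu_operator_cost is hardcoded to its default GUC value
of 0.0025 here, and LOG2 is replaced with log2() from <math.h>):

     #include <math.h>

     /* Sketch of the single-column costing quoted above: N * log2(N)
      * comparisons, each assumed to cost two operator evaluations.
      * In PostgreSQL cpu_operator_cost is a GUC; 0.0025 is merely
      * its default value. */
     static double
     simple_sort_cost(double tuples)
     {
         double comparison_cost = 2.0 * 0.0025;  /* 2.0 * cpu_operator_cost */

         return comparison_cost * tuples * log2(tuples);
     }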

The question is how reliable the heuristics can be. The current patch
uses just plain ndistinct, which seems rather unreliable, and I don't
have a clear idea how to improve that - we may have MCV lists for the
columns and perhaps some extended statistics, but I'm not sure how far
we can run with that.
The v8 patch already uses another algorithm.


Essentially we need to estimate the number of comparisons for each
column, to compute a better comparison_cost.
Exactly.
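
As an illustration of the kind of estimate involved (a hypothetical
sketch, not what any version of the patch actually does): assuming
uniform, independent columns, the comparator for column i+1 only runs
when all earlier columns compare equal, which for two random tuples
happens with probability roughly 1/(d_1 * ... * d_i), where d_j is
column j's ndistinct:

     /*
      * Hypothetical: expected number of per-column comparator calls
      * when comparing two random tuples, assuming uniform and
      * independent columns.  ndistinct[i] is the i-th sort column's
      * number of distinct values.
      */
     static double
     expected_comparisons_per_pair(const double *ndistinct, int ncols)
     {
         double expected = 0.0;
         double prob_tie = 1.0;   /* P(all earlier columns tied) */

         for (int i = 0; i < ncols; i++)
         {
             expected += prob_tie;       /* column i gets compared */
             prob_tie /= ndistinct[i];   /* chance the tie continues */
         }
         return expected;
     }

A per-column comparison_cost could then be scaled by this factor (and
by per-column operator costs) instead of the flat 2.0 * cpu_operator_cost.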

Prioritization of the user-provided order can be as simple as giving
that comparison_cost a small handicap.
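
(To make that concrete, a hypothetical sketch - neither the flag nor
the 5% value come from the patch, and choosing the multiplier well is
exactly the open question raised below:)

     #include <stdbool.h>

     /* Hypothetical handicap: make any ordering other than the one
      * the user wrote look slightly more expensive.  The 1.05
      * multiplier is arbitrary. */
     static double
     apply_user_order_handicap(double comparison_cost,
                               bool matches_user_order)
     {
         return matches_user_order ? comparison_cost
                                   : comparison_cost * 1.05;
     }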

I see no point in doing that, and I don't recall a single place in the
planner where we do that. If the user specified ORDER BY, we'll slap an
explicit Sort on top when needed (which acts as the handicap, but in a
clear way). Otherwise we don't do such things - it'd be just plain
confusing (consider "ORDER BY a,b" vs. "ORDER BY b,c" with same data
types, ndistinct etc. but unexpectedly different costs). Also, what
would be a good value for the handicap?

Again, agreed. If we have a fixed order of columns (ORDER BY), then we should not try to reorder it. The current patch follows that, if I didn't make a mistake.

--
Teodor Sigaev                                   E-mail: teo...@sigaev.ru
                                                   WWW: http://www.sigaev.ru/
