On Mon, Dec 09, 2019 at 05:27:01PM -0500, Greg Stark wrote:
>On Mon, 9 Dec 2019 at 17:14, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>>
>>On Sat, Dec 07, 2019 at 11:34:12AM -0500, Tom Lane wrote:
>>>Justin Pryzby <pry...@telsasoft.com> writes:
>>>> Jeff said:
>>>>> |What would I find very useful is a verbosity option to get the cost
>>>>> |estimates expressed as a multiplier of each *_cost parameter, rather than
>>>>> |just as a scalar.
>>>
>>>> It seems to me that's "just" a matter of redefining Cost and fixing
>>>> everything that breaks:
>>>
>>>> struct Cost {
>>>>         double seq, rand;
>>>>         double cpu_tuple, cpu_index_tuple, cpu_oper;
>>>>         double parallel_setup; // This is probably always in startup_cost and never in run_cost
>>>>         double parallel_tuple; // This is probably always in run_cost and never in startup_cost
>>>>         double disable;
>>>> };
>>>
>>>> I'm perhaps 50% done with that - is there some agreement that's a desirable
>>>> goal and a good way to do it ?
>>>
>>>No, I think this will get rejected out of hand.  The implications for
>>>the planner's speed and memory consumption seem quite unacceptable
>>>for the size of the benefit.  What you're showing above probably
>>>doubles the size of most Paths, and the added cycles in hot-spots
>>>like add_path seem pretty daunting.
>>>
>>
>>Yeah, that's an issue. But I have to admit my main issue with this
>>proposal is that I have no idea how I'd interpret this Cost. I mean,
>>what do the fields express for different types of paths? How do they
>>contribute to the actual cost of that path?

>What I think users would be able to do with this info is understand
>which parameter to tweak to raise the estimated cost of the node.
>
>Everyone knows that if you see an index scan being used but taking
>longer than a sequential scan, then you might try raising
>random_page_cost. But I rarely see people tweaking the more "exotic"
>parameters like cpu_operator_cost or cpu_index_tuple_cost, and when
>they do they aren't really sure which nodes they're affecting...


Well, but that's kinda my point - how would you know that you need to
increase random_page_cost, or how big an influence it has? The total is
a fairly non-trivial combination of various cost parameters, effective
cache size, etc. Maybe I just don't understand how the cost is split
into those pieces, named the same as the cost GUCs ...
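
For concreteness, here's a minimal standalone sketch (not part of the
proposed patch, and with the GUC defaults hard-coded purely so the example
compiles on its own) of how I'd expect the broken-out fields to map back
onto today's scalar cost:

#include <stdio.h>

/* The struct from Justin's sketch upthread. */
struct Cost
{
    double seq, rand;
    double cpu_tuple, cpu_index_tuple, cpu_oper;
    double parallel_setup;
    double parallel_tuple;
    double disable;
};

/*
 * Collapse the per-parameter multipliers back into the scalar cost the
 * planner uses today.  The multipliers below are the GUC defaults,
 * hard-coded only to keep the sketch self-contained; in the backend they
 * are the variables defined in costsize.c.
 */
static double
cost_scalar(const struct Cost *c)
{
    return c->seq * 1.0                 /* seq_page_cost */
        + c->rand * 4.0                 /* random_page_cost */
        + c->cpu_tuple * 0.01           /* cpu_tuple_cost */
        + c->cpu_index_tuple * 0.005    /* cpu_index_tuple_cost */
        + c->cpu_oper * 0.0025          /* cpu_operator_cost */
        + c->parallel_setup * 1000.0    /* parallel_setup_cost */
        + c->parallel_tuple * 0.1       /* parallel_tuple_cost */
        + c->disable * 1.0e10;          /* disable_cost */
}

int
main(void)
{
    /* Made-up seq scan: 100 pages, 1000 tuples, one qual per tuple. */
    struct Cost c = {.seq = 100, .cpu_tuple = 1000, .cpu_oper = 1000};

    printf("total cost = %.2f\n", cost_scalar(&c));    /* 112.50 */
    return 0;
}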

>I remember planning to do a very similar thing back in the 8.3 era and
>never getting around to it. You could imagine even storing these for
>the overall plan in the logs and building a large matrix of actual
>execution values versus these broken-out individual costs. Then it
>becomes a standard linear optimization problem to find the optimal
>values for each parameter to minimize the inaccuracy of the plan
>estimates (and to identify cases where there are outliers).
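
To make that idea concrete, here's a toy sketch (nothing like this exists
anywhere, and all the numbers are invented): suppose we had logged, for
each query, the broken-out unit counts for just two parameters plus the
actual runtime; an ordinary least-squares fit over that matrix then
recovers an estimated time per unit for each parameter:

#include <stdio.h>

#define NQUERIES 4

int
main(void)
{
    /* units[i] = {sequential pages read, tuples processed} for query i */
    double units[NQUERIES][2] = {
        {1000, 50000},
        {200, 200000},
        {5000, 10000},
        {50, 5000},
    };
    /* actual_ms[i] = measured execution time of query i */
    double actual_ms[NQUERIES] = {1500, 2200, 5100, 100};

    /*
     * Ordinary least squares via the normal equations (A^T A) x = A^T b,
     * solved in closed form for this 2-parameter case.
     */
    double s11 = 0, s12 = 0, s22 = 0, t1 = 0, t2 = 0;

    for (int i = 0; i < NQUERIES; i++)
    {
        s11 += units[i][0] * units[i][0];
        s12 += units[i][0] * units[i][1];
        s22 += units[i][1] * units[i][1];
        t1 += units[i][0] * actual_ms[i];
        t2 += units[i][1] * actual_ms[i];
    }

    double det = s11 * s22 - s12 * s12;
    double ms_per_seq_page = (s22 * t1 - s12 * t2) / det;
    double ms_per_tuple = (s11 * t2 - s12 * t1) / det;

    /* For this made-up data the fit is ~1 ms/page and ~0.01 ms/tuple. */
    printf("ms per sequential page: %g\n", ms_per_seq_page);
    printf("ms per tuple:           %g\n", ms_per_tuple);

    return 0;
}

Dividing the fitted per-unit times by the fitted per-page time would put
them back on the usual scale where seq_page_cost is 1.0.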


Maybe, but that's for one query. If you do this for many queries, the
results may easily be contradictory, no?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
