Re: [HACKERS] Linear vs. nonlinear planner cost estimates

Robert Haas Fri, 16 Dec 2016 08:00:03 -0800

On Wed, Dec 14, 2016 at 3:46 PM, Tom Lane <[email protected]> wrote:
> The really *critical* aspect of this, which could cost far more than a
> few extra executions of a costing function, comes in if it damages
> add_path's ability to prune inferior paths early.  So we would not want
> to simply drop the startup_cost field; and we certainly don't want
> add_path calling this function during every path comparison, either.


Agreed.

> So my thought is that we'd need to keep startup_cost for add_path to
> use, though we could (and likely should) redefine it as below.  We
> should still be able to assume that if path A dominates path B at
> first-tuple and last-tuple costs, it dominates everywhere in between.

In theory, there's no reason that has to be true.   B could start out
a little more expensive than A, and then grow very slowly thereafter
until suddenly becoming super-expensive for the very last tuple, so
that for limits more than 1 but less than the total number of tuples
produced B actually wins.  This actually isn't a totally unrealistic
case: consider an index scan with a filter condition where all of the
tuples that pass the filter are clustered near the beginning.  Getting
any number of tuples that are actually present is cheap, but the very
last fetch that fails to find any more tuples is crushingly expensive.

That having been said, I doubt it's realistic to make the cost model
good enough to hope we'd get such cases correct, so making the
assumption that you propose is probably the right idea anyway.

> A thought that occurred to me after more study of the problem example
> is that the "good" index is a bit bigger than the "bad" one, meaning
> it will get charged a bit more in index descent costs.  With our current
> definition of startup_cost as being only the index descent cost, that
> means the good index looks worse on startup_cost than the bad one.  In
> this example, the good index should survive the add_path tournament
> anyway because it has better total_cost --- but it's easy to imagine
> cases where that wouldn't be true.  So I'm thinking that redefining
> startup_cost as time to fetch the first tuple would be a good thing
> in terms of making sure that desirable plans don't lose out in add_path.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Linear vs. nonlinear planner cost estimates

Reply via email to