On Thu, Aug 20, 2009 at 6:10 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Maybe. The problem is that we have mostly two cases: an estimate that
> we think is pretty good based on reasonable statistics (but may be way
> off if there are hidden correlations we don't know about), and a wild
> guess. Also, it doesn't tend to matter very much when the estimates
> are off by, say, a factor of two. The real problem is when they are
> off by an order of magnitude or more.
One problem is that you can't just take a range of row estimates and
calculate the cost at both endpoints of that range to get a range of
cost estimates. It's quite possible for more rows to produce a lower
cost (think of a NOT IN query).

Another problem is that a range of costs isn't really helpful unless
you can actually use it to make decisions. The planner doesn't come up
with multiple independent complete plans and then pick the one with
the cheapest cost; it has to make some decisions along the way to
avoid exponential growth. Those decisions might have a tightly
constrained cost themselves but cause higher nodes to have very wide
cost ranges (think of deciding not to materialize something which
later ends up on the inner side of a nested loop). And there's no way
to know at the time that those decisions will be critical to avoiding
the risky plan later.

I don't think it's a bad idea, I just think you have to set your
expectations pretty low. If the estimates are bad, there isn't really
any plan that can be guaranteed to run quickly.

--
greg
http://mit.edu/~gsstark/resume.pdf
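
To make the first point concrete, here is a toy sketch in Python. The
cost formulas are invented purely for illustration and are not
PostgreSQL's actual costing code; the point is only that cost need not
be monotonic in the row estimate, so evaluating a cost function at the
two endpoints of a row-estimate range does not bound the cost.

def anti_join_cost(outer_rows: float, inner_rows: float) -> float:
    """Made-up model of a hashed NOT IN: a bigger inner side costs more
    to build, but rejects outer rows sooner, so probe work can drop."""
    build = 0.5 * inner_rows
    survivors = 1.0 / (1.0 + inner_rows / 1000.0)  # fraction of outer rows kept
    probe = outer_rows * (0.1 + 2.0 * survivors)
    return build + probe

if __name__ == "__main__":
    outer = 10_000
    for inner in (10, 1_000, 100_000):  # low, middle, high row estimate
        print(f"inner_rows={inner:>7}  cost={anti_join_cost(outer, inner):>10.0f}")
    # Output with these made-up constants:
    #   inner_rows=     10  cost=     20807
    #   inner_rows=   1000  cost=     11500
    #   inner_rows= 100000  cost=     51198
    # The cost at the middle estimate is lower than at either endpoint,
    # so [cost(low), cost(high)] does not bound the cost over the range.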