On Thu, Aug 20, 2009 at 6:10 PM, Robert Haas <robertmh...@gmail.com> wrote:
> Maybe. The problem is that we have mostly two cases: an estimate that
> we think is pretty good based on reasonable statistics (but may be way
> off if there are hidden correlations we don't know about), and a wild
> guess. Also, it doesn't tend to matter very much when the estimates
> are off by, say, a factor of two. The real problem is when they are
> off by an order of magnitude or more.
One problem is that you can't just take a range of row estimates and
calculate the cost at both endpoints of that range to get a range of
cost estimates. It's quite possible for more rows to produce a lower
cost (think of a NOT IN query).

Another problem is that a range of costs isn't really helpful unless
you can actually use it to make decisions. The planner doesn't come up
with multiple independent complete plans and then pick the one with
the cheapest cost; it has to make some decisions along the way to
avoid exponential growth. Those decisions might have a tightly
constrained cost themselves but cause higher nodes to have very wide
cost ranges (think of deciding not to materialize something which
later ends up on the inner side of a nested loop). And there's no way
to know at the time that those decisions will be critical to avoiding
the risky plan later.

I don't think it's a bad idea, I just think you have to set your
expectations pretty low. If the estimates are bad, there isn't really
any plan that can be guaranteed to run quickly.

--
greg
http://mit.edu/~gsstark/resume.pdf
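
To make the first point concrete, here is a toy sketch in Python. The
cost formulas are invented purely for illustration and are not
PostgreSQL's actual costing code; the point is only that cost need not
be monotonic in the row estimate, so evaluating a cost function at the
two endpoints of a row-estimate range does not bound the cost.

def anti_join_cost(outer_rows: float, inner_rows: float) -> float:
    """Made-up model of a hashed NOT IN: a bigger inner side costs more
    to build, but rejects outer rows sooner, so probe work can drop."""
    build = 0.5 * inner_rows
    survivors = 1.0 / (1.0 + inner_rows / 1000.0)  # fraction of outer rows kept
    probe = outer_rows * (0.1 + 2.0 * survivors)
    return build + probe

if __name__ == "__main__":
    outer = 10_000
    for inner in (10, 1_000, 100_000):  # low, middle, high row estimate
        print(f"inner_rows={inner:>7}  cost={anti_join_cost(outer, inner):>10.0f}")
    # Output with these made-up constants:
    #   inner_rows=     10  cost=     20807
    #   inner_rows=   1000  cost=     11500
    #   inner_rows= 100000  cost=     51198
    # The cost at the middle estimate is lower than at either endpoint,
    # so [cost(low), cost(high)] does not bound the cost over the range.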