On Mon, Aug 04, 2008 at 03:09:34PM -0400, Greg Smith wrote:
> On Mon, 4 Aug 2008, daveg wrote:
> >We load the production dumps into our dev environment, which are the same
> >hardware spec, so the costs should be identical.
> 
> Not identical, just close.  ANALYZE samples data from your table randomly. 
> The statistics used to compute the costs will therefore be slightly 
> different on the two servers even if the data is the same.  The problem of 
> discovering one plan on production and another on development is not quite 
> that easy to remove.  Ultimately, if your developers aren't thorough 
> enough to do things like look at EXPLAIN plans enough to discover things 
> that are just bad, I just chuckle at your thinking that putting a single 
> limiter on their bad behavior will somehow magically make that better.

Not all developers can be persuaded to run EXPLAIN on every change.
However, many will investigate a new message. I'm only hoping to focus
their attention on possible problem queries.
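On the ANALYZE sampling point above, the effect is easy to see in a toy
simulation (plain Python, not PostgreSQL internals; the 30,000-row sample
size matches what ANALYZE uses at the default statistics target of 100,
i.e. 300 x 100 rows):

```python
import random

# "Table" of 1,000,000 rows where exactly 1% have the value 0.
table = [i % 100 for i in range(1_000_000)]

def analyze_estimate(sample_size=30_000, target=0):
    # Like ANALYZE, estimate a value's frequency from a random sample.
    sample = random.sample(table, sample_size)
    return sample.count(target) / sample_size

est_prod = analyze_estimate()   # "production" server
est_dev = analyze_estimate()    # "development" server, identical data
print(f"production: {est_prod:.4f}  dev: {est_dev:.4f}  true: 0.0100")
# The two estimates cluster around 0.01 but rarely agree exactly, so the
# costs computed from them differ slightly even when the data is the same.
```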

> Anyway, if your production server is small enough that you can afford to 
> have another one just like it for the developers to work on, that's great. 
> Robert's point is that many installs don't work like that.  The 
> development teams in lots of places only get a subset of the production 
> data because it's too large to deploy on anything but a big server, which 
> often is hard to cost justify buying just for development purposes.

Not to get into a size war ;-), but the production environment I'd like this
feature for is over 40 quad-core boxes, each with 32GB of RAM and 16 SCSI
drives. These are dedicated to postgresql and run one or just a few databases
each. There are also a bunch of client boxes that we will not speak of. The
staging and test environments are similar hardware but have only a subset of
the databases copied to them. There are probably more than a dozen DB hosts
for that.

> I like the concept of a cost limit, but I'm a bit horrified by the thought 
> of it being exposed simply through the internal cost numbers because they 
> are so arbitrary.  One of the endless projects I think about but never 
> start coding is to write something that measures the things the planner 
> cost constants estimate on a particular machine, so that all those numbers 
> actually can be tied to some real-world time measure.  If you did that, 
> you'd actually have a shot at accomplishing the real goal here, making 
> statement_cost_limit cut off statements expected to take longer than 
> statement_timeout before they even get started.

That is a nice idea. Possibly it could be a utility like the fsync tester.
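As a rough sketch of what such a utility might look like (hypothetical code,
and deliberately naive: a real calibration tool would need a scratch file
larger than RAM, or O_DIRECT, to defeat the OS page cache):

```python
# Time sequential vs. random 8 kB page reads on a scratch file, in the
# spirit of measuring a machine-specific seq_page_cost / random_page_cost
# ratio. Toy-sized on purpose; numbers here mostly reflect OS caching.
import os, random, tempfile, time

PAGE = 8192            # PostgreSQL's default block size
NPAGES = 2048          # 16 MB scratch file -- far too small to be realistic

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\0" * (PAGE * NPAGES))

def timed_reads(offsets):
    # Read one page at each offset and return the elapsed wall-clock time.
    with open(path, "rb") as f:
        start = time.perf_counter()
        for off in offsets:
            f.seek(off)
            f.read(PAGE)
        return time.perf_counter() - start

seq = timed_reads([i * PAGE for i in range(NPAGES)])
rnd = timed_reads([random.randrange(NPAGES) * PAGE for _ in range(NPAGES)])
os.unlink(path)
print(f"sequential: {seq:.4f}s  random: {rnd:.4f}s  ratio: {rnd / seq:.2f}")
```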

But planner estimates are never going to be all that accurate even with solid
cost constants, because for some classes of queries, particularly those with
many joins, the statistics can be good at each level while the error
accumulates exponentially across levels. That is why I think a warning is
appropriate rather than an error; even a notice in the logs would be useful.
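The compounding can be sketched numerically (a toy model, not the planner's
actual arithmetic; the 1.5x per-join error factor is an arbitrary
illustration):

```python
# If each per-join selectivity estimate is off by a modest factor, the
# error in the final row-count estimate grows multiplicatively with the
# number of joins, since the estimates are multiplied together.
per_join_error = 1.5   # each level's estimate off by 50% (illustrative)

for n_joins in (1, 2, 4, 8):
    total_error = per_join_error ** n_joins
    print(f"{n_joins} joins -> final estimate off by up to {total_error:.1f}x")
# 8 joins -> final estimate off by up to 25.6x
```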

-dg

-- 
David Gould       [EMAIL PROTECTED]      510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
