Bob Henkel <[EMAIL PROTECTED]> writes: > From a simple/high level perspective why is this? That is why can't > PostgreSQL do aggregates as well across large chunks of data. I'm > assuming it extremely complicated. Otherwise the folks around here > would have churned out a fix in a month or less and made this issue a > past story.
You can find very detailed discussions of this in the archives, but the basic reason is that we have a very general/extensible view of aggregates (which is how come we can support custom aggregates). An aggregate is a function that you feed all the input rows to, one at a time, and then it produces the answer. Nice, general, extensible, and not at all optimizable :-( Now in general that is the only way to do it, and so Scott's implication that we always suck compared to other databases is really an overstatement. Ask another database to do a standard deviation calculation, for instance, and it'll be just as slow. However there are special cases that other DBs can optimize that we don't even try to. The big ones are: * COUNT(*) across a whole table --- most non-MVCC databases keep tabs of the physical number of the rows in the table, and so they can answer this very quickly. Postgres doesn't keep such a count, and under MVCC rules it wouldn't necessarily be the right answer if we had it. (BTW, count of rows satisfying a particular condition is a different ballgame entirely; in most cases that can't be optimized at all, AFAIK.) If you are willing to accept approximate answers there are various tricks you can use --- see the archives --- but we don't get to fudge on COUNT(*) itself because it's in the SQL standard. * MIN or MAX of an indexed column --- most DBs can use an index scan to find such a row relatively quickly, although whether this trick works or not depends a whole lot on whether you have WHERE or GROUP BY and just what those conditions look like. You can fake the min/max answer in Postgres by doing the transformstion to an indexable query by hand, for instance instead of MAX(col) do SELECT col FROM tab ORDER BY col DESC LIMIT 1; There are periodic discussions in the hackers list about teaching the planner to do that automatically, and it will probably happen someday; but it's a complicated task and not exceedingly high on the priority list. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings