"Scott Marlowe" <[EMAIL PROTECTED]> writes: > PostgreSQL has a "generic" aggregate method. Imagine instead doing a > select count(id1+id2-id3) from table where ... In that instance, it's > not a simple shortcut to just grab the number of rows anymore. Since > PostgreSQL uses a generic aggregate method that can be expanded by the > user with custom aggregates et. al., it has no optimizations to make > simple count(*) fast, like many other databases.
People expect count(*) _without a where clause_ to be cached in a single
global variable. Postgres can't do this, but the reason has everything to
do with MVCC, not with Postgres's generalized aggregates. Because of MVCC,
Postgres can't just store a single cached value, because there is no single
value to cache: it would have to store a complete history back to the
oldest extant transaction.

However, in this case the user has a where clause. No database is going to
cache values of count(*) for arbitrary where clauses. But that doesn't stop
Postgres from using an index to fetch the records.

> > > select somefield from sometable where timestampfield > now()-'60
> > > seconds'::interval
> > >
> > > and count the number of returned rows. If there's a lot, it won't be
> > > any faster, if there's a few, it should be a win.
> >
> > Why would this ever be faster? And how could postgres ever calculate
> > that without doing a sequential scan when count(*) would force it to
> > do a sequential scan?
>
> Because, count(*) CANNOT use an index. So, if you're hitting, say,
> 0.01% of the table (let's say 20 out of 20,000,000 rows or something
> like that) then the second should be MUCH faster.

I think you've applied these past discussions and come up with some bogus
conclusions. The problem here has nothing to do with count(*) and
everything to do with the WHERE clause. To fetch the records satisfying
that where clause, Postgres has to do exactly the same work regardless of
whether it's going to feed the data to count(*) or return some or all of
it to the client.

If Postgres decides the where clause isn't selective enough, it'll choose
a sequential scan; but it would do that regardless of whether you're
calling count(*) or not. And if the number of records is substantial, then
you would get the overhead of the scan plus the time it takes to transfer
all that unnecessary data to the client.
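The point that it's the WHERE clause, not count(*), that decides whether an index helps can be demonstrated with any planner. A small sketch using Python's stdlib sqlite3 (a different engine than Postgres, but the same principle), with a hypothetical indexed `events` table and a selective predicate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ts INTEGER)")
conn.execute("CREATE INDEX events_ts ON events (ts)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, i) for i in range(1000)])

# Both queries qualify rows the same way; the selective predicate on ts
# lets the planner search the index whether we count the rows or fetch
# them. An unselective predicate would push both toward a full scan.
for sql in ("SELECT count(*) FROM events WHERE ts > 995",
            "SELECT id FROM events WHERE ts > 995"):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(sql, "->", plan[0][-1])   # e.g. "SEARCH events USING ... INDEX events_ts ..."
```

In Postgres the analogous check is EXPLAIN: the aggregate sits on top of whatever scan the WHERE clause earned, so count(*) with a selective predicate gets the same index scan the plain SELECT does.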
What you're probably thinking of when you talk about general-purpose
aggregate interfaces is the difficulty of making min()/max() use indexes.
That's a whole other case entirely: there, Postgres's generalized
aggregates leave the planner without enough information about which
records the aggregate functions are interested in and which index scans
might make them faster.

None of these common cases makes it a good idea to read the records into
the client and do the work there. The only case where that would make
sense is a function that requires some manipulation of the data that's
awkward to express in SQL. The "top n" type of query is the usual culprit,
but with Postgres's new array functions even that becomes tractable.

-- 
greg

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faqs/FAQ.html