Hi,
I think there was some discussion about seq scans messing up the cache, and talk about doing something about it, but I don't think it has been addressed yet. Maybe worth a trawl through the archives.
It is certainly true that in many situations a seq scan is preferable to using an index. I have been testing a situation here on two versions of the same database; one of them is much bigger than the other (artificially bloated for testing purposes). Some of the query plans change to use seq scans on the big database where they used indexes on the little database - but either way, in *single user* testing the performance is fine. My concern is that this kind of testing has very little relevance to the real world of multiuser processing, where contention for the cache becomes an issue. It may be that, at least in the current situation, postgres is giving too much weight to seq scans based on single-user, straight-line performance comparisons. If your assumption is correct, then addressing that might help, though it's bound to have its compromises too...
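If the planner really is over-favouring seq scans in your case, the usual knobs to experiment with are the planner cost settings. A rough sketch of what I mean (the values below are placeholders only, not recommendations, and would need testing against your own data and hardware):

    -- per session, to see whether the plan changes when random access is costed lower
    SET random_page_cost = 2;           -- default is 4; lower makes index scans look cheaper
    SET effective_cache_size = 50000;   -- measured in 8KB pages; how much cache the planner assumes is available
    -- then re-run EXPLAIN ANALYZE on the problem queries and compare plans and timings

Of course, shifting these globally has its own compromises for other queries.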
regards Iain
----- Original Message ----- From: "Mark Aufflick" <[EMAIL PROTECTED]>
To: <pgsql-performance@postgresql.org>
Sent: Tuesday, February 15, 2005 8:34 AM
Subject: [PERFORM] seq scan cache vs. index cache smackdown
Hi All,
I have boiled my situation down to the following simple case: (postgres version 7.3)
* Query 1 is doing a sequential scan over a table (courtesy of field ILIKE 'foo%') and index joins to a few others
* Query 2 is doing a functional index scan over the same table (lower(field) LIKE 'foo%') and index joins to a few others
* neither query has an order by clause
* for the purpose of testing, both queries are designed to return the same result set (a rough sketch of both is below)
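To make that concrete, here is roughly the shape of the two queries (table, column and index names are made up here, and the index joins to the other tables are left out):

    -- Query 1: ILIKE can't use the plain btree index, so the planner seq scans
    SELECT * FROM customers WHERE surname ILIKE 'foo%';

    -- Query 2: goes through a functional index on lower(surname)
    CREATE INDEX customers_lower_surname_idx ON customers (lower(surname));
    SELECT * FROM customers WHERE lower(surname) LIKE 'foo%';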
Obviously Q2 is faster than Q1, but if I ever run them both at the same time (let's say two of Q1 and one of Q2 simultaneously) then Q2 consistently returns WORSE times than Q1 (explain analyze confirms that it is using the index).
My assumption is that the sequential scan is blowing the index out of any cache it might live in, and simultaneously stealing all the disk IO that is needed to access the index on disk (the table has 200,000 rows).
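As a quick sanity check on that theory (using the made-up names from the sketch above), the on-disk sizes can be compared against the cache easily enough, since relpages is counted in 8KB pages:

    -- relpages is only an estimate, refreshed by VACUUM/ANALYZE
    SELECT relname, relpages, relpages * 8 AS approx_kb
    FROM pg_class
    WHERE relname IN ('customers', 'customers_lower_surname_idx');

If the table is many times larger than shared_buffers and the index isn't, that would at least be consistent with the seq scan flushing the index out.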
If I simplify the case to not do the index joins (i.e. operate on the one table only), the situation is not as dramatic, but similar.
My thoughts are:
1) kill the sequential scan - but unfortunately I don't have direct control over that code
2) change the way the server allocates/prioritizes different caches - I don't know enough about how postgres caches work to do this (if it's possible); I've sketched what I am imagining for 1) and 2) just after this list
3) try it on postgres 7.4 - possible, but migrating the system to 7.4 in production will be hard because the above code that I am not responsible for has a lot of (slightly wacky) implicit date casts
4) ask the fine people on the mailing list for other suggestions!
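For reference, the sort of thing I am imagining for 1) and 2) (the values are placeholders only, and I haven't verified that either actually helps under concurrent load):

    -- for 1): per-session switch to discourage seq scans (only applies where the SET is issued)
    SET enable_seqscan = off;

    -- for 2): give PostgreSQL a bigger buffer cache, in postgresql.conf (needs a restart)
    -- shared_buffers = 8192       -- counted in 8KB pages, so this would be 64MB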
--
Mark Aufflick
e [EMAIL PROTECTED]
w www.pumptheory.com (work)
w mark.aufflick.com (personal)
p +61 438 700 647
f +61 2 9436 4737