I have a very large table (about a million rows) from which I most
frequently want to select a subset of rows based on a date field. That date
field
is indexed, and when Postgres uses that index, queries are fast. But
sometimes it decides not to use the index, resorting to a sequential scan
instead. This is really, really slow.
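
For concreteness, the kind of query involved looks roughly like this (the
table name, column name, and dates are made up for illustration):

    EXPLAIN ANALYZE
    SELECT *
    FROM mytable
    WHERE logdate >= '2004-01-01' AND logdate < '2004-02-01';

When the plan comes back as an index scan on the date index, the query is
fast; when it comes back as a sequential scan, it isn't.
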
To try to convince it to use my date index, I turned off the sequential scan
strategy in the planner. That worked on one copy of the db, but not on
another, where it decided to use an index from another column entirely,
which didn't help performance. I dropped the other index, leaving only the
date index, and performance was good again.
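
In case it matters, the exact knobs I turned were these (index name made
up):

    SET enable_seqscan TO off;   -- disable the sequential-scan strategy
    DROP INDEX other_idx;        -- the unhelpful index on the second copy
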
Obviously the planner is making some bad choices here. I know that it is
trying to avoid random seeks or other scary things implied by a
"correlation" statistic that is not close to 1 or -1, but it is massively
overestimating the hit caused by those seeks and seemingly not taking into
account the size of the table! This is Postgres 7.4 on Linux and Mac OS X.
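
For what it's worth, the statistic in question can be inspected directly
from the pg_stats view (table name made up again):

    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'mytable';
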
Anyway, to "fix" the situation, I clustered the table on the date column.
But I fear that the data will slowly "drift" back to a state where the
planner decides again that a sequential scan is a good idea. Blah.
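
(The clustering step, in the 7.4 syntax, with the same made-up names:

    CLUSTER date_idx ON mytable;  -- physically reorder the table by the index
    ANALYZE mytable;              -- refresh stats, including correlation

CLUSTER is a one-time physical reorder; rows inserted afterward aren't kept
in order, which is exactly why I expect the drift.)
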
So, my question: how can I prevent this? Ideally, the planner should be
smarter. Failing that, I'd like to be able to force it to use the index
that I know will result in the fastest queries (3 seconds vs. 30 seconds in
many cases). Suggestions?