I have a ~1 million row table that I mostly want to query by date range. The rows are pretty uniformly spread over a 3 year date range. I have an index on the date column, but it wasn't always used in the past. I disabled the seqscan plan before running my query as a first fix, but it bothered me that I had to do that.
Next, thanks to my earlier thread, I clustered the table on the date column and then "SET STATISTICS" on the date column to be 100. That did the trick, and I stopped explicitly disabling seqscan.
Today, I noticed that Postgres (still 7.4) stopped using the date index again. I checked the correlation for the date column and it was down to 0.4. So I guess that stat does drift away from 1.0 after clustering. That's a bummer, because clustering locks up the table while it works, which I can't really afford to do often. Even at a correlation of 0.4 on the date column, using the date index was still much faster than the seqscan plan that Postgres was choosing. Anyway, it's reclustering now.
A common query looks like this:
SELECT SUM(amount), SUM(quantity), date_trunc('day', date) AS date FROM mytable WHERE col1 IS NOT NULL AND col2 = 'foo' AND col3 = 'bar' AND date BETWEEN '2004-02-01 00:00:00' AND '2004-02-28 23:59:59' GROUP BY date_trunc('day', date) ORDER BY date;
The EXPLAIN ANALYZE output should look like this:
Sort (cost=4781.75..4824.15 rows=16963 width=23) (actual time=2243.595..2243.619 rows=21 loops=1)
Sort Key: date_trunc('day'::text, date)
-> HashAggregate (cost=3462.87..3590.09 rows=16963 width=23) (actual time=2241.773..2243.454 rows=21 loops=1)
-> Index Scan using mytable_date_idx on mytable (cost=0.00..3071.70 rows=52155 width=23) (actual time=2.610..1688.111 rows=49679 loops=1)
Index Cond: ((date >= '2004-02-01 00:00:00'::timestamp without time zone) AND (date <= '2004-02-28 23:59:59'::timestamp without time zone))
Filter: ((col1 IS NOT NULL) AND ((col2)::text = 'foo'::text) AND ((col3)::text = 'bar'::text))
Total runtime: 2244.391 ms
Unfortunately, since I just re-clustered, I can't get the old EXPLAIN output, but just imagine "Seq Scan" in place of "Index Scan using mytable_date_idx" to get the idea.
My question is: what other options do I have? Should I "SET STATISTICS" on the date column to 200? 500? The maximum value of 1000? I want to do something that will convince Postgres that using the date index is, by far, the best plan when running my queries, even when the date column correlation stat drops well below 1.0.
---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly