Hi Folks, I was hoping someone could help me to improve the performance of a query I've got that insists on doing a seq. scan on a large table. I'm trying to do some reporting based on my spam logs which I've partly digested and stored in a table. Here are the particulars:
The messages table: Column | Type | Modifiers -------------------+-----------------------+-------------- message_id | integer | not null default nextval('spamreport.messages_message_id_seq'::text) received_date | date | not null score | numeric | not null user_threshold | numeric | not null raw_email_address | character varying(64) | not null processing_time | numeric | not null size | integer | not null fuzzed_address | character varying(64) | not null domain | character varying(64) | not null Indexes: "messages_pkey" primary key, btree (message_id) "domain_idx" btree ("domain") "fuzzy_idx" btree (fuzzed_address) "received_date_idx" btree (received_date) And here's the primary query I run, along with explain analyze output: >> explain analyze SELECT * FROM ( SELECT domain, count(*) as count, max(score) as max_score, avg(score) as average_score, stddev(score) as stddev_score, sum(CASE WHEN score > user_threshold THEN 1 ELSE 0 END) as spam_count, avg(processing_time) as average_time, avg(size) as average_size FROM messages WHERE received_date BETWEEN '2004-11-01' AND '2004-11-30' GROUP BY domain ) AS aggs ORDER BY count DESC; QUERY PLAN ------------------------------------------------------------------- Sort (cost=30303.51..30303.60 rows=35 width=226) (actual time=29869.716..29869.883 rows=69 loops=1) Sort Key: count -> Subquery Scan aggs (cost=30301.56..30302.61 rows=35 width=226) (actual time=29861.705..29869.240 rows=69 loops=1) -> HashAggregate (cost=30301.56..30302.26 rows=35 width=54) (actual time=29861.681..29868.261 rows=69 loops=1) -> Seq Scan on messages (cost=0.00..21573.04 rows=436426 width=54) (actual time=5.523..6304.657 rows=462931 loops=1) Filter: ((received_date >= '2004-11-01'::date) AND (received_date <= '2004-11-30'::date)) Total runtime: 29870.437 ms This database gets vacuumed nightly. Also, the query plan stays the same even if I restrict the received_date column down to a single day. Thanks in advance, -- Dave Steinberg http://www.geekisp.com/ ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]