Hi Folks, I was hoping someone could help me to improve the performance of a 
query I've got that insists on doing a seq. scan on a large table.  I'm trying 
to do some reporting based on my spam logs which I've partly digested and 
stored in a table.  Here are the particulars:

The messages table:

      Column       |         Type          |  Modifiers 
-------------------+-----------------------+--------------
 message_id        | integer               | not null default 
nextval('spamreport.messages_message_id_seq'::text)
 received_date     | date                  | not null
 score             | numeric               | not null
 user_threshold    | numeric               | not null
 raw_email_address | character varying(64) | not null
 processing_time   | numeric               | not null
 size              | integer               | not null
 fuzzed_address    | character varying(64) | not null
 domain            | character varying(64) | not null
Indexes:
    "messages_pkey" primary key, btree (message_id)
    "domain_idx" btree ("domain")
    "fuzzy_idx" btree (fuzzed_address)
    "received_date_idx" btree (received_date)

And here's the primary query I run, along with explain analyze output:

>> explain analyze SELECT * FROM ( SELECT
        domain,
        count(*) as count,
        max(score) as max_score,
        avg(score) as average_score,
        stddev(score) as stddev_score,
        sum(CASE WHEN score > user_threshold THEN 1 ELSE 0 END) as spam_count,
        avg(processing_time) as average_time,
        avg(size) as average_size
      FROM messages
      WHERE received_date BETWEEN '2004-11-01' AND '2004-11-30'
        GROUP BY domain ) AS aggs
        ORDER BY count DESC;

   QUERY PLAN                                                              
-------------------------------------------------------------------
 Sort  (cost=30303.51..30303.60 rows=35 width=226) (actual 
time=29869.716..29869.883 rows=69 loops=1)
   Sort Key: count
   ->  Subquery Scan aggs  (cost=30301.56..30302.61 rows=35 width=226) (actual 
time=29861.705..29869.240 rows=69 loops=1)
         ->  HashAggregate  (cost=30301.56..30302.26 rows=35 width=54) (actual 
time=29861.681..29868.261 rows=69 loops=1)
               ->  Seq Scan on messages  (cost=0.00..21573.04 rows=436426 
width=54) (actual time=5.523..6304.657 rows=462931 loops=1)
                     Filter: ((received_date >= '2004-11-01'::date) AND 
(received_date <= '2004-11-30'::date))
 Total runtime: 29870.437 ms

This database gets vacuumed nightly.  Also, the query plan stays the same even 
if I restrict the received_date column down to a single day.

Thanks in advance,
-- 
Dave Steinberg
http://www.geekisp.com/

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Reply via email to