here's the URL:
http://techdocs.postgresql.org/techdocs/pgsqladventuresep2.php

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM


                                                                           
           Patrick                                                         
           Hatcher/MCOM/FDD                                                
                                                                        To 
           11/10/2003 12:31 PM        "Marc G. Fournier"                   
                                      <[EMAIL PROTECTED]>@FDS-NOTES   
                                                                        cc 
                                      [EMAIL PROTECTED],    
                                      [EMAIL PROTECTED] 
                                      rg                                   
                                                                   Subject 
                                      Re: [PERFORM] *very* slow query to   
                                      summarize data for a month ...       
                                      (Document link: Patrick Hatcher)     
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           



Do you have an index on ts.bytes?  Josh had suggested this and after I put
it on my summed fields, I saw a speed increase.  I can't remember the
article was that Josh had written about index usage, but maybe he'll chime
in and supply the URL for his article.
hth

Patrick Hatcher



                                                                           
           "Marc G. Fournier"                                              
           <[EMAIL PROTECTED]                                             
           .org>                                                        To 
           Sent by:                   [EMAIL PROTECTED]     
           pgsql-performance-o                                          cc 
           [EMAIL PROTECTED]                                             
                                                                   Subject 
                                      [PERFORM] *very* slow query to       
           11/10/2003 12:18 PM        summarize data for a month ...       
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           
                                                                           





Table structure is simple:

CREATE TABLE traffic_logs (
    company_id bigint,
    ip_id bigint,
    port integer,
    bytes bigint,
    runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS
total_traffic
    FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
     AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual
time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual
time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual
time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41)
(actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25)
(actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52
rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16)
(actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts
(cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25
rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime)
= '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months worth of data, so I
don't think there is much of a way of 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more then
doubles:


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual
time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual
time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual
time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41)
(actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25)
(actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52
rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16)
(actual time=64009.26..66860.71 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts
(cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04
rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text,
runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 101277.17 msec
(14 rows)


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings





---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to