On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

> On Mon, 10 Nov 2003, Marc G. Fournier wrote:
>
> >
> > explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
> >     FROM company c, traffic_logs ts
> >    WHERE c.company_id = ts.company_id
> >      AND month_trunc(ts.runtime) = '2003-10-01'
> > GROUP BY company_name,ts.company_id;
>
> What if you do
>
>   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
>
> and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with
no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
    FROM traffic_logs ts
   WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;


                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual 
time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual 
time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual 
time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 
rows=8295 width=16) (actual time=0.26..6043.35 rows=462198 loops=1)
                     Index Cond: (month_trunc(runtime) = '2003-10-01 
00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)



-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
    FROM traffic_logs ts
   WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;


                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual 
time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual 
time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual 
time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 
width=16) (actual time=0.07..6801.92 rows=462198 loops=1)
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without 
time zone) AND (runtime < '2003-11-01 00:00:00'::timestamp without time zone))
 Total runtime: 29385.97 msec
(7 rows)


Just as a side note, just doing a straight scan for the records, with no
SUM()/GROUP BY involved, with the month_trunc() index is still >8k msec:

                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 
width=16) (actual time=0.96..5432.93 rows=462198 loops=1)
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time 
zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, >15k msec:

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual 
time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp 
without time zone)
 Total runtime: 15353.57 msec
(3 rows)

so the GROUP BY is affecting the overall, but even without it, its still
taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and 'details' to a schema.traffic_logs table ... I don't
need the 'per day totals' at the top level at all, only speed ... the 'per
day totals' are only required at the 'per client' level, and by moving the
'per day' into a client schema will shrink the table significantly ...

If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to