Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Greg Stark

Dennis Bjorklund [EMAIL PROTECTED] writes:

 On Mon, 10 Nov 2003, Marc G. Fournier wrote:
 
  
  explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
  FROM company c, traffic_logs ts
 WHERE c.company_id = ts.company_id
   AND month_trunc(ts.runtime) = '2003-10-01'
  GROUP BY company_name,ts.company_id;

So, depending on how much work you're willing to do, there are some more
dramatic speedups you could get:

Use partial indexes like this (you'll need one for every month):

create index i on traffic_log (company_id) 
 where month_trunc(runtime) = '2003-10-01'
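
Each month, a job would then have to create the next index ahead of time; the
index name here is made up, e.g.:

create index traffic_log_2003_11 on traffic_log (company_id)
 where month_trunc(runtime) = '2003-11-01'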

then group by company_id only so it can use the index:

select * 
  from company
  join (
select company_id, sum(bytes) as total_traffic
  from traffic_log
 where month_trunc(runtime) = '2003-10-01'
 group by company_id
   ) as x using (company_id)
  order by company_name



Actually you might be able to get the same effect using function indexes like:

create index i on traffic_log (month_trunc(runtime), company_id)
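
For the index to be usable, month_trunc() has to be an IMMUTABLE function;
presumably it is just a wrapper around date_trunc(), something like this
(pre-8.0 string quoting):

create function month_trunc(timestamp without time zone)
returns timestamp without time zone
as 'select date_trunc(''month'', $1)'
language sql immutable;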


-- 
greg


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Marc G. Fournier

On Tue, 11 Nov 2003, Greg Stark wrote:

 Actually you might be able to get the same effect using function indexes
 like:

 create index i on traffic_log (month_trunc(runtime), company_id)

had actually thought of that one ... is it something that is only
available in v7.4?

ams=# create index i on traffic_logs ( month_trunc(runtime), company_id );
ERROR:  parser: parse error at or near "," at character 54
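
(7.3 does accept the plain single-function form, which is presumably how the
existing tl_month index was built:

create index tl_month on traffic_logs (month_trunc(runtime));

it's the extra company_id key column next to the function call that the 7.3
parser rejects.)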


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Marc G. Fournier


On Tue, 11 Nov 2003, Dennis Bjorklund wrote:

 On Mon, 10 Nov 2003, Marc G. Fournier wrote:

 
  explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
  FROM company c, traffic_logs ts
 WHERE c.company_id = ts.company_id
   AND month_trunc(ts.runtime) = '2003-10-01'
  GROUP BY company_name,ts.company_id;

 What if you do

   ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'

 and add an index like (runtime, company_name, company_id)?

Good thought, but even simplifying it to the *lowest* query possible, with
no table joins, is painfully slow:

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
   WHERE month_trunc(ts.runtime) = '2003-10-01'
GROUP BY ts.company_id;


                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=31630.84..31693.05 rows=829 width=16) (actual time=14862.71..26552.39 rows=144 loops=1)
   ->  Group  (cost=31630.84..31672.31 rows=8295 width=16) (actual time=9634.28..20967.07 rows=462198 loops=1)
         ->  Sort  (cost=31630.84..31651.57 rows=8295 width=16) (actual time=9634.24..12838.73 rows=462198 loops=1)
               Sort Key: company_id
               ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..31090.93 rows=8295 width=16) (actual time=0.26..6043.35 rows=462198 loops=1)
                     Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 26659.35 msec
(7 rows)



-OR-

explain analyze SELECT ts.company_id, SUM(ts.bytes) AS total_traffic
FROM traffic_logs ts
   WHERE ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'
GROUP BY ts.company_id;


                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=81044.53..84424.21 rows=45062 width=16) (actual time=13307.52..29274.66 rows=144 loops=1)
   ->  Group  (cost=81044.53..83297.65 rows=450625 width=16) (actual time=10809.02..-673265.13 rows=462198 loops=1)
         ->  Sort  (cost=81044.53..82171.09 rows=450625 width=16) (actual time=10808.99..14069.79 rows=462198 loops=1)
               Sort Key: company_id
               ->  Seq Scan on traffic_logs ts  (cost=0.00..38727.35 rows=450625 width=16) (actual time=0.07..6801.92 rows=462198 loops=1)
                     Filter: ((runtime >= '2003-10-01 00:00:00'::timestamp without time zone) AND (runtime < '2003-11-01 00:00:00'::timestamp without time zone))
 Total runtime: 29385.97 msec
(7 rows)


Just as a side note, just doing a straight scan for the records, with no
SUM()/GROUP BY involved, with the month_trunc() index is still 8k msec:

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Index Scan using tl_month on traffic_logs ts  (cost=0.00..31096.36 rows=8297 width=16) (actual time=0.96..5432.93 rows=462198 loops=1)
   Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 8092.88 msec
(3 rows)

and without the index, 15k msec:

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on traffic_logs ts  (cost=0.00..38719.55 rows=8297 width=16) (actual time=0.11..11354.45 rows=462198 loops=1)
   Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 15353.57 msec
(3 rows)

so the GROUP BY is affecting the overall time, but even without it, it's still
taking a helluva long time ...

I'm going to modify my load script so that it dumps monthly totals to
traffic_logs, and 'details' to a schema.traffic_logs table ... I don't
need the 'per day totals' at the top level at all, only speed ... the 'per
day totals' are only required at the 'per client' level, and moving the
'per day' data into a client schema will shrink the table significantly ...

If it wasn't for trying to pull in that 'whole month' summary, it would be
fine :(
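
(A sketch of that top-level rollup table -- all names here are invented --
might be:

create table traffic_logs_monthly (
company_id bigint,
month timestamp without time zone, -- i.e. month_trunc(runtime)
bytes bigint
);

with the per-day detail rows kept in a per-client schema as described above.)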

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Greg Stark
Marc G. Fournier [EMAIL PROTECTED] writes:

 On Tue, 11 Nov 2003, Greg Stark wrote:
 
  Actually you might be able to get the same effect using function indexes
  like:
 
  create index i on traffic_log (month_trunc(runtime), company_id)
 
 had actually thought of that one ... is it something that is only
 available in v7.4?

Hum, I thought you could do simple functional indexes like that in 7.3, but
perhaps only single-column indexes.

In any case, given your situation I would seriously consider putting a
month integer column on your table anyways. Then your index would be a
simple (month, company_id) index.
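
A sketch of that change (the encoding of the month column here is just one
arbitrary choice):

alter table traffic_logs add column month integer;
update traffic_logs
   set month = (extract(year from runtime) * 100 + extract(month from runtime))::integer;
create index tl_month_company on traffic_logs (month, company_id);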

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread scott.marlowe
On 11 Nov 2003, Greg Stark wrote:

 Marc G. Fournier [EMAIL PROTECTED] writes:
 
  On Tue, 11 Nov 2003, Greg Stark wrote:
  
   Actually you might be able to get the same effect using function indexes
   like:
  
   create index i on traffic_log (month_trunc(runtime), company_id)
  
  had actually thought of that one ... is it something that is only
  available in v7.4?
 
 Hum, I thought you could do simple functional indexes like that in 7.3, but
 perhaps only single-column indexes.
 
 In any case, given your situation I would seriously consider putting a
 month integer column on your table anyways. Then your index would be a
 simple (month, company_id) index.

In 7.3 and before, a functional index could only take plain column names as
inputs, so you could cheat:

alter table test add alp int;
alter table test add omg int;
update test set alp=0;
update test set omg=13;

and then create a functional index:

create index test_xy on test (substr(info,alp,omg));

select * from test where substr(info,alp,omg)='abcd';
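
In 7.4 the dummy columns are no longer needed, since expression indexes can
use constants directly:

create index test_xy on test (substr(info, 0, 13));
select * from test where substr(info, 0, 13) = 'abcd';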





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-11 Thread Greg Stark

Marc G. Fournier [EMAIL PROTECTED] writes:

 Just as a side note, just doing a straight scan for the records, with no
 SUM()/GROUP BY involved, with the month_trunc() index is still 8k msec:

Well, so the problem isn't the query at all; you just have too much data to
massage online. You can preprocess the data offline into a more manageable
amount of data for your online reports.

What I used to do for a similar situation was to do hourly queries sort of
like this:

insert into data_aggregate (day, hour, company_id, total_bytes)
 (select trunc(now(),'day'), trunc(now(), 'hour'), company_id, sum(bytes)
from raw_data
   where time between trunc(now(),'hour') and trunc(now(),'hour')+'1 hour'::interval
   group by company_id
 )

[this was actually on oracle and the data looked kind of different, i'm making
this up as i go along]
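
A PostgreSQL flavour of the same idea, aggregating the hour that just closed
(table and column names as in the sketch above), might be:

insert into data_aggregate (day, hour, company_id, total_bytes)
 (select date_trunc('day', now() - '1 hour'::interval),
 date_trunc('hour', now() - '1 hour'::interval),
 company_id, sum(bytes)
from raw_data
   where time >= date_trunc('hour', now() - '1 hour'::interval)
 and time < date_trunc('hour', now())
   group by company_id
 )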

Then later the reports could run quickly based on data_aggregate instead of
slowly based on the much larger data set accumulated by the minute. Once I had
this schema set up it was easy to follow it for all of the rapidly growing
data tables.

Now in my situation I had thousands of records accumulating per second, so
hourly was already a big win. I originally chose hourly because I thought I
might want time-of-day reports but that never panned out. On the other hand it
was a win when the system broke once because I could easily see that and fix
it before midnight when it would have actually mattered. Perhaps in your
situation you would want daily aggregates or something else.

One of the other advantages of these aggregate tables was that we could purge
the old data much sooner with much less resistance from the business, since
the reports were all still available and a lot of ad-hoc queries could still
be done without the raw data anyways.

Alternatively you can just give up on online reports. Eventually you'll have
some query that takes way more than 8s anyways. You can pregenerate the entire
report as a batch job instead. Either send it off as a nightly e-mail, store
it as an html or csv file for the web server, or (my favourite) store the data
for the report as an sql table and then have multiple front-ends that do a
simple select * to pull the data and format it.
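
As a sketch of that last option (the report table name is invented here), the
nightly batch job is just the slow query run once:

begin;
delete from monthly_traffic_report;
insert into monthly_traffic_report
select ts.company_id, company_name, sum(ts.bytes) as total_traffic
  from company c, traffic_logs ts
 where c.company_id = ts.company_id
   and month_trunc(ts.runtime) = '2003-10-01'
 group by company_name, ts.company_id;
commit;

and every front-end does a plain select * from monthly_traffic_report.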

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Patrick Hatcher

here's the URL:
http://techdocs.postgresql.org/techdocs/pgsqladventuresep2.php

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM


   
On 11/10/2003 12:31 PM, Patrick Hatcher wrote:


Do you have an index on ts.bytes?  Josh had suggested this and after I put
it on my summed fields, I saw a speed increase.  I can't remember which
article it was that Josh wrote about index usage, but maybe he'll chime
in and supply the URL for his article.
hth

Patrick Hatcher



   
On 11/10/2003 12:18 PM, Marc G. Fournier wrote:
[quoted original trimmed here; it appears in full in the next message]

Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Patrick Hatcher

Do you have an index on ts.bytes?  Josh had suggested this and after I put
it on my summed fields, I saw a speed increase.  I can't remember which
article it was that Josh wrote about index usage, but maybe he'll chime
in and supply the URL for his article.
hth

Patrick Hatcher



   
On 11/10/2003 12:18 PM, Marc G. Fournier wrote:





Table structure is simple:

CREATE TABLE traffic_logs (
company_id bigint,
ip_id bigint,
port integer,
bytes bigint,
runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS
total_traffic
FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
 AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months' worth of data, so I
don't think there is much of a way to 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more than
doubles:


                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual

Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Neil Conway
Patrick Hatcher [EMAIL PROTECTED] writes:
 Do you have an index on ts.bytes?  Josh had suggested this and after I put
 it on my summed fields, I saw a speed increase.

What's the reasoning behind this? ISTM that sum() should never use an
index, nor would it benefit from using one.

-Neil


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Neil Conway
Marc G. Fournier [EMAIL PROTECTED] writes:
 ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
       Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)

Interesting that we get the row count estimate for this index scan so
wrong -- I believe this is the root of the problem. Hmmm... I would
guess that the optimizer stats we have for estimating the selectivity
of a functional index are pretty primitive, but I haven't looked into
it at all. Tom might be able to shed some light...

[ In the second EXPLAIN ANALYZE, ... ]

 ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
       Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)

Uh, what? The actual time seems to have finished far before it has
begun :-) Is this just a typo, or does the actual output include a
negative number?

-Neil


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 Interesting that we get the row count estimate for this index scan so
 wrong -- I believe this is the root of the problem. Hmmm... I would
 guess that the optimizer stats we have for estimating the selectivity
  of a functional index are pretty primitive, but I haven't looked into
 it at all. Tom might be able to shed some light...

Try none at all.  I have speculated in the past that it would be worth
gathering statistics about the contents of functional indexes, but it's
still on the to-do-someday list.

 ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)

 Uh, what?

That is bizarre, all right.  Is it reproducible?

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Neil Conway wrote:

 Marc G. Fournier [EMAIL PROTECTED] writes:
  ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
        Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)

 Interesting that we get the row count estimate for this index scan so
 wrong -- I believe this is the root of the problem. Hmmm... I would
 guess that the optimizer stats we have for estimating the selectivity
  of a functional index are pretty primitive, but I haven't looked into
 it at all. Tom might be able to shed some light...

 [ In the second EXPLAIN ANALYZE, ... ]

  ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)
        Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)

 Uh, what? The actual time seems to have finished far before it has
 begun :-) Is this just a typo, or does the actual output include a
 negative number?

This was purely a cut-n-paste ...


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Tom Lane wrote:

 Neil Conway [EMAIL PROTECTED] writes:
  Interesting that we get the row count estimate for this index scan so
  wrong -- I believe this is the root of the problem. Hmmm... I would
  guess that the optimizer stats we have for estimating the selectivity
   of a functional index are pretty primitive, but I haven't looked into
  it at all. Tom might be able to shed some light...

 Try none at all.  I have speculated in the past that it would be worth
 gathering statistics about the contents of functional indexes, but it's
 still on the to-do-someday list.

  ->  Seq Scan on traffic_logs ts  (cost=0.00..38340.72 rows=8213 width=16) (actual time=5.02..-645982.04 rows=462198 loops=1)

  Uh, what?

 That is bizarre, all right.  Is it reproducible?

Nope, and a subsequent run shows better results too:

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39674.38..39756.70 rows=823 width=41) (actual time=35573.27..49953.47 rows=144 loops=1)
   ->  Group  (cost=39674.38..39736.12 rows=8232 width=41) (actual time=35547.27..45479.27 rows=462198 loops=1)
         ->  Sort  (cost=39674.38..39694.96 rows=8232 width=41) (actual time=35547.23..39167.90 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38993.22..39139.02 rows=8232 width=41) (actual time=16658.23..25559.08 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.51..7.38 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.80 rows=352 loops=1)
                     ->  Sort  (cost=38968.82..38989.40 rows=8232 width=16) (actual time=16652.66..19785.83 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38433.46 rows=8232 width=16) (actual time=0.11..8794.43 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 49955.22 msec


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Marc G. Fournier


On Mon, 10 Nov 2003, Josh Berkus wrote:

 Marc,

 I'd say your machine is very low on available RAM, particularly sort_mem.
 The steps which are taking a long time are:

Here's the server:

last pid: 42651;  load averages:  1.52,  0.96,  0.88
up 28+07:43:33  20:35:44
307 processes: 2 running, 304 sleeping, 1 zombie
CPU states: 18.0% user,  0.0% nice, 29.1% system,  0.6% interrupt, 52.3% idle
Mem: 1203M Active, 1839M Inact, 709M Wired, 206M Cache, 199M Buf, 5608K Free
Swap: 8192M Total, 1804K Used, 8190M Free


   Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
     ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)

 and:

 ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
       Merge Cond: ("outer".company_id = "inner".company_id)
       ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)

 There are also *large* delays between steps.  Either your I/O is saturated,
 or you haven't run a VACUUM FULL ANALYZE in a while (which would also explain
 the estimates being off).

thought about that before I started the thread, and ran it just in case ...

just restarted the server with sort_mem set to 10M, and it didn't help much on
the Aggregate or Merge Join ...:

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39674.38..39756.70 rows=823 width=41) (actual time=33066.25..54021.50 rows=144 loops=1)
   ->  Group  (cost=39674.38..39736.12 rows=8232 width=41) (actual time=33040.25..47005.57 rows=462198 loops=1)
         ->  Sort  (cost=39674.38..39694.96 rows=8232 width=41) (actual time=33040.22..37875.97 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38993.22..39139.02 rows=8232 width=41) (actual time=14428.17..23568.80 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.80..7.66 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.08..3.06 rows=352 loops=1)
                     ->  Sort  (cost=38968.82..38989.40 rows=8232 width=16) (actual time=14422.27..17429.34 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38433.46 rows=8232 width=16) (actual time=0.15..8119.72 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 54034.44 msec
(14 rows)
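
(sort_mem is measured in kilobytes, so 10M here presumably means something
like SET sort_mem = 10240; per session, or the equivalent postgresql.conf
setting before the restart.)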

the problem is that the results we are comparing with right now are the ones
that had the negative time in them :(  Just restarted the server with default
sort_mem, and here is the query with that:

                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=39691.27..39773.61 rows=823 width=41) (actual time=35077.18..50424.74 rows=144 loops=1)
   ->  Group  (cost=39691.27..39753.03 rows=8234 width=41) (actual time=35051.29..-650049.84 rows=462198 loops=1)
         ->  Sort  (cost=39691.27..39711.86 rows=8234 width=41) (actual time=35051.26..38847.40 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=39009.92..39155.76 rows=8234 width=41) (actual time=16155.37..25439.42 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.85..7.71 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.10..3.07 rows=352 loops=1)
                     ->  Sort  (cost=38985.51..39006.10 rows=8234 width=16) (actual time=16149.46..19437.47 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Seq Scan on traffic_logs ts  (cost=0.00..38450.00 rows=8234 width=16) (actual time=0.16..8869.37 rows=462198 loops=1)
                                 Filter: (date_trunc('month'::text, runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 50426.80 msec
(14 rows)


And, just on a whim, here it is set to 100M:

 QUERY PLAN

Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Dennis Bjorklund
On Mon, 10 Nov 2003, Marc G. Fournier wrote:

 
 explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS total_traffic
 FROM company c, traffic_logs ts
WHERE c.company_id = ts.company_id
  AND month_trunc(ts.runtime) = '2003-10-01'
 GROUP BY company_name,ts.company_id;

What if you do

  ts.runtime >= '2003-10-01' AND ts.runtime < '2003-11-01'

and add an index like (runtime, company_name, company_id)?
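
One catch: company_name lives in the company table, not traffic_logs, so the
widest version of that index that can actually be built there is presumably:

create index tl_runtime_company on traffic_logs (runtime, company_id);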


-- 
/Dennis


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]