Re: [PERFORM] long running query running too long

2004-02-17 Thread PC Drew
On Feb 17, 2004, at 10:06 AM, Todd Fulton wrote:


I've got a table with about 8 million rows and growing. I must run
reports daily off this table, and another smaller one. Typical query:
joins, groupings and aggregates included. One particular report takes
about 10 minutes on average and is getting longer. I've created all
the indices I think are necessary.


What indexes have you created?  The query is not using any indexes, so 
there might be a problem there.  Can you disable seqscans temporarily 
to test this?
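
A minimal sketch of that test (enable_seqscan is the planner setting
involved; it only discourages seqscans for the current session):

-- Discourage sequential scans for this session, then re-run the
-- report query under EXPLAIN ANALYZE and compare the plans.
SET enable_seqscan = off;
-- ... EXPLAIN ANALYZE <the report query> ...
SET enable_seqscan = on;  -- restore the default afterwards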

prod=# explain analyze SELECT t.tgpid, t.directoryname, t.templateid, 
count(*) AS requested FROM (spk_tgp t JOIN spk_tgplog l ON ((t.tgpid = 
l.tgpid))) GROUP BY t.tgpid, t.directoryname, t.templateid;


Can you please send the results of the following commands:

psql=# \d spk_tgp

and

psql=# \d spk_tgplog

You might also want to try using a sub-query instead of a join.  I'm 
assuming that the spk_tgplog table has a lot of rows and spk_tgp has 
very few rows.  It might make sense to try something like this:

EXPLAIN ANALYZE
SELECT t.tgpid, t.directoryname, t.templateid, r.requested
FROM (SELECT tgpid, count(*) AS requested FROM spk_tgplog GROUP BY 
tgpid) r, spk_tgp t
WHERE r.tgpid = t.tgpid;

--
PC Drew


Re: [PERFORM] long running query running too long

2004-02-17 Thread Todd Fulton
Hey!  I think I have appropriate indexes, but might not.  You're
absolutely right about my join -- spk_tgplog has the 8.5 million rows,
spk_tgp around 2400.  I'll try the sub-select.  Here is the output you
asked for:

spank_prod=# \d spk_tgp;
                                 Table "spk_tgp"
     Column     |           Type           |                  Modifiers
----------------+--------------------------+----------------------------------------------
 tgpid          | bigint                   | not null
 directoryname  | character varying(64)    | not null
 directoryurl   | character varying(1028)  | not null
 submiturl      | character varying(1028)  |
 submitdate     | date                     |
 acceptdate     | date                     |
 templateid     | character varying(64)    | not null
 reciprocalcode | character varying(2056)  |
 notes          | character varying(2056)  |
 createdate     | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 modifydate     | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
 requested      | integer                  |
 hostid         | integer                  | default 1
Indexes: idx_spk_tgp_tgpid
Primary key: pk_spk_tgp

spank_prod=# \d idx_spk_tgp_tgpid
      Index "idx_spk_tgp_tgpid"
    Column     |         Type
---------------+-----------------------
 tgpid         | bigint
 directoryname | character varying(64)
btree

spank_prod=# \d spk_tgplog;
                                Table "spk_tgplog"
    Column     |           Type           |                  Modifiers
---------------+--------------------------+----------------------------------------------
 remoteaddress | character varying(32)    | not null
 tgpid         | bigint                   | not null
 referer       | character varying(256)   |
 createdate    | timestamp with time zone | not null default ('now'::text)::timestamp(6) with time zone
Indexes: idx_spk_tgplog_createdate,
         idx_spk_tgplog_tgpid

spank_prod=# \d idx_spk_tgplog_createdate
 Index "idx_spk_tgplog_createdate"
   Column   |           Type
------------+--------------------------
 createdate | timestamp with time zone
btree

spank_prod=# \d idx_spk_tgplog_tgpid
 Index "idx_spk_tgplog_tgpid"
 Column |  Type
--------+--------
 tgpid  | bigint
btree
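
For reference, the DDL behind those indexes would look roughly like the
following (reconstructed from the \d listings above, so the exact original
statements are an assumption):

-- Assumed from the \d output: idx_spk_tgp_tgpid covers two columns.
CREATE INDEX idx_spk_tgp_tgpid ON spk_tgp (tgpid, directoryname);
CREATE INDEX idx_spk_tgplog_tgpid ON spk_tgplog (tgpid);
CREATE INDEX idx_spk_tgplog_createdate ON spk_tgplog (createdate);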



Todd





Re: [PERFORM] long running query running too long

2004-02-17 Thread Tom Lane
Todd Fulton <[EMAIL PROTECTED]> writes:
 prod=# explain analyze SELECT t.tgpid, t.directoryname, t.templateid,
 count(*) AS requested FROM (spk_tgp t JOIN spk_tgplog l ON ((t.tgpid =
 l.tgpid))) GROUP BY t.tgpid, t.directoryname, t.templateid;
 NOTICE:  QUERY PLAN:

 Aggregate  (cost=2740451.66..2820969.41 rows=805178 width=48) (actual time=460577.85..528968.17 rows=1875 loops=1)
   ->  Group  (cost=2740451.66..2800839.97 rows=8051775 width=48) (actual time=460577.57..516992.19 rows=8117748 loops=1)
         ->  Sort  (cost=2740451.66..2740451.66 rows=8051775 width=48) (actual time=460577.55..474657.59 rows=8117748 loops=1)
               ->  Hash Join  (cost=128.26..409517.83 rows=8051775 width=48) (actual time=11.45..85332.88 rows=8117748 loops=1)
                     ->  Seq Scan on spk_tgplog l  (cost=0.00..187965.75 rows=8051775 width=8) (actual time=0.03..28926.67 rows=8125690 loops=1)
                     ->  Hash  (cost=123.41..123.41 rows=1941 width=40) (actual time=11.28..11.28 rows=0 loops=1)
                           ->  Seq Scan on spk_tgp t  (cost=0.00..123.41 rows=1941 width=40) (actual time=0.06..7.60 rows=1880 loops=1)
 Total runtime: 529542.66 msec

The join itself is being done fine --- I doubt there is another option
that will go faster, given the difference in the table sizes.  Note the
join step completes in only 85 seconds.  What is killing you is the
sorting/grouping operation.  You could try increasing sort_mem to see
if that makes it go any faster, but I suspect the best answer would be to
update to PG 7.4.  7.4 will probably use hash aggregation for this and
avoid the sort altogether.
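
A quick way to try the sort_mem experiment for a single session (sort_mem
is measured in kilobytes in the 7.x series; the 65536 below, i.e. 64MB, is
an illustrative value, not a recommendation):

-- Raise sort memory for this session only, then re-run the report.
SET sort_mem = 65536;
EXPLAIN ANALYZE
SELECT t.tgpid, t.directoryname, t.templateid, count(*) AS requested
FROM spk_tgp t JOIN spk_tgplog l ON (t.tgpid = l.tgpid)
GROUP BY t.tgpid, t.directoryname, t.templateid;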

regards, tom lane
