[PERFORM] Help interpreting explain analyze output

2004-08-15 Thread Ole Tange
Using this SQL:

EXPLAIN ANALYZE
SELECT DISTINCT
sessionid,
'2004-33' AS yearweek,
nd.niveau
INTO TEMP
distinct_session
FROM
httplog h ,niveaudir nd
WHERE
hitDateTime('now'::timestamp with time zone-'1440 min'::interval)
and h.hostid=(select hostnameid from hostname where hostname='www.forbrug.dk')
and h.statusid!=404
and h.pathid=nd.pathid
;


I get this output:

QUERY PLAN
-
 Unique  (cost=5680.00..5766.70 rows=8670 width=26) (actual 
time=17030.802..20044.614 rows=68213 loops=1)
   InitPlan
 -  Index Scan using hostname_hostname_key on hostname  
(cost=0.00..5.42 rows=1 width=4) (actual time=0.101..0.106 rows=1 loops=1)
   Index Cond: (hostname = 'www.forbrug.dk'::text)
   -  Sort  (cost=5674.58..5696.25 rows=8670 width=26) (actual 
time=17030.792..17689.650 rows=174714 loops=1)
 Sort Key: h.sessionid, '2004-33'::text, nd.niveau
 -  Merge Join  (cost=4500.70..5107.48 rows=8670 width=26) 
(actual time=3226.955..3966.011 rows=174714 loops=1)
   Merge Cond: (outer.pathid = inner.pathid)
   -  Index Scan using niveaudir_pathid on niveaudir nd  
(cost=0.00..465.59 rows=22715 width=26) (actual time=0.181..52.248 
rows=22330 loops=1)
   -  Sort  (cost=4500.70..4522.38 rows=8670 width=8) (actual 
time=3226.666..3443.092 rows=174714 loops=1)
 Sort Key: h.pathid
 -  Index Scan using httplog_hitdatetime on httplog h  
(cost=0.00..3933.61 rows=8670 width=8) (actual time=0.425..1048.428 
rows=174714 loops=1)
   Index Cond: (hitdatetime  '2004-08-14 
16:41:16.855829+02'::timestamp with time zone)
   Filter: ((hostid = $0) AND (statusid  404))
 Total runtime: 20478.174 ms
(15 rows)

As I read it the output tells me what was done during the milliseconds:

0.101..0.106
0.181..52.248
0.425..1048.428
3226.666..3443.092
3226.955..3966.011
17030.792..17689.650
17030.802..20044.614

However, there are a few large gaps. What is happening during:

1048.428..3226.666 (2 secs)
3966.011..17030.792 (13 secs!)

This is the major part of the time but this is not accounted for in the
explain analyze output. It seems PostgreSQL is doing stuff that is not
part of the query plan. How do I get to know what this stuff is?


/Ole

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Help interpreting explain analyze output

2004-08-15 Thread Ole Tange
On Sun, 15 Aug 2004, Tom Lane wrote:

 Ole Tange [EMAIL PROTECTED] writes:
  As I read it the output tells me what was done during the milliseconds:
 
 No, you have a fundamental misconception here.  The notation means that
 the first output row from a plan step was delivered after X
 milliseconds, and the last row after Y milliseconds.

Thanks. For a novice tuner like me it would be nice if you could see more
easily where the time was spent. However, the output is _far_ more
intuitive that MySQL's.

 It is gonna take a while to sort 175000 rows ... but possibly increasing
 sort_mem would help.

It didn't. However, I could reformulate the DISTINCT query as a GROUP BY
on all the selected fields and this uses Hash aggregate which is far
faster.

Now I am curious: Why isn't DISTINCT implemented using a Hash aggregate?


/Ole

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match