Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Richard Huxton
Joost Kraaijeveld wrote:
Hi all,
A question on how to read and interpret the explain analyse statement
(and what to do)
I have a query SELECT A.ordernummer, B.klantnummer FROM orders A
LEFT OUTER JOIN klt_alg B ON A.Klantnummer=B.Klantnummer ORDER BY
A.klantnummer;
Both tables have an btree index on klantnummer (int4, the column the
join is on). I have vacuumed and analyzed both tables. The explain
analyse is:
Indexes not necessarily useful here since you're fetching all rows in A 
and presumably much of B

Sort
  Hash Left Join
Seq Scan on orders a
Hash
  Seq Scan on klt_alg b
I've trimmed the above from your explain output. It's sequentially 
scanning b and using a hash to join to a before sorting the results.

Questions: -  Hash Left Join  (cost=41557.43..110069.51 rows=1100836
width=12) (actual time=21263.858..42845.158 rows=1104380 loops=1)
0. What exactly are the numbers in cost=41557.43..110069.51 ( I
assume for the other questions that 41557.43 is the estimated MS the
query will take, what are the others)?
The cost numbers represent effort rather than time. They're only 
really useful in that you can compare one part of the query to another. 
There are two numbers because the first shows startup, the second final 
time. So - the outer parts of the query will have increasing startup 
values since the inner parts will have to do their work first.

The actual time is measured in ms, but remember to multiply it by the 
loops value. Oh, and actually measuring the time slows the query down too.

1. I assume that (cost=41557.43..110069.51 rows=1100836 width=12) is
the estimated cost and (actual time=21263.858..42845.158 rows=1104380
loops=1) the actual cost. Is the difference acceptable?
2. If not, what can I do about it?
The key thing to look for here is the number of rows. If PG expects say 
100 rows but there are instead 10,000 then it may choose the wrong plan. 
In this case the estimate is 1,100,836 and the actual is 1,104,380 - 
very close.

--
  Richard Huxton
  Archonet Ltd
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Tom Lane
Richard Huxton dev@archonet.com writes:
 Joost Kraaijeveld wrote:
 2. If not, what can I do about it?

 The key thing to look for here is the number of rows. If PG expects say 
 100 rows but there are instead 10,000 then it may choose the wrong plan. 
 In this case the estimate is 1,100,836 and the actual is 1,104,380 - 
 very close.

On the surface this looks like a reasonable plan choice.  If you like
you can try the other two basic types of join plan by turning off
enable_hashjoin, which will likely drive the planner to use a merge
join, and then also turn off enable_mergejoin to get a nested loop
(or if it thinks nested loop is second best, turn off enable_nestloop
to see the behavior with a merge join).

What's important in comparing different plan alternatives is the ratios
of estimated costs to actual elapsed times.  If the planner is doing its
job well, those ratios should be similar across all the alternatives
(which implies of course that the cheapest-estimate plan is also the
cheapest in reality).  If not, it may be appropriate to fool with the
planner's cost estimate parameters to try to line up estimates and
reality a bit better.

See
http://www.postgresql.org/docs/8.0/static/performance-tips.html
for more detail.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Joost Kraaijeveld
Hi Tom,

Tom Lane schreef:
 On the surface this looks like a reasonable plan choice.  If you like
 you can try the other two basic types of join plan by turning off
 enable_hashjoin, which will likely drive the planner to use a merge
 join, and then also turn off enable_mergejoin to get a nested loop
 (or if it thinks nested loop is second best, turn off enable_nestloop
 to see the behavior with a merge join).

The problem is that the query logically requests all records  ( as in select * 
from a join) from the database but actually displays (in practise) in 97% of 
the time the first 1000 records and at most the first 50.000 records 
99.99% of the time by scrolling (using page down) in the gui and 
an occasional jump to record  through something called a locator) (both 
percentages tested!).

If I do the same query with a limit 60.000 or if I do a set enable_seqscan = 
off the query returns in 0.3 secs. Otherwise it lasts for 20 secs (which is 
too much for the user to wait for, given the circumstances).

I cannot change the query (it is geneated by a tool called Clarion) but it 
something like (from the psqlodbc_xxx.log):
...
declare SQL_CUR01 cursor for 
SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg B 
ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
fetch 100 in SQL_CUR01;
...

PostgreSQL does the planning (and than executes accordingly) to the query and 
not the fetch 100. Changing the query  with a limit whatever prohibits 
scrolling after the size of the resultset. If Postgres should delay the 
planning of the actual query untill the fetch it could choose the quick 
solution. Another solution would be to advise PostgreSQL which index etc 
(whatever etc means ;-))  to use ( as in the mailing from Silke Trissl in the 
performance list on 09-02-05).

 What's important in comparing different plan alternatives is the ratios
 of estimated costs to actual elapsed times.  If the planner is doing its
 job well, those ratios should be similar across all the alternatives
 (which implies of course that the cheapest-estimate plan is also the
 cheapest in reality).  If not, it may be appropriate to fool with the
 planner's cost estimate parameters to try to line up estimates and
 reality a bit better. 
I I really do a select * and display the result, the planner is right (tested 
with set enable_seqscan = off and set enable_seqscan = on).

Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How to interpret this explain analyse?

2005-02-11 Thread Tom Lane
Joost Kraaijeveld [EMAIL PROTECTED] writes:
 I cannot change the query (it is geneated by a tool called Clarion) but it 
 something like (from the psqlodbc_xxx.log):
 ...
 declare SQL_CUR01 cursor for 
 SELECT A.ordernummer, B.klantnummer FROM orders A LEFT OUTER JOIN klt_alg 
 B ON A.Klantnummer=B.Klantnummer ORDER BY A.klantnummer;
 fetch 100 in SQL_CUR01;
 ...

Well, the planner does put some emphasis on startup time when dealing
with a DECLARE CURSOR plan; the problem you face is just that that
correction isn't large enough.  (From memory, I think it optimizes on
the assumption that 10% of the estimated rows will actually be fetched;
you evidently want a setting of 1% or even less.)

We once talked about setting up a GUC variable to control the percentage
of a cursor that is estimated to be fetched:
http://archives.postgresql.org/pgsql-hackers/2000-10/msg01108.php
It never got done but that seems like the most reasonable solution to
me.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq