[PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread robin.c.smith
Title: Forcing using index instead of sequential scan?






I have been testing the performance of PostgreSQL using the simple tool found at http://benchw.sourceforge.net however I have found that all the queries it run execute with sequential scans. The website where the code runs has examples of the execution plan using indexes.

When I disable the sequential plan query 0 and query 1 run faster ( http://benchw.sourceforge.net/benchw_results_postgres_history.html ) by using the indexes as suggested by the website.

I have tried increasing the effective_cache_size and reducing the random_page_cost to try and force the optimiser to use the index but it always uses the sequential scan.

What is the best way to force the use of indexes in these queries? Currently testing with version 8.1.4.


Regards


Robin Smith


British Telecommunications plc Registered office: 81 Newgate Street London EC1A 7AJ


Registered in England no. 180


This electronic message contains information from British Telecommunications plc which may be privileged and confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic message in error, please notify us by telephone or e-mail (to the number or address above) immediately.





Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread robin.c.smith
More information from the query:-

explain analyze
SELECT
d0.dmth,
count(f.fval )
FROM
dim0 AS d0,
fact0 AS f
WHERE   d0.d0key = f.d0key
AND d0.ddate BETWEEN '2010-01-01' AND '2010-12-28'
GROUP BY
d0.dmth
;

 QUERY PLAN


-
 HashAggregate  (cost=336998.83..336998.84 rows=1 width=8) (actual
time=33823.124..33823.134 rows=12 loops=1)
   -  Hash Join  (cost=214.83..335343.83 rows=331000 width=8) (actual
time=61.065..33605.343 rows=336000 loops=1)
 Hash Cond: (outer.d0key = inner.d0key)
 -  Seq Scan on fact0 f  (cost=0.00..281819.00 rows=1000
width=8) (actual time=12.766..28945.036 rows=1000 loops=1)
 -  Hash  (cost=214.00..214.00 rows=331 width=8) (actual
time=31.120..31.120 rows=336 loops=1)
   -  Seq Scan on dim0 d0  (cost=0.00..214.00 rows=331
width=8) (actual time=26.362..30.895 rows=336 loops=1)
 Filter: ((ddate = '2010-01-01'::date) AND (ddate
= '2010-12-28'::date))
 Total runtime: 33823.220 ms
(8 rows)


benchw=# \d fact0
Table public.fact0
 Column |  Type  | Modifiers
++---
 d0key  | integer| not null
 d1key  | integer| not null
 d2key  | integer| not null
 fval   | integer| not null
 ffill  | character varying(100) | not null
Indexes:
fact0_d0key btree (d0key)
fact0_d1key btree (d1key)
fact0_d2key btree (d2key)

benchw=# \d dim0
 Table public.dim0
 Column |  Type   | Modifiers
+-+---
 d0key  | integer | not null
 ddate  | date| not null
 dyr| integer | not null
 dmth   | integer | not null
 dday   | integer | not null
Indexes:
dim0_d0key UNIQUE, btree (d0key)

The example on the web site has the following execution plan:-

QUERY PLAN



 HashAggregate  (cost=286953.94..286953.94 rows=1 width=8)
   -  Nested Loop  (cost=0.00..285268.93 rows=337002 width=8)
 -  Seq Scan on dim0 d0  (cost=0.00..219.00 rows=337 width=8)
   Filter: ((ddate = '2010-01-01'::date) AND (ddate =
'2010-12-28'::date))
 -  Index Scan using fact0_d0key on fact0 f  (cost=0.00..833.07
rows=1022 width=8)
   Index Cond: (outer.d0key = f.d0key)

It uses the index on the join condition.

When I disable the sequential scan with:-

SET enable_seqscan TO off;

The execution plan looks like:-

   QUERY
PLAN


 HashAggregate  (cost=648831.52..648831.53 rows=1 width=8) (actual
time=19155.060..19155.071 rows=12 loops=1)
   -  Nested Loop  (cost=7.51..647176.52 rows=331000 width=8) (actual
time=97.878..18943.155 rows=336000 loops=1)
 -  Index Scan using dim0_d0key on dim0 d0  (cost=0.00..248.00
rows=331 width=8) (actual time=40.467..55.780 rows=336 loops=1)
   Filter: ((ddate = '2010-01-01'::date) AND (ddate =
'2010-12-28'::date))
 -  Bitmap Heap Scan on fact0 f  (cost=7.51..1941.94 rows=1002
width=8) (actual time=0.991..55.391 rows=1000 loops=336)
   Recheck Cond: (outer.d0key = f.d0key)
   -  Bitmap Index Scan on fact0_d0key  (cost=0.00..7.51
rows=1002 width=0) (actual time=0.583..0.583 rows=1000 loops=336)
 Index Cond: (outer.d0key = f.d0key)
 Total runtime: 19155.176 ms
(9 rows)

The query is 19 seconds long now; down from 34 seconds although the
execution plan doesn't match the example from the website.

Regards

Robin
-Original Message-
From: Peter Eisentraut [mailto:[EMAIL PROTECTED] 
Sent: 21 July 2006 12:46
To: pgsql-performance@postgresql.org
Cc: Smith,R,Robin,XJE4JA C
Subject: Re: [PERFORM] Forcing using index instead of sequential scan?


[EMAIL PROTECTED] wrote:
 What is the best way to force the use of indexes in these queries?

Well, the brute-force method is to use SET enable_seqscan TO off, but if

you want to get to the bottom of this, you should look at or post the 
EXPLAIN ANALYZE output of the offending queries.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-21 Thread robin.c.smith
The tables have all been analysed.

I set the work_mem to 50 and it still doesn't use the index :-(

Regards

Robin

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED] 
Sent: 21 July 2006 12:54
To: Smith,R,Robin,XJE4JA C
Subject: Re: [PERFORM] Forcing using index instead of sequential scan?


[EMAIL PROTECTED] wrote:
 I have been testing the performance of PostgreSQL using the simple 
 tool found at http://benchw.sourceforge.net however I have found that 
 all the queries it run execute with sequential scans. The website 
 where the code runs has examples of the execution plan using indexes.
 
 When I disable the sequential plan query 0 and query 1 run faster ( 
 http://benchw.sourceforge.net/benchw_results_postgres_history.html ) 
 by using the indexes as suggested by the website.
 
 I have tried increasing the effective_cache_size and reducing the 
 random_page_cost to try and force the optimiser to use the index but 
 it always uses the sequential scan.
 
 What is the best way to force the use of indexes in these queries? 
 Currently testing with version 8.1.4.

Well, you don't want to be forcing it if possible. Ideally, PG should be

able to figure out what to use itself.

In the case of query0 and query1 as shown on your web-page I'd expect a 
sequential scan of dim0 then access via the index on fact0. Reasons why 
this might not be happening include:
1. Inaccurate stats - ANALYSE your tables
2. Insufficient memory for sorting etc - issue SET work_mem=XXX before 
the query and try increased values.
3. Other parameters are out-of-whack. For example, effective_cache_size 
doesn't change how much cache PG uses, it tells PG how much the O.S. 
will cache. You might find http://www.powerpostgresql.com/PerfList is a 
good quick introduction.


So - ANALYSE your tables
http://www.postgresql.org/docs/8.1/static/sql-analyze.html

Then post EXPLAIN ANALYSE for the queries and we'll see what they're
doing.
-- 
   Richard Huxton
   Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend