[PERFORM] Postgresql is using seqscan when is should use indexes.

2004-11-08 Thread Andreas Åkre Solberg
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

We have two tables, dst_port_hour and dst_port_day, which should be  
very similar, they both have about 50.000.000 rows. In both tables we  
have an index for period_id.

We run postgresql  7.4.5 on a dedicated Debian server, with dual Intel  
Xeon 3GHz and 4GB memory.

The problem is that on the dst_port_day table, postgresql is using  
seqscan, and not the index when it should. Forcing the use of the index  
by setting enable_seqscan to false, makes the query lighthening fast.  
When using seqscan, the query takes several minutes. The planner  
calculates the cost for Index scan to be much more than sequence scan.

Why is our query planner misbehaving?

Here are the exaplain analyze output with and without index-force:


SET enable_seqscan=false;

stager=  explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE  
cur.period_id='2779' GROUP BY cur.portnr  ORDER BY SUM(cur.octets) DESC  
  LIMIT 5;
  
QUERY PLAN
-  
-  
- -
  Limit  (cost=2022664.62..2022664.63 rows=5 width=12) (actual  
time=831.772..831.816 rows=5 loops=1)
-  Sort  (cost=2022664.62..2022664.82 rows=80 width=12) (actual  
time=831.761..831.774 rows=5 loops=1)
  Sort Key: sum(octets)
  -  HashAggregate  (cost=2022661.89..2022662.09 rows=80  
width=12) (actual time=587.036..663.991 rows=16396 loops=1)
-  Index Scan using dst_port_day_period_id_key on  
dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual  
time=0.038..303.801 rows=48072 loops=1)
  Index Cond: (period_id = 2779)
  Total runtime: 836.362 ms
(7 rows)



SET enable_seqscan=true;

stager=   explain analyze SELECT cur.portnr FROM dst_port_day cur  
WHERE cur.period_id='2779' GROUP BY cur.portnr  ORDER BY  
SUM(cur.octets) DESC  LIMIT 5;

QUERY PLAN
-  
-  
- --
  Limit  (cost=1209426.88..1209426.89 rows=5 width=12) (actual  
time=299053.006..299053.053 rows=5 loops=1)
-  Sort  (cost=1209426.88..1209427.08 rows=80 width=12) (actual  
time=299052.995..299053.008 rows=5 loops=1)
  Sort Key: sum(octets)
  -  HashAggregate  (cost=1209424.15..1209424.35 rows=80  
width=12) (actual time=298803.273..298881.020 rows=16396 loops=1)
-  Seq Scan on dst_port_day cur  (cost=0.00..1206693.40  
rows=546150 width=12) (actual time=298299.508..298526.544 rows=48072  
loops=1)
  Filter: (period_id = 2779)
  Total runtime: 299057.643 ms
(7 rows)

- -- 
Andreas kre Solberg, UNINETT AS Testnett
Contact info and Public PGP Key available on:
http://andreas.solweb.no/?account=Work

-BEGIN PGP SIGNATURE-
Version: PGP 8.1
Comment: My public key is available at http://andreas.solweb.no

iQA/AwUBQY9NBPyFPYEtpdl2EQKIcwCgpPEkZ3PQKWNf6JWP6tQ4eFBPEngAoKTT
4eGkB0NVyIg0surd1LJdFD7+
=bYtH
-END PGP SIGNATURE-


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Postgresql is using seqscan when is should use indexes.

2004-11-08 Thread Russell Smith
On Mon, 8 Nov 2004 09:40 pm, Andreas kre Solberg wrote:
 We have two tables, dst_port_hour and dst_port_day, which should be
 very similar, they both have about 50.000.000 rows. In both tables we
 have an index for period_id.
 
 We run postgresql  7.4.5 on a dedicated Debian server, with dual Intel
 Xeon 3GHz and 4GB memory.
 
 The problem is that on the dst_port_day table, postgresql is using
 seqscan, and not the index when it should. Forcing the use of the index
 by setting enable_seqscan to false, makes the query lighthening fast.
 When using seqscan, the query takes several minutes. The planner
 calculates the cost for Index scan to be much more than sequence scan.
 
 Why is our query planner misbehaving?
 
 Here are the exaplain analyze output with and without index-force:
 
 
 SET enable_seqscan=false;
 
 stager=  explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE
 cur.period_id='2779' GROUP BY cur.portnr  ORDER BY SUM(cur.octets) DESC
   LIMIT 5;
 
dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual 
time=0.038..303.801 rows=48072 loops=1)

The guess of the number of rows returned by the index scan is out by a factor 
of 10.  500k rows is greater than 1% of
the rows, so I think the planner is likely to choose a sequence scan at this 
amount, unless you have tuned things like
random page cost.

What is the selectivity like on that column?
Have you analyzed recently?

If so, you should probably increase the statistics on that column
See ALTER TABLE SET STATISTICS in the manual.

 QUERY PLAN
 -
   Limit  (cost=2022664.62..2022664.63 rows=5 width=12) (actual 
 time=831.772..831.816 rows=5 loops=1)
 -  Sort  (cost=2022664.62..2022664.82 rows=80 width=12) (actual 
 time=831.761..831.774 rows=5 loops=1)
   Sort Key: sum(octets)
   -  HashAggregate  (cost=2022661.89..2022662.09 rows=80 width=12) 
 (actual time=587.036..663.991 rows=16396 loops=1)
 -  Index Scan using dst_port_day_period_id_key on 
 dst_port_day cur  (cost=0.00..2019931.14 rows=546150 width=12) (actual 
 time=0.038..303.801 rows=48072 loops=1)
   Index Cond: (period_id = 2779)
   Total runtime: 836.362 ms
 (7 rows)
 
 
 
 SET enable_seqscan=true;
 
 stager=   explain analyze SELECT cur.portnr FROM dst_port_day cur WHERE 
 cur.period_id='2779' GROUP BY cur.portnr  ORDER BY SUM(cur.octets) DESC  
 LIMIT 5;
 
 QUERY PLAN
 --
   Limit  (cost=1209426.88..1209426.89 rows=5 width=12) (actual 
 time=299053.006..299053.053 rows=5 loops=1)
 -  Sort  (cost=1209426.88..1209427.08 rows=80 width=12) (actual 
 time=299052.995..299053.008 rows=5 loops=1)
   Sort Key: sum(octets)
   -  HashAggregate  (cost=1209424.15..1209424.35 rows=80 width=12) 
 (actual time=298803.273..298881.020 rows=16396 loops=1)
 -  Seq Scan on dst_port_day cur  (cost=0.00..1206693.40 
 rows=546150 width=12) (actual time=298299.508..298526.544 rows=48072 loops=1)
   Filter: (period_id = 2779)
   Total runtime: 299057.643 ms
 (7 rows)
 

Regards

Russell Smith

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly