Rick Schumeyer wrote:
I have two index questions. The first is about an issue that has been
recently discussed,
and I just wanted to be sure of my understanding. Functions like
count(), max(), etc. will
use sequential scans instead of index scans because the index doesn’t
know which rows
are actually visible…is this correct?
Actually, an index scan is chosen whenever its estimated cost is cheaper
than a sequential scan's. In practice that usually means the query is
expected to match fewer than roughly 10% of the table's rows. And yes,
the index alone can't answer count() or max(), because row visibility is
stored only in the table, not in the index, so the heap must be visited
anyway.
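A rough back-of-the-envelope with the planner's default page costs
(seq_page_cost = 1.0, random_page_cost = 4.0) shows why. The 100-rows-
per-page table geometry below is a made-up illustration, not your table,
and the formulas are a simplification of the real cost model:

=# select 19 * 4.0 as index_cost_19_rows,
          (10000000 / 100) * 1.0 as seq_scan_cost,
          6000000 * 4.0 as index_cost_6m_rows;

Fetching 19 rows via random page reads (cost ~76) easily beats reading
every page of a 10M-row table sequentially (~100000), while fetching 6M
rows at random (~24000000) loses badly to the sequential scan.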
Second:
I created an index in a table with over 10 million rows.
The index is on field x, which is a double.
The following command, as I expected, results in an index scan:
=# explain select * from data where x = 0;
QUERY PLAN
-------------------------------------------------------------------------
Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
Index Cond: (x = 0::double precision)
(2 rows)
Since you have 10m rows, when it expects to get only 19 rows, it is much
faster to use an index.
But this command, in which the only difference is > instead of =,
results in a sequential scan:
=# explain select * from data where x > 0;
QUERY PLAN
------------------------------------------------------------------
Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
Filter: (x > 0::double precision)
(2 rows)
Here, pg expects to find 62M rows (so you must have significantly more
than 10M rows in the table). In that case a sequential scan is much
faster than an index scan, so that's what pg does.
Why is this?
(This is with pg 8.0.1 on a PC running FC3 with 1GB ram…if it matters)
If you think there is truly a performance problem, try posting the
results of "explain analyze", from which we might be able to tell
whether your statistics are inaccurate (run "vacuum analyze" if you
haven't).
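For example (using your table; the > 0 query is the one whose estimate
looks suspect):

=# vacuum analyze data;
=# explain analyze select * from data where x > 0;

Then compare the planner's estimate (rows=...) against the actual row
count that explain analyze reports; a large mismatch points at stale or
insufficient statistics.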
John
=:->