Sorry, I should have written that we run VACUUM VERBOSE ANALYZE every
night.

- Jeremy

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bill Moran
Sent: Monday, April 12, 2004 12:09 PM
To: [EMAIL PROTECTED]
Cc: Postgresql Performance
Subject: Re: [PERFORM] index v. seqscan for certain values


Quick bit of input, since you didn't mention it.

How often do you run ANALYZE?  I found it interesting that a database I
was doing tests on sped up by a factor of 20 after ANALYZE.  If your
data changes a lot, you should probably schedule ANALYZE to run with
VACUUM.
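
If you want to pair them, a nightly cron entry along these lines works
(the path and database name here are placeholders):

    # vacuum and analyze the whole database every night at 3am
    0 3 * * *  /usr/bin/vacuumdb --analyze mydb

or, equivalently, from psql:

    VACUUM ANALYZE;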

Jeremy Dunn wrote:
> I've searched the archives and can't find an answer to this seemingly
> simple question.  Apologies if it's too common.
>  
> The table in question has ~1.3M rows.  It has 85 columns, 5 of which
> have single-column indexes.
>  
> The column in question (CID) has 183 distinct values.  For these
> values, the largest has ~38,000 rows and the smallest has 1 row.  About
> 30 values have < 100 rows, and about 10 values have > 20,000 rows.
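>  
> (For reference, the distribution above can be recomputed with a simple
> aggregate along these lines:
>     select cid, count(*) from xxx group by cid order by count(*) desc;
> )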
>  
> The database is PG 7.2.3 running on RedHat 7.1 (we are in the process
> of upgrading to PG 7.4.2).  All of the query plan options are enabled,
> and the CPU costs are set to the default values (cpu_tuple_cost is
> 0.01, cpu_index_tuple_cost is 0.001).  The database is VACUUM'd every
> night.
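>  
> (The live values can be double-checked from psql with SHOW:
>     show cpu_tuple_cost;
>     show cpu_index_tuple_cost;
> )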
>  
> The problem:
> A simple query:
>     select count(*) from xxx where CID=<smalval>
> where <smalval> is a CID value which has relatively few rows, returns
> a plan using the index on that column.
>  
>    explain analyze select count(*) from xxx where cid=869366;
>    Aggregate  (cost=19136.33..19136.33 rows=1 width=0) (actual time=78.49..78.49 rows=1 loops=1)
>      ->  Index Scan using xxx_cid on emailrcpts  (cost=0.00..19122.21 rows=5648 width=0) (actual time=63.40..78.46 rows=1 loops=1)
>    Total runtime: 78.69 msec
>  
> The same plan is true for values which have up to about 20,000 rows:
>  
>    explain analyze select count(*) from xxx where cid=6223341;
>    Aggregate  (cost=74384.19..74384.19 rows=1 width=0) (actual time=11614.89..11614.89 rows=1 loops=1)
>      ->  Index Scan using xxx_cid on emailrcpts  (cost=0.00..74329.26 rows=21974 width=0) (actual time=35.75..11582.10 rows=20114 loops=1)
>    Total runtime: 11615.05 msec
>  
> However, for the values that have > 20,000 rows, the plan changes to a
> sequential scan, which is proportionately much slower.
>  
>    explain analyze select count(*) from xxx where cid=7191032;
>    Aggregate  (cost=97357.61..97357.61 rows=1 width=0) (actual time=46427.81..46427.82 rows=1 loops=1)
>      ->  Seq Scan on xxx  (cost=0.00..97230.62 rows=50792 width=0) (actual time=9104.45..46370.27 rows=37765 loops=1)
>    Total runtime: 46428.00 msec
>  
>  
> The question: why does the planner consider a sequential scan to be
> better for these top 10 values?  In terms of elapsed time it is more
> than twice as slow, proportionally, as an index scan for the same
> number of rows.
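>  
> (One way to see what the planner believes about the cid distribution
> is the pg_stats view; a diagnostic sketch, table/column names as above:
>     select n_distinct, most_common_vals, most_common_freqs
>     from pg_stats
>     where tablename = 'xxx' and attname = 'cid';
> )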
>  
> What I tried:
>  
> A) alter table xxx alter column cid set statistics 500;   
>     analyze xxx;
> This does not affect the results.
>  
> B)  dropped/rebuilt the index, with no improvement.
>  
> C) decreased cpu_index_tuple_cost by a factor of up to 1000, with no
> success.
>  
> D) forced an index scan for the larger values by using a very high
> value for cpu_tuple_cost (e.g. 0.5), but this doesn't seem like a wise
> thing to do.
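>  
> (As a comparison-only experiment, the scan choice can also be forced
> per-session without distorting the global cost model:
>     set enable_seqscan to off;
>     explain analyze select count(*) from xxx where cid=7191032;
>     set enable_seqscan to on;
> )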
>  
> Your thoughts appreciated in advance!
>  
> - Jeremy
>  
> 7+ years experience in Oracle performance-tuning
> relatively new to postgresql


-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster