[SQL] Indexes and statistics

David Witham Tue, 17 Feb 2004 20:44:33 -0800

Hi all,

This is a further post from last week. I've got a table of phone call detail records.


buns=# select count(*) from cdr;                         
  count  
---------
 2800653
(1 row)

One of the customers is quite large (8.3% of the records):

buns=# select count(*) from cdr where cust_id = 99201110;        
 count  
--------
 231889
(1 row)

I have indexes on cust_id (integer) and bill_id (integer). If I try to do a query on 
that customer it doesn't use the index:

buns=# explain analyse select count(*) from cdr where cust_id = 99201110 and bill_id 
is null;
                                                   QUERY PLAN                          
                          
-----------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=87082.81..87082.81 rows=1 width=0) (actual time=82279.63..82279.63 
rows=1 loops=1)
   ->  Seq Scan on cdr  (cost=0.00..87037.71 rows=18041 width=0) (actual 
time=82279.61..82279.61 rows=0 loops=1)
         Filter: ((cust_id = 99201110) AND (bill_id IS NULL))
 Total runtime: 82280.19 msec
(4 rows)

I tried this:

alter table cdr alter column cust_id set statistics 1000;
alter table cdr alter column bill_id set statistics 1000;
analyze verbose;

The I ran the query again but I still got the same result.

Then I tried disabling sequential scans in postgresql.conf, restarted the postmaster 
and did the query again:

buns=# explain analyse select count(*) from cdr where cust_id = 99201110 and bill_id 
is null;
                                                            QUERY PLAN                 
                                           
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=913498.60..913498.60 rows=1 width=0) (actual time=48387.91..48387.91 
rows=1 loops=1)
   ->  Index Scan using cdr_ix3 on cdr  (cost=0.00..913453.49 rows=18041 width=0) 
(actual time=48387.89..48387.89 rows=0 loops=1)
         Index Cond: (cust_id = 99201110)
         Filter: (bill_id IS NULL)
 Total runtime: 48388.47 msec
(5 rows)

The computed cost of using the index was a factor of 10 higher which I presume is why 
the query planner wasn't using the index, but it ran in half the time. So I guess I 
need to know how to alter the statistics collection so that the index will get used. I 
gather that index columns that occur in more than "a few" percent of the table cause 
the query planner to not use the index. Does that mean I won't be able to get the 
query planner to ever use the cust_id index for that customer or can I tune some 
parameters to alter that?

Any suggestions appreciated.

Thanks,
David

David Witham
Telephony Platforms Architect
Unidial, Australia



---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

[SQL] Indexes and statistics

Reply via email to