Dear All,

I have a table with approximately 570k Rows.


                  Table "filter.rules"
  Column  |          Type          |               Modifiers
----------+------------------------+----------------------------------------
 rulename | character varying(16)  | not null default ''::character varying
 uri      | character varying(200) | not null default ''::character varying
 redirect | character varying(200) | not null default ''::character varying
 moddate  | date                   | not null default ('now'::text)::date
 active   | boolean                | not null default true
 comment  | character varying(255) | not null default ''::character varying
Indexes:
    "rules_pkey" primary key, btree (rulename, uri)
    "moddate_idx" btree (moddate)
    "rules_idx" btree (lower((uri)::text))

Statistic on the uri column have been set 1000
Vacuum full and analyze was run before tests, no alteration to tables since then.

# analyze verbose filter.rules;
INFO:  analyzing "filter.rules"
INFO:  "rules": 5228 pages, 300000 rows sampled, 570533 estimated total rows
ANALYZE

# explain analyze SELECT rulename, redirect from filter.rules WHERE lower(uri) 
IN(lower('land.com'),lower('com'),lower('land.com/'),lower('com/')) GROUP BY 
rulename,redirect;
                                                                                       
    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=22352.79..22352.79 rows=1 width=12) (actual 
time=2047.331..2047.332 rows=1 loops=1)
   ->  Seq Scan on rules  (cost=0.00..22296.32 rows=11294 width=12) (actual 
time=540.149..2047.308 rows=1 loops=1)
         Filter: ((lower((uri)::text) = 'land.com'::text) OR (lower((uri)::text) = 
'com'::text) OR (lower((uri)::text) = 'land.com/'::text) OR (lower((uri)::text) = 
'com/'::text))
 Total runtime: 2047.420 ms
(4 rows)

# SET enable_seqscan=off;

# explain analyze SELECT rulename, redirect from filter.rules WHERE lower(uri) 
IN(lower('land.com'),lower('com'),lower('land.com/'),lower('com/')) GROUP BY 
rulename,redirect;
                                                                                       
    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=38970.68..38970.68 rows=1 width=12) (actual time=0.328..0.328 
rows=1 loops=1)
   ->  Index Scan using rules_idx, rules_idx, rules_idx, rules_idx on rules  
(cost=0.00..38914.21 rows=11294 width=12) (actual time=0.210..0.312 rows=1 loops=1)
         Index Cond: ((lower((uri)::text) = 'land.com'::text) OR (lower((uri)::text) = 
'com'::text) OR (lower((uri)::text) = 'land.com/'::text) OR (lower((uri)::text) = 
'com/'::text))
 Total runtime: 0.700 ms
(4 rows)

Could anybody offer explanations of why the planner does such a terrible job of 
estimated the number of rows for this query, with the stats set so high.
Tests were also done with stats set to 100, and 1.  The results are exactly the same.  
Which I would have assumed.

Also I am interested in how functional indexes have statistics collected for them, if 
they do.  As to possibly minimize or avoid this problem in the future.

Thanks for your considersation of this matter.

Regards

Russell Smith.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to