I have a table, "path", which is:

 pathid | integer | not null default
nextval('path_pathid_seq'::regclass)
 path   | text    | not null
Indexes:
    "path_pkey" PRIMARY KEY, btree (pathid)
    "path_name_idx" btree (path)

The table contains approx 1.2 million rows, of which all are unique.
(both the path field and the naem field are unique, thought he
path_name_idx index is not a unique index)


On this table, I do a query like:
SELECT * FROM path WHERE path LIKE 'f:/userdirs/s/super_73/%'

The estimate for this query is comlpetely off, which I beleive is the
cause for a very bad selection of a query plan when it's used in a big
join (creating nestloops that ends up taking 15+ minutes to complete..).


Explain analyze gives:
                                                     QUERY PLAN
------------------------------------------------------------------------
-----------------------------------------------
 Index Scan using path_name_idx on path  (cost=0.00..3.24 rows=1
width=74) (actual time=0.035..0.442 rows=214 loops=1)
   Index Cond: ((path >= 'f:/userdirs/s/super'::text) AND (path <
'f:/userdirs/s/supes'::text))
   Filter: (path ~~ 'f:/userdirs/s/super_73%'::text)


No matter what I search on (when it's very selective), the estimate is
always 1 row, whereas the actual value is at least a couple of hundred.
If I try with say "f:/us", the difference is 377,759 estimated vs
562,459 returned, which is percentage-wise a lot less, but...

I have tried upping the statistics target up to 1000, with no changes.

Any way to teach the planner about this?

FYI, if I push the result of the select on path into a temp table, and
then join with that one in my main table, I get a hashjoin instead, and
query time is < 30 seconds instead of 15 minutes...


//Magnus

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Reply via email to