Re: [GENERAL] Subplan and index usage

2008-03-13 Thread Tom Lane
Vyacheslav Kalinin [EMAIL PROTECTED] writes:
 [ poor estimation for ]

 select *
   from cities
  where  ( ficity_id in (
select ficity_id from cities_name_words
 where fsword like 'novgorod%'
 union
select ficity_id from cities_name_ru_words
 where fsword like 'novgorod%'
   )
   or lower(fsname) like 'novgorod%'
   or lower(fsname_ru) like 'novgorod%'
  )

When you have an IN at the top level of WHERE, it's flattened into a
kind of JOIN, and there's reasonably decent estimation of the
selectivity.  Unfortunately, when it's down inside an OR-clause like
this, all those smarts go out the window and it's just treated as a
generic subplan condition, with 0.5 estimated selectivity IIRC.
Improving that would be nice but it's not high on anyone's to-do list.

You might get better results if you combine three separate queries
with UNION.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Subplan and index usage

2008-03-12 Thread Vyacheslav Kalinin
Consider the following case which is almost exact snapshot of part of our
scheme:

   Table cities
Column|  Type  | Modifiers | Description
--++---+-
 ficity_id| integer| not null  |
 ficountry_id | integer|   |
 firegion_id  | integer|   |
 fsname   | character varying(100) |   |
 fsname_ru| character varying(200) |   |
Indexes:
pk_geocities PRIMARY KEY, btree (ficity_id)
idx_cities_name btree (lower(fsname::text) varchar_pattern_ops)
idx_cities_name_ru btree (lower(fsname_ru::text) varchar_pattern_ops)
idx_geocities_country_id btree (ficountry_id)
idx_geocities_region_id btree (firegion_id)
Foreign-key constraints:
fk_geocities_country_id FOREIGN KEY (ficountry_id) REFERENCES
countries(ficountry_id) ON UPDATE CASCADE ON DELETE CASCADE
fk_geocities_region_id FOREIGN KEY (firegion_id) REFERENCES
regions(firegion_id) ON UPDATE CASCADE ON DELETE CASCADE


Table cities_name_words
  Column   |  Type  | Modifiers | Description
---++---+-
 ficity_id | integer| not null  |
 fsword| character varying(200) | not null  |
Indexes:
idx_cities_name_words_city_id btree (ficity_id)
idx_cities_name_words_word btree (fsword varchar_pattern_ops)
Foreign-key constraints:
fk_cities_name_words_city_id FOREIGN KEY (ficity_id) REFERENCES
cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE


Table cities_name_ru_words
  Column   |  Type  | Modifiers | Description
---++---+-
 ficity_id | integer| not null  |
 fsword| character varying(200) | not null  |
Indexes:
idx_cities_name_ru_words_city_id btree (ficity_id)
idx_cities_name_ru_words_word btree (fsword varchar_pattern_ops)
Foreign-key constraints:
fk_cities_name_ru_words_city_id FOREIGN KEY (ficity_id) REFERENCES
cities(ficity_id) ON UPDATE CASCADE ON DELETE CASCADE

This is the part of geo location database. The purpose of cities_name_words
and cities_name_ru_words is to facilitate indexing on separate words in city
name - they contain words of fsname or fsname_ru respectively of the
corresponding record in cities if it has more than word. Cities has about
19 records, cities_name_words about 8 and cities_name_ru_words about
5000. Now the query for city by name looks like this:

select *
  from cities
 where  ( ficity_id in (
   select ficity_id from cities_name_words
where fsword like 'novgorod%'
union
   select ficity_id from cities_name_ru_words
where fsword like 'novgorod%'
  )
  or lower(fsname) like 'novgorod%'
  or lower(fsname_ru) like 'novgorod%'
 )

QUERY PLAN
Seq Scan on cities  (cost=16.63..5949.26 rows=95014 width=60)
  Filter: ((hashed subplan) OR (lower((fsname)::text) ~~ 'novgorod%'::text)
OR (lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  SubPlan
-  Unique  (cost=16.61..16.62 rows=2 width=4)
  -  Sort  (cost=16.61..16.62 rows=2 width=4)
Sort Key: cities_name_words.ficity_id
-  Append  (cost=0.00..16.60 rows=2 width=4)
  -  Index Scan using idx_cities_name_words_word on
cities_name_words  (cost=0.00..8.31 rows=1 width=4)
Index Cond: (((fsword)::text ~=~
'novgorod'::text) AND ((fsword)::text ~~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)
  -  Index Scan using idx_cities_name_ru_words_word on
cities_name_ru_words  (cost=0.00..8.27 rows=1 width=4)
Index Cond: (((fsword)::text ~=~
'novgorod'::text) AND ((fsword)::text ~~ 'novgoroe'::text))
Filter: ((fsword)::text ~~ 'novgorod%'::text)

Notice how it uses proper indexes in subplan and goes for sequence scan on
the main table. If the where- conditions are applied separately it uses
indexes as expected:

select *
  from cities
 where  ( lower(fsname) like 'novgorod%'
  or lower(fsname_ru) like 'novgorod%'
 )

QUERY PLAN
Bitmap Heap Scan on cities  (cost=8.57..12.59 rows=1 width=60)
  Recheck Cond: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR
(lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  Filter: ((lower((fsname)::text) ~~ 'novgorod%'::text) OR
(lower((fsname_ru)::text) ~~ 'novgorod%'::text))
  -  BitmapOr  (cost=8.57..8.57 rows=1 width=0)
-  Bitmap Index Scan on idx_cities_name  (cost=0.00..4.29 rows=1
width=0)
  Index Cond: ((lower((fsname)::text) ~=~ 'novgorod'::text) AND
(lower((fsname)::text) ~~ 'novgoroe'::text))
-  Bitmap Index Scan on idx_cities_name_ru  (cost=0.00..4.28 rows=1
width=0)