On 16 Aug 2011, at 15:09, Tom Lane wrote: > adam_pgsql <adam_pg...@witneyweb.org> writes: >> I have a query hitting a table of 25 million rows. The table has a >> text field ('identifier') which i need to query for matching rows. The >> question is if i have multiple strings to match against this field I >> can use multiple OR sub-statements or multiple statements in a >> UNION. The UNION seems to run quicker.... is this to be expected? > > Your test cases don't seem exactly comparable; in particular I think the > second one is benefiting from the first one having already read and > cached the relevant disk blocks. Notice how you've got, eg, > >> -> Bitmap Index Scan on in_dba_data_base_identifier >> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 >> loops=1) >> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 >> (6f24)'::character varying) > > versus > >> -> Bitmap Index Scan on >> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual >> time=0.178..0.178 rows=318 loops=1) >> Index Cond: (lower(identifier) ~=~ >> 'sptigr4-2210 (6f24)'::character varying) > > Those are the exact same subplan, so any honest comparison should be > finding them to take the same amount of time. When the actual readings > are different by a factor of several hundred, there's something wrong > with your measurement process. > > In the end this comes down to whether duplicates will be eliminated more > efficiently by a BitmapOr step or by sort/uniq on the resulting rows. > I'd have to bet on the BitmapOr myself, but it's likely that this is > down in the noise compared to the actual disk accesses in any > not-fully-cached scenario. Also, if you don't expect the sub-statements > to yield any duplicates, or don't care about seeing the same row twice > in the output, you should consider UNION ALL instead of UNION.
Thanks guys, I'll give some of those options a try and see which ones improve performance (Tom, yes i ran those queries after each other so there was caching going on. However, I had noticed a difference in performance when spacing the queries before and after a few other big queries to help clear the cache). adam -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql