Re: [SQL] which is better: using OR clauses or UNION?

Igor Neyman Wed, 17 Aug 2011 13:27:20 -0700

> -----Original Message-----
> From: adam_pgsql [mailto:[email protected]]
> Sent: Tuesday, August 16, 2011 11:56 AM
> To: Tom Lane
> Cc: pgsql-sql
> Subject: Re: which is better: using OR clauses or UNION?
> 
> 
> On 16 Aug 2011, at 15:09, Tom Lane wrote:
> 
> > adam_pgsql <[email protected]> writes:
> >> I have a query hitting a table of 25 million rows. The table has a
> >> text field ('identifier') which i need to query for matching rows.
> The
> >> question is if i have multiple strings to match against this field
I
> >> can use multiple OR sub-statements or multiple statements in a
> >> UNION. The UNION seems to run quicker.... is this to be expected?
> >
> > Your test cases don't seem exactly comparable; in particular I think
> the
> > second one is benefiting from the first one having already read and
> > cached the relevant disk blocks.  Notice how you've got, eg,
> >
> >>              ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347
> rows=318 loops=1)
> >>                    Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> >
> > versus
> >
> >>                          ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (actual time=0.178..0.178 rows=318 loops=1)
> >>                                Index Cond: (lower(identifier) ~=~
> 'sptigr4-2210 (6f24)'::character varying)
> >
> > Those are the exact same subplan, so any honest comparison should be
> > finding them to take the same amount of time.  When the actual
> readings
> > are different by a factor of several hundred, there's something
wrong
> > with your measurement process.
> >
> > In the end this comes down to whether duplicates will be eliminated
> more
> > efficiently by a BitmapOr step or by sort/uniq on the resulting
rows.
> > I'd have to bet on the BitmapOr myself, but it's likely that this is
> > down in the noise compared to the actual disk accesses in any
> > not-fully-cached scenario.  Also, if you don't expect the sub-
> statements
> > to yield any duplicates, or don't care about seeing the same row
> twice
> > in the output, you should consider UNION ALL instead of UNION.
> 
> 
> Thanks guys, I'll give some of those options a try and see which ones
> improve performance
> 
> (Tom, yes i ran those queries after each other so there was caching
> going on. However, I had noticed a difference in performance when
> spacing the queries before and after a few other big queries to help
> clear the cache).
> 
> adam


Adam,

Did you verify that your cache is "cleared"? Like using pg_buffercache
contrib. module?
Besides, there is also OS cache...

Regards,
Igor Neyman


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] which is better: using OR clauses or UNION?

Reply via email to