> -----Original Message----- > From: adam_pgsql [mailto:adam_pg...@witneyweb.org] > Sent: Tuesday, August 16, 2011 7:39 AM > To: pgsql-sql > Subject: which is better: using OR clauses or UNION? > > > Hi, > > I have a query hitting a table of 25 million rows. The table has a text > field ('identifier') which i need to query for matching rows. The > question is if i have multiple strings to match against this field I > can use multiple OR sub-statements or multiple statements in a UNION. > The UNION seems to run quicker.... is this to be expected? or is there > anything else I can do improve the speed of this query? Some query > details: > > > table "dba_data_base", index: > "in_dba_data_base_identifier" btree (lower(identifier) > varchar_pattern_ops) > > > Query 1 > ------- > datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, > log_ratio, p_value > FROM > dba_data_base a > WHERE > ( lower(identifier) LIKE lower('BUGS0000001884677') OR > lower(identifier) LIKE lower('BUGS0000001884678') OR > lower(identifier) LIKE lower('BUGS0000001884679') OR > lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') ) > ORDER BY a.identifier; > > QUERY PLAN > > ----------------------------------------------------------------------- > ----------------------------------------------------------------------- > ---------------------------------------------------------- > -------------------------------- > Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual > time=3208.466..3208.652 rows=318 loops=1) > Sort Key: identifier > -> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 > rows=3857 width=62) (actual time=81.106..3207.721 rows=318 loops=1) > Recheck Cond: ((lower(identifier) ~~ 'bugs0000001884677'::text) > OR (lower(identifier) ~~ 'bugs0000001884678'::text) OR > (lower(identifier) ~~ 'bugs0000001884679'::text) OR (lower(identifier) > ~ > ~ 'sptigr4-2210 (6f24)'::text)) > Filter: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR > (lower(identifier) ~~ 'bugs0000001884678'::text) OR (lower(identifier) > ~~ 'bugs0000001884679'::text) OR (lower(identifier) ~~ 'spt > igr4-2210 (6f24)'::text)) > -> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual > time=71.397..71.397 rows=0 loops=1) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 > loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs0000001884677'::character varying) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 > loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs0000001884678'::character varying) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 > loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs0000001884679'::character varying) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 > rows=318 loops=1) > Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 > (6f24)'::character varying) > Total runtime: 3208.904 ms > > > Query 2 > ------- > datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, > log_ratio, p_value > FROM > dba_data_base a > WHERE > lower(identifier) LIKE lower('BUGS0000001884677') > UNION > SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value > FROM > dba_data_base a > WHERE > lower(identifier) LIKE lower('BUGS0000001884678') > UNION > SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value > FROM > dba_data_base a > WHERE > lower(identifier) LIKE lower('BUGS0000001884679') > UNION > SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value > FROM > dba_data_base a > WHERE > lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') > ORDER BY identifier; > > Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual > time=3.688..3.886 rows=317 loops=1) > Sort Key: identifier > -> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual > time=2.663..3.387 rows=317 loops=1) > -> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual > time=2.660..2.834 rows=318 loops=1) > Sort Key: bioassay_id, identifier, ratio, log_ratio, > p_value > -> Append (cost=32.88..15185.06 rows=3856 width=62) > (actual time=0.320..2.131 rows=318 loops=1) > -> Bitmap Heap Scan on dba_data_base a > (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 > rows=0 loops=1) > Filter: (lower(identifier) ~~ > 'bugs0000001884677'::text) > -> Bitmap Index Scan on > in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) > (actual time=0.036..0.036 rows=0 loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs0000001884677'::character varying) > -> Bitmap Heap Scan on dba_data_base a > (cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010 > rows=0 loops=1) > Filter: (lower(identifier) ~~ > 'bugs0000001884678'::text) > -> Bitmap Index Scan on > in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) > (actual time=0.008..0.008 rows=0 loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs0000001884678'::character varying) > -> Bitmap Heap Scan on dba_data_base a > (cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010 > rows=0 loops=1) > Filter: (lower(identifier) ~~ > 'bugs0000001884679'::text) > -> Bitmap Index Scan on > in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) > (actual time=0.008..0.008 rows=0 loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs0000001884679'::character varying) > -> Bitmap Heap Scan on dba_data_base a > (cost=32.88..3786.62 rows=964 width=62) (actual time=0.255..1.676 > rows=318 loops=1) > Filter: (lower(identifier) ~~ 'sptigr4-2210 > (6f24)'::text) > -> Bitmap Index Scan on > in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) > (actual time=0.178..0.178 rows=318 loops=1) > Index Cond: (lower(identifier) ~=~ > 'sptigr4-2210 (6f24)'::character varying) > Total runtime: 4.174 ms > > Also which should scale better if I add more strings to match? would > there be any better design patterns for this problem? > > Thanks for any help > > Adam > > select version(); > version > ---------------------------------------------------------------- > PostgreSQL 8.2.12 on i686-pc-linux-gnu, compiled by GCC 2.95.4 >
Adam, There is something strange in your 2 execution plans. Exactly the same operation: -- first plan Bitmap Index Scan on in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 loops=1) Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 (6f24)'::character varying) -- second plan Bitmap Index Scan on in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual time=0.178..0.178 rows=318 loops=1) Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 (6f24)'::character varying) takes quite different time: 71.347 versus 0.178 and basically makes all the difference between duration of your first and second statement. I think, what you are seeing here is data being cached in memory (when you executed "union" statement after "or" statement). Other than that, looking at 2 execution plans, I'd say that in general "or" should run faster than "union", at least because it does "Bitmap Heap Scan on dba_data_base" only once, while "union" statement does this heap scan 4 times (once per "unionized" select). HTH, Igor Neyman -- Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql