Hi,

I have a query hitting a table of 25 million rows. The table has a text field 
('identifier') which i need to query for matching rows. The question is if i 
have multiple strings to match against this field I can use multiple OR 
sub-statements or multiple statements in a UNION. The UNION seems to run 
quicker.... is this to be expected? or is there anything else I can do improve 
the speed of this query? Some query details:


table "dba_data_base", index:
"in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)


Query 1
-------
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
p_value
  FROM
      dba_data_base a
  WHERE
      ( lower(identifier) LIKE lower('BUGS0000001884677') OR
        lower(identifier) LIKE lower('BUGS0000001884678') OR
        lower(identifier) LIKE lower('BUGS0000001884679') OR
        lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
  ORDER BY a.identifier;
                                                                                
                               QUERY PLAN                                       
                                        

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------
 Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual 
time=3208.466..3208.652 rows=318 loops=1)
   Sort Key: identifier
   ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38 rows=3857 
width=62) (actual time=81.106..3207.721 rows=318 loops=1)
         Recheck Cond: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR 
(lower(identifier) ~~ 'bugs0000001884678'::text) OR (lower(identifier) ~~ 
'bugs0000001884679'::text) OR (lower(identifier) ~
~ 'sptigr4-2210 (6f24)'::text))
         Filter: ((lower(identifier) ~~ 'bugs0000001884677'::text) OR 
(lower(identifier) ~~ 'bugs0000001884678'::text) OR (lower(identifier) ~~ 
'bugs0000001884679'::text) OR (lower(identifier) ~~ 'spt
igr4-2210 (6f24)'::text))
         ->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual 
time=71.397..71.397 rows=0 loops=1)
               ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
                     Index Cond: (lower(identifier) ~=~ 
'bugs0000001884677'::character varying)
               ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                     Index Cond: (lower(identifier) ~=~ 
'bugs0000001884678'::character varying)
               ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                     Index Cond: (lower(identifier) ~=~ 
'bugs0000001884679'::character varying)
               ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 
loops=1)
                     Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 
(6f24)'::character varying)
 Total runtime: 3208.904 ms


Query 2
-------
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
p_value
  FROM
      dba_data_base a
  WHERE
      lower(identifier) LIKE lower('BUGS0000001884677')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
  FROM
      dba_data_base a
  WHERE
        lower(identifier) LIKE lower('BUGS0000001884678')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
  FROM
      dba_data_base a
  WHERE
        lower(identifier) LIKE lower('BUGS0000001884679')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
  FROM
      dba_data_base a
  WHERE
        lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
  ORDER BY identifier;

 Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886 
rows=317 loops=1)
   Sort Key: identifier
   ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual 
time=2.663..3.387 rows=317 loops=1)
         ->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual 
time=2.660..2.834 rows=318 loops=1)
               Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
               ->  Append  (cost=32.88..15185.06 rows=3856 width=62) (actual 
time=0.320..2.131 rows=318 loops=1)
                     ->  Bitmap Heap Scan on dba_data_base a  
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows=0 
loops=1)
                           Filter: (lower(identifier) ~~ 
'bugs0000001884677'::text)
                           ->  Bitmap Index Scan on in_dba_data_base_identifier 
 (cost=0.00..32.64 rows=964 width=0) (actual time=0.036..0.036 rows=0 loops=1)
                                 Index Cond: (lower(identifier) ~=~ 
'bugs0000001884677'::character varying)
                     ->  Bitmap Heap Scan on dba_data_base a  
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010 rows=0 
loops=1)
                           Filter: (lower(identifier) ~~ 
'bugs0000001884678'::text)
                           ->  Bitmap Index Scan on in_dba_data_base_identifier 
 (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                                 Index Cond: (lower(identifier) ~=~ 
'bugs0000001884678'::character varying)
                     ->  Bitmap Heap Scan on dba_data_base a  
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.010..0.010 rows=0 
loops=1)
                           Filter: (lower(identifier) ~~ 
'bugs0000001884679'::text)
                           ->  Bitmap Index Scan on in_dba_data_base_identifier 
 (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
                                 Index Cond: (lower(identifier) ~=~ 
'bugs0000001884679'::character varying)
                     ->  Bitmap Heap Scan on dba_data_base a  
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.255..1.676 rows=318 
loops=1)
                           Filter: (lower(identifier) ~~ 'sptigr4-2210 
(6f24)'::text)
                           ->  Bitmap Index Scan on in_dba_data_base_identifier 
 (cost=0.00..32.64 rows=964 width=0) (actual time=0.178..0.178 rows=318 loops=1)
                                 Index Cond: (lower(identifier) ~=~ 
'sptigr4-2210 (6f24)'::character varying)
 Total runtime: 4.174 ms

Also which should scale better if I add more strings to match? would there be 
any better design patterns for this problem?

Thanks for any help

Adam

select version();
                            version                             
----------------------------------------------------------------
 PostgreSQL 8.2.12 on i686-pc-linux-gnu, compiled by GCC 2.95.4



-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Reply via email to