[SQL] which is better: using OR clauses or UNION?
Hi,
I have a query hitting a table of 25 million rows. The table has a text field
('identifier') which i need to query for matching rows. The question is if i
have multiple strings to match against this field I can use multiple OR
sub-statements or multiple statements in a UNION. The UNION seems to run
quicker is this to be expected? or is there anything else I can do improve
the speed of this query? Some query details:
table "dba_data_base", index:
"in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
Query 1
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio,
p_value
FROM
dba_data_base a
WHERE
( lower(identifier) LIKE lower('BUGS001884677') OR
lower(identifier) LIKE lower('BUGS001884678') OR
lower(identifier) LIKE lower('BUGS001884679') OR
lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
ORDER BY a.identifier;
QUERY PLAN
Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual
time=3208.466..3208.652 rows=318 loops=1)
Sort Key: identifier
-> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 rows=3857
width=62) (actual time=81.106..3207.721 rows=318 loops=1)
Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
'bugs001884679'::text) OR (lower(identifier) ~
~ 'sptigr4-2210 (6f24)'::text))
Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
igr4-2210 (6f24)'::text))
-> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual
time=71.397..71.397 rows=0 loops=1)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884677'::character varying)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884678'::character varying)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884679'::character varying)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
loops=1)
Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
(6f24)'::character varying)
Total runtime: 3208.904 ms
Query 2
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio,
p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('BUGS001884677')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('BUGS001884678')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('BUGS001884679')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
ORDER BY identifier;
Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886
rows=317 loops=1)
Sort Key: identifier
-> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual
time=2.663..3.387 rows=317 loops=1)
-> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual
time=2.660..2.834 rows=318 loops=1)
Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
-> Append (cost=32.88..15185.06 rows=3856 width=62) (actual
time=0.320..2.131 rows=318 loops=1)
-> Bitmap Heap Scan on dba_data_base a
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows=0
loops=1)
Filter: (lower(identifier) ~~
'bugs001884677'::text)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884677'::character varying)
-> Bitmap Heap Scan on dba_data_base a
(cost=3
Re: [SQL] which is better: using OR clauses or UNION?
hi Adam,
im not sure which is faster/slower but, possibly you can speed it up by
using "in" operator
...where lower(identifier) *in *(lower('BUGS001884677')*
,*lower('BUGS001884678')
*,*);
if you create function based index:
CREATE INDEX idx_table_lower_text ON table(*lower*(text_field));
(taken from: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php)
On Tue, Aug 16, 2011 at 1:39 PM, adam_pgsql wrote:
>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a text
> field ('identifier') which i need to query for matching rows. The question
> is if i have multiple strings to match against this field I can use multiple
> OR sub-statements or multiple statements in a UNION. The UNION seems to run
> quicker is this to be expected? or is there anything else I can do
> improve the speed of this query? Some query details:
>
>
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
>
>
> Query 1
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> ( lower(identifier) LIKE lower('BUGS001884677') OR
> lower(identifier) LIKE lower('BUGS001884678') OR
> lower(identifier) LIKE lower('BUGS001884679') OR
> lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
> ORDER BY a.identifier;
>
> QUERY PLAN
>
>
>
>
> Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual
> time=3208.466..3208.652 rows=318 loops=1)
> Sort Key: identifier
> -> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 rows=3857
> width=62) (actual time=81.106..3207.721 rows=318 loops=1)
>Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
> 'bugs001884679'::text) OR (lower(identifier) ~
> ~ 'sptigr4-2210 (6f24)'::text))
>Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
> 'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
>-> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual time=
> 71.397..71.397 rows=0 loops=1)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0
> loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884677'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884678'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884679'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
> loops=1)
>Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
>
>
> Query 2
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
> ORDER BY identifier;
>
> Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual
> time=3.688..3.886 rows=317 loops=1)
> Sort Key: identifier
> -> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual
> time=2.663..3.387 rows=317 loops=1)
>-> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual
> time=2.660..2.834 rows=318 loops=1)
> Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
> -> Append (cost=32.88..15185.06 rows=3856 width=62) (actual
> time=0.320..2.131 rows=318 loops=1)
>-> Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows
Re: [SQL] bigint and unix time
On Tuesday, August 16, 2011 2:12:52 am Janiv Ratson wrote: > Hi Adrain and thank you, > Trac 0.12 uses microseconds as time value. > What do you suggest? extract(epoch ..) returns seconds which you are trying to compare to microseconds. The solution would be to divide your 'time' values by 1,000,000 to make them seconds. Like: SELECT ticket, "time"/100, author, field, oldvalue, newvalue FROM ticket_change where "time"/100 BETWEEN extract(epoch from (date 'now' - integer '30')) AND extract(epoch from (date 'now')); > > Thanks, > Janiv. > -- Adrian Klaver [email protected] -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] which is better: using OR clauses or UNION?
-Original Message- From: [email protected] [mailto:[email protected]] On Behalf Of adam_pgsql Sent: Tuesday, August 16, 2011 7:39 AM To: pgsql-sql Subject: [SQL] which is better: using OR clauses or UNION? Hi, I have a query hitting a table of 25 million rows. The table has a text field ('identifier') which i need to query for matching rows. The question is if i have multiple strings to match against this field I can use multiple OR sub-statements or multiple statements in a UNION. The UNION seems to run quicker is this to be expected? or is there anything else I can do improve the speed of this query? Some query details: You can also try the following form: ... WHERE column = ANY( string_to_array ( lower( 'Bug1,Bug2,Bug3' ) , ',' ) ); The main advantage of this is that you can parameterize the input string and so it will not logically matter how many values you are checking for. Also, you can always just place the values you want to search for into a table (temp or otherwise) and perform an Inner Join. No idea which one is "faster" but the "string_to_array" above requires no Dynamic SQL which all of the other forms (OR, UNION, IN) need. The table form also does not require dynamic SQL but you need additional INSERTS for each search value. I doubt it would be any better than the (OR/UNION/JOIN) form but you could also create a VALUES virtual table - which also requires Dynamic SQL. SELECT * FROM target_table NATURAL JOIN ( VALUES ('Bug1'),('Bug2'),('Bug3') ) searchtarget ( join_column_name ); David J. -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] which is better: using OR clauses or UNION?
adam_pgsql writes:
> I have a query hitting a table of 25 million rows. The table has a
> text field ('identifier') which i need to query for matching rows. The
> question is if i have multiple strings to match against this field I
> can use multiple OR sub-statements or multiple statements in a
> UNION. The UNION seems to run quicker is this to be expected?
Your test cases don't seem exactly comparable; in particular I think the
second one is benefiting from the first one having already read and
cached the relevant disk blocks. Notice how you've got, eg,
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
> loops=1)
> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
versus
> -> Bitmap Index Scan on
> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual
> time=0.178..0.178 rows=318 loops=1)
> Index Cond: (lower(identifier) ~=~
> 'sptigr4-2210 (6f24)'::character varying)
Those are the exact same subplan, so any honest comparison should be
finding them to take the same amount of time. When the actual readings
are different by a factor of several hundred, there's something wrong
with your measurement process.
In the end this comes down to whether duplicates will be eliminated more
efficiently by a BitmapOr step or by sort/uniq on the resulting rows.
I'd have to bet on the BitmapOr myself, but it's likely that this is
down in the noise compared to the actual disk accesses in any
not-fully-cached scenario. Also, if you don't expect the sub-statements
to yield any duplicates, or don't care about seeing the same row twice
in the output, you should consider UNION ALL instead of UNION.
regards, tom lane
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] which is better: using OR clauses or UNION?
On 2011-08-16, adam_pgsql wrote:
>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a
>text field ('identifier') which i need to query for matching rows. The
>question is if i have multiple strings to match against this field I
>can use multiple OR sub-statements or multiple statements in a UNION.
>The UNION seems to run quicker is this to be expected? or is there
>anything else I can do improve the speed of this query? Some query
>details:
> WHERE
> ( lower(identifier) LIKE lower('BUGS001884677') OR
>lower(identifier) LIKE lower('BUGS001884678') OR
>lower(identifier) LIKE lower('BUGS001884679') OR
>lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
> ORDER BY a.identifier;
don't use like use regex.
...
WHERE identifier ~*
E'^(BUGS001884677|BUGS001884678|BUGS001884679|SpTIGR4-2210
\\(6F24\\))$'
or
where lower(identifier) ~*
lower(E'^(BUGS001884677|BUGS001884678|BUGS001884679|SpTIGR4-2210
\\(6F24\\))$')
on the other hand you aren't doing any pattern stuff. - you you could
just use 'in':
WHERE lower(identifier) in
(lower('BUGS001884677'),
lower('BUGS001884678'),
lower('BUGS001884679'),
lower('SpTIGR4-2210(6F24)')
)
or if you need like, use like any:
WHERE lower(identifier) like ANY
(lower('BUGS001884677'),
lower('BUGS001884678'),
lower('BUGS001884679'),
lower('SpTIGR4-2210(6F24)')
)
> Also which should scale better if I add more strings to match? would there be
> any better design patterns for this problem?
use one of the above: preferably in, else regex, or failing that like any.
"= any" will also work but I don't thing it will ever be better than "in"
--
⚂⚃ 100% natural
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] which is better: using OR clauses or UNION?
Hi Viktor,
thanks for your email, gave that a try, but the
lower(identifier) LIKE lower('BUGS001884677') OR
still comes in quicker than the IN approach
thanks
adam
On 16 Aug 2011, at 12:56, Viktor Bojović wrote:
> hi Adam,
> im not sure which is faster/slower but, possibly you can speed it up by using
> "in" operator
>
> ...where lower(identifier) in (lower('BUGS001884677') ,
> lower('BUGS001884678'),);
>
> if you create function based index:
> CREATE INDEX idx_table_lower_text ON table(lower(text_field));
> (taken from: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php)
>
>
>
>
> On Tue, Aug 16, 2011 at 1:39 PM, adam_pgsql wrote:
>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a text field
> ('identifier') which i need to query for matching rows. The question is if i
> have multiple strings to match against this field I can use multiple OR
> sub-statements or multiple statements in a UNION. The UNION seems to run
> quicker is this to be expected? or is there anything else I can do
> improve the speed of this query? Some query details:
>
>
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
>
>
> Query 1
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio,
> p_value
> FROM
> dba_data_base a
> WHERE
> ( lower(identifier) LIKE lower('BUGS001884677') OR
> lower(identifier) LIKE lower('BUGS001884678') OR
> lower(identifier) LIKE lower('BUGS001884679') OR
> lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
> ORDER BY a.identifier;
>
>QUERY PLAN
>
>
>
> Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual
> time=3208.466..3208.652 rows=318 loops=1)
> Sort Key: identifier
> -> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 rows=3857
> width=62) (actual time=81.106..3207.721 rows=318 loops=1)
>Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
> 'bugs001884679'::text) OR (lower(identifier) ~
> ~ 'sptigr4-2210 (6f24)'::text))
>Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
> 'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
>-> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual
> time=71.397..71.397 rows=0 loops=1)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884677'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884678'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884679'::character varying)
> -> Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
> loops=1)
>Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
>
>
> Query 2
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio,
> p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('BUGS001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
> FROM
> dba_data_base a
> WHERE
> lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
> ORDER BY identifier;
>
> Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886
> rows=317 loops=1)
> Sort Key: identifier
> -> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual
> time=2.663..3.387 rows=317 loops=1)
>-> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual
> time=2.6
Re: [SQL] bigint and unix time
Hi Adrain and thank you, Trac 0.12 uses microseconds as time value. What do you suggest? Thanks, Janiv. -Original Message- From: Adrian Klaver [mailto:[email protected]] Sent: Monday, August 15, 2011 17:14 To: Janiv Ratson Cc: [email protected] Subject: Re: [SQL] bigint and unix time On Sunday, August 14, 2011 11:23:08 pm Janiv Ratson wrote: > Hi and thanks, > If my 'time' column is being saved as bigint: 128732389900. > How do I write a query to check if the 'time' field is greater than now - > 30 (past 30 days)? So what you want is not what values are greater than some point 30 days ago which is what your previous query asked and answered, but the values between a point 30 days ago and today. The easiest way is to use BETWEEN: test(5432)aklaver=>\d big_int_test Table "public.big_int_test" Column | Type | Modifiers +-+--- bint | bigint | rint | integer | test(5432)aklaver=>SELECT * from big_int_test ; bint |rint --+ 128732389900 | 1310799600 test(5432)aklaver=>SELECT bint FROM big_int_test WHERE bint BETWEEN extract(epoch from (date 'now' - integer '30')) AND extract(epoch from (date 'now')); bint -- (0 rows) That being said, if your time values are the order of magnitude shown they will not meet the criteria above. Is the time value supposed to be seconds? > > Thanks, > Janiv,. -- Adrian Klaver [email protected] -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] bigint and unix time
Hi and thanks again. I use the following query but it returns me 0 rows (it should return more): SELECT ticket, "time", author, field, oldvalue, newvalue FROM ticket_change where "time" BETWEEN extract(epoch from (date 'now' - integer '30')) AND extract(epoch from (date 'now')); Thanks a lot again, Janiv. -Original Message- From: Adrian Klaver [mailto:[email protected]] Sent: Monday, August 15, 2011 17:14 To: Janiv Ratson Cc: [email protected] Subject: Re: [SQL] bigint and unix time On Sunday, August 14, 2011 11:23:08 pm Janiv Ratson wrote: > Hi and thanks, > If my 'time' column is being saved as bigint: 128732389900. > How do I write a query to check if the 'time' field is greater than now - > 30 (past 30 days)? So what you want is not what values are greater than some point 30 days ago which is what your previous query asked and answered, but the values between a point 30 days ago and today. The easiest way is to use BETWEEN: test(5432)aklaver=>\d big_int_test Table "public.big_int_test" Column | Type | Modifiers +-+--- bint | bigint | rint | integer | test(5432)aklaver=>SELECT * from big_int_test ; bint |rint --+ 128732389900 | 1310799600 test(5432)aklaver=>SELECT bint FROM big_int_test WHERE bint BETWEEN extract(epoch from (date 'now' - integer '30')) AND extract(epoch from (date 'now')); bint -- (0 rows) That being said, if your time values are the order of magnitude shown they will not meet the criteria above. Is the time value supposed to be seconds? > > Thanks, > Janiv,. -- Adrian Klaver [email protected] -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] bigint and unix time
Additional information: If I use the following query: select "time", extract(epoch from (date 'now' - integer '30')), extract(epoch from (date 'now')) from ticket_change The first row looks like this: Bigint, double precision, double precision 128609061500;131085;1313442000 Thanks, Janiv, -Original Message- From: Adrian Klaver [mailto:[email protected]] Sent: Monday, August 15, 2011 17:14 To: Janiv Ratson Cc: [email protected] Subject: Re: [SQL] bigint and unix time On Sunday, August 14, 2011 11:23:08 pm Janiv Ratson wrote: > Hi and thanks, > If my 'time' column is being saved as bigint: 128732389900. > How do I write a query to check if the 'time' field is greater than now - > 30 (past 30 days)? So what you want is not what values are greater than some point 30 days ago which is what your previous query asked and answered, but the values between a point 30 days ago and today. The easiest way is to use BETWEEN: test(5432)aklaver=>\d big_int_test Table "public.big_int_test" Column | Type | Modifiers +-+--- bint | bigint | rint | integer | test(5432)aklaver=>SELECT * from big_int_test ; bint |rint --+ 128732389900 | 1310799600 test(5432)aklaver=>SELECT bint FROM big_int_test WHERE bint BETWEEN extract(epoch from (date 'now' - integer '30')) AND extract(epoch from (date 'now')); bint -- (0 rows) That being said, if your time values are the order of magnitude shown they will not meet the criteria above. Is the time value supposed to be seconds? > > Thanks, > Janiv,. -- Adrian Klaver [email protected] -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
[SQL] which is better: using OR clauses or UNION?
Hi,
I have a query hitting a table of 25 million rows. The table has a text field
('identifier') which i need to query for matching rows. The question is if i
have multiple strings to match against this field I can use multiple OR
sub-statements or multiple statements in a UNION. The UNION seems to run
quicker is this to be expected? or is there anything else I can do improve
the speed of this query? Some query details:
table "dba_data_base", index:
"in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
Query 1
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio,
p_value
FROM
dba_data_base a
WHERE
( lower(identifier) LIKE lower('BUGS001884677') OR
lower(identifier) LIKE lower('BUGS001884678') OR
lower(identifier) LIKE lower('BUGS001884679') OR
lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
ORDER BY a.identifier;
QUERY PLAN
Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual
time=3208.466..3208.652 rows=318 loops=1)
Sort Key: identifier
-> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 rows=3857
width=62) (actual time=81.106..3207.721 rows=318 loops=1)
Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
'bugs001884679'::text) OR (lower(identifier) ~
~ 'sptigr4-2210 (6f24)'::text))
Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
igr4-2210 (6f24)'::text))
-> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual
time=71.397..71.397 rows=0 loops=1)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884677'::character varying)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884678'::character varying)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884679'::character varying)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
loops=1)
Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
(6f24)'::character varying)
Total runtime: 3208.904 ms
Query 2
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio,
p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('BUGS001884677')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('BUGS001884678')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('BUGS001884679')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
FROM
dba_data_base a
WHERE
lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
ORDER BY identifier;
Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886
rows=317 loops=1)
Sort Key: identifier
-> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual
time=2.663..3.387 rows=317 loops=1)
-> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual
time=2.660..2.834 rows=318 loops=1)
Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
-> Append (cost=32.88..15185.06 rows=3856 width=62) (actual
time=0.320..2.131 rows=318 loops=1)
-> Bitmap Heap Scan on dba_data_base a
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows=0
loops=1)
Filter: (lower(identifier) ~~
'bugs001884677'::text)
-> Bitmap Index Scan on in_dba_data_base_identifier
(cost=0.00..32.64 rows=964 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~
'bugs001884677'::character varying)
Re: [SQL] which is better: using OR clauses or UNION?
On 16 Aug 2011, at 15:09, Tom Lane wrote:
> adam_pgsql writes:
>> I have a query hitting a table of 25 million rows. The table has a
>> text field ('identifier') which i need to query for matching rows. The
>> question is if i have multiple strings to match against this field I
>> can use multiple OR sub-statements or multiple statements in a
>> UNION. The UNION seems to run quicker is this to be expected?
>
> Your test cases don't seem exactly comparable; in particular I think the
> second one is benefiting from the first one having already read and
> cached the relevant disk blocks. Notice how you've got, eg,
>
>> -> Bitmap Index Scan on in_dba_data_base_identifier
>> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
>> loops=1)
>>Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
>> (6f24)'::character varying)
>
> versus
>
>> -> Bitmap Index Scan on
>> in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) (actual
>> time=0.178..0.178 rows=318 loops=1)
>>Index Cond: (lower(identifier) ~=~
>> 'sptigr4-2210 (6f24)'::character varying)
>
> Those are the exact same subplan, so any honest comparison should be
> finding them to take the same amount of time. When the actual readings
> are different by a factor of several hundred, there's something wrong
> with your measurement process.
>
> In the end this comes down to whether duplicates will be eliminated more
> efficiently by a BitmapOr step or by sort/uniq on the resulting rows.
> I'd have to bet on the BitmapOr myself, but it's likely that this is
> down in the noise compared to the actual disk accesses in any
> not-fully-cached scenario. Also, if you don't expect the sub-statements
> to yield any duplicates, or don't care about seeing the same row twice
> in the output, you should consider UNION ALL instead of UNION.
Thanks guys, I'll give some of those options a try and see which ones improve
performance
(Tom, yes i ran those queries after each other so there was caching going on.
However, I had noticed a difference in performance when spacing the queries
before and after a few other big queries to help clear the cache).
adam
--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] which is better: using OR clauses or UNION?
> -Original Message- > From: adam_pgsql [mailto:[email protected]] > Sent: Tuesday, August 16, 2011 7:39 AM > To: pgsql-sql > Subject: which is better: using OR clauses or UNION? > > > Hi, > > I have a query hitting a table of 25 million rows. The table has a text > field ('identifier') which i need to query for matching rows. The > question is if i have multiple strings to match against this field I > can use multiple OR sub-statements or multiple statements in a UNION. > The UNION seems to run quicker is this to be expected? or is there > anything else I can do improve the speed of this query? Some query > details: > > > table "dba_data_base", index: > "in_dba_data_base_identifier" btree (lower(identifier) > varchar_pattern_ops) > > > Query 1 > --- > datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, > log_ratio, p_value > FROM > dba_data_base a > WHERE > ( lower(identifier) LIKE lower('BUGS001884677') OR >lower(identifier) LIKE lower('BUGS001884678') OR >lower(identifier) LIKE lower('BUGS001884679') OR >lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') ) > ORDER BY a.identifier; > > QUERY PLAN > > --- > --- > -- > > Sort (cost=14930.13..14939.77 rows=3857 width=62) (actual > time=3208.466..3208.652 rows=318 loops=1) > Sort Key: identifier > -> Bitmap Heap Scan on dba_data_base a (cost=134.43..14700.38 > rows=3857 width=62) (actual time=81.106..3207.721 rows=318 loops=1) > Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) > OR (lower(identifier) ~~ 'bugs001884678'::text) OR > (lower(identifier) ~~ 'bugs001884679'::text) OR (lower(identifier) > ~ > ~ 'sptigr4-2210 (6f24)'::text)) > Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR > (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) > ~~ 'bugs001884679'::text) OR (lower(identifier) ~~ 'spt > igr4-2210 (6f24)'::text)) > -> BitmapOr (cost=134.43..134.43 rows=3857 width=0) (actual > time=71.397..71.397 rows=0 loops=1) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 > loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs001884677'::character varying) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 > loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs001884678'::character varying) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 > loops=1) > Index Cond: (lower(identifier) ~=~ > 'bugs001884679'::character varying) > -> Bitmap Index Scan on in_dba_data_base_identifier > (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 > rows=318 loops=1) > Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 > (6f24)'::character varying) > Total runtime: 3208.904 ms > > > Query 2 > --- > datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, > log_ratio, p_value > FROM > dba_data_base a > WHERE > lower(identifier) LIKE lower('BUGS001884677') > UNION > SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value > FROM > dba_data_base a > WHERE >lower(identifier) LIKE lower('BUGS001884678') > UNION > SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value > FROM > dba_data_base a > WHERE >lower(identifier) LIKE lower('BUGS001884679') > UNION > SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value > FROM > dba_data_base a > WHERE >lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') > ORDER BY identifier; > > Sort (cost=15702.26..15711.90 rows=3856 width=62) (actual > time=3.688..3.886 rows=317 loops=1) > Sort Key: identifier > -> Unique (cost=15414.74..15472.58 rows=3856 width=62) (actual > time=2.663..3.387 rows=317 loops=1) > -> Sort (cost=15414.74..15424.38 rows=3856 width=62) (actual > time=2.660..2.834 rows=318 loops=1) > Sort Key: bioassay_id, identifier, ratio, log_ratio, > p_value > -> Append (cost=32.88..15185.06 rows=3856 width=62) > (actual time=0.320..2.131 rows=318 loops=1) > -> Bitmap Heap Scan on dba_data_base a > (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 > rows=0 loops=1) > Filter: (lower(identifier) ~~ > 'bugs001884677'::text) > -> Bitmap Index Scan on > in_dba_data_base_identifier (cost=0.00..32.64 rows=964 width=0) > (a
[SQL] parsing audit table
Hi everyone, I'm a bit lazy, or actually in a bit of a crunch. I added an audit recording a few months ago, but never really used it much, but today I'm seeing a bunch of suspicious activity by one user. Does someone have any function to quickly parse this data? I followed this: http://wiki.postgresql.org/wiki/Audit_trigger - so I'm hoping someone else has something to parse that. original data: "(B04TaEsAAIG5bEEX5xBVPQ,2,jakew,mag,"",2011-07-05,2011-07-05,,,7,528284,"",2011-07-05,13:20:59,2011-07-05,"",A04CQUUABSxYfxftPQqJlg,,JUfhSzwADKqAFSN1Cbv+mg,BUiZqlIABburW7jqdY9JJQ,HUZPx0gACfCxy1Y34QSTQw,,,0,N,Y,Y,919.9100,0.,0.,2011-07-05,2011-07-05)" new data: "(B04TaEsAAIG5bEEX5xBVPQ,3,jakew,mag,"",2011-07-05,2011-07-05,,,7,528284,"",2011-07-05,13:20:59,2011-07-05,"",A04CQUUABSxYfxftPQqJlg,,JUfhSzwADKqAFSN1Cbv+mg,BUiZqlIABburW7jqdY9JJQ,HUZPx0gACfCxy1Y34QSTQw,,,0,N,N,Y,919.9100,919.9100,0.,,2011-08-04)" those 22 alphanumeric columns are IDs. Thanks for any help, Mark -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
Re: [SQL] parsing audit table
On Tue, Aug 16, 2011 at 4:02 PM, M. D. wrote: > Hi everyone, > > I'm a bit lazy, or actually in a bit of a crunch. I added an audit > recording a few months ago, but never really used it much, but today I'm > seeing a bunch of suspicious activity by one user. Does someone have any > function to quickly parse this data? > that's not exactly auditable information :D i made this one for that: https://github.com/jcasanov/pg_audit it has two versions one using hstore in which you will see in old "column=>old_data" and in new "column=>new_data" seems better and i guess you can use hstore functions on it (http://www.postgresql.org/docs/9.0/static/hstore.html) The other one uses arrays to store column names, old values, new values, still more parseable -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-sql mailing list ([email protected]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-sql
