date:20110816

[SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread adam_pgsql


Hi,

I have a query hitting a table of 25 million rows. The table has a text field 
('identifier') which i need to query for matching rows. The question is if i 
have multiple strings to match against this field I can use multiple OR 
sub-statements or multiple statements in a UNION. The UNION seems to run 
quicker is this to be expected? or is there anything else I can do improve 
the speed of this query? Some query details:


table "dba_data_base", index:
"in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)


Query 1
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
p_value
 FROM
 dba_data_base a
 WHERE
 ( lower(identifier) LIKE lower('BUGS001884677') OR
   lower(identifier) LIKE lower('BUGS001884678') OR
   lower(identifier) LIKE lower('BUGS001884679') OR
   lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
 ORDER BY a.identifier;

  QUERY PLAN
   



Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual 
time=3208.466..3208.652 rows=318 loops=1)
  Sort Key: identifier
  ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38 rows=3857 
width=62) (actual time=81.106..3207.721 rows=318 loops=1)
Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR 
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~ 
'bugs001884679'::text) OR (lower(identifier) ~
~ 'sptigr4-2210 (6f24)'::text))
Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR 
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~ 
'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
igr4-2210 (6f24)'::text))
->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual 
time=71.397..71.397 rows=0 loops=1)
  ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~ 
'bugs001884677'::character varying)
  ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~ 
'bugs001884678'::character varying)
  ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~ 
'bugs001884679'::character varying)
  ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 
loops=1)
Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 
(6f24)'::character varying)
Total runtime: 3208.904 ms


Query 2
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
p_value
 FROM
 dba_data_base a
 WHERE
 lower(identifier) LIKE lower('BUGS001884677')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
 FROM
 dba_data_base a
 WHERE
   lower(identifier) LIKE lower('BUGS001884678')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
 FROM
 dba_data_base a
 WHERE
   lower(identifier) LIKE lower('BUGS001884679')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
 FROM
 dba_data_base a
 WHERE
   lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
 ORDER BY identifier;

Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886 
rows=317 loops=1)
  Sort Key: identifier
  ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual 
time=2.663..3.387 rows=317 loops=1)
->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual 
time=2.660..2.834 rows=318 loops=1)
  Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
  ->  Append  (cost=32.88..15185.06 rows=3856 width=62) (actual 
time=0.320..2.131 rows=318 loops=1)
->  Bitmap Heap Scan on dba_data_base a  
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows=0 
loops=1)
  Filter: (lower(identifier) ~~ 
'bugs001884677'::text)
  ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: (lower(identifier) ~=~ 
'bugs001884677'::character varying)
->  Bitmap Heap Scan on dba_data_base a  
(cost=3

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread Viktor Bojović

hi Adam,
im not sure which is faster/slower but, possibly you can speed it up by
using "in" operator

...where lower(identifier) *in *(lower('BUGS001884677')*
,*lower('BUGS001884678')
*,*);

if you create function based index:

CREATE INDEX idx_table_lower_text ON table(*lower*(text_field));

(taken from: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php)




On Tue, Aug 16, 2011 at 1:39 PM, adam_pgsql wrote:

>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a text
> field ('identifier') which i need to query for matching rows. The question
> is if i have multiple strings to match against this field I can use multiple
> OR sub-statements or multiple statements in a UNION. The UNION seems to run
> quicker is this to be expected? or is there anything else I can do
> improve the speed of this query? Some query details:
>
>
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
>
>
> Query 1
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
> ( lower(identifier) LIKE lower('BUGS001884677') OR
>   lower(identifier) LIKE lower('BUGS001884678') OR
>   lower(identifier) LIKE lower('BUGS001884679') OR
>   lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
>  ORDER BY a.identifier;
>
>  QUERY PLAN
>
>
> 
> 
> Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual
> time=3208.466..3208.652 rows=318 loops=1)
>  Sort Key: identifier
>  ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38 rows=3857
> width=62) (actual time=81.106..3207.721 rows=318 loops=1)
>Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
> 'bugs001884679'::text) OR (lower(identifier) ~
> ~ 'sptigr4-2210 (6f24)'::text))
>Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~
> 'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
>->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual time=
> 71.397..71.397 rows=0 loops=1)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier
>  (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0
> loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884677'::character varying)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier
>  (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884678'::character varying)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier
>  (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
>Index Cond: (lower(identifier) ~=~
> 'bugs001884679'::character varying)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier
>  (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318
> loops=1)
>Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
>
>
> Query 2
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
> lower(identifier) LIKE lower('BUGS001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
>   lower(identifier) LIKE lower('BUGS001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
>   lower(identifier) LIKE lower('BUGS001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
>   lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
>  ORDER BY identifier;
>
> Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual
> time=3.688..3.886 rows=317 loops=1)
>  Sort Key: identifier
>  ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual
> time=2.663..3.387 rows=317 loops=1)
>->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual
> time=2.660..2.834 rows=318 loops=1)
>  Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
>  ->  Append  (cost=32.88..15185.06 rows=3856 width=62) (actual
> time=0.320..2.131 rows=318 loops=1)
>->  Bitmap Heap Scan on dba_data_base a
>  (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows

Re: [SQL] bigint and unix time

2011-08-16 Thread Adrian Klaver

On Tuesday, August 16, 2011 2:12:52 am Janiv Ratson wrote:
> Hi Adrain and thank you,
> Trac 0.12 uses microseconds as time value.
> What do you suggest?

extract(epoch ..) returns seconds which you are trying to compare to 
microseconds. The solution would be to divide your 'time' values by 1,000,000 
to 
make them seconds. Like:

SELECT ticket, "time"/100, author, field, oldvalue, newvalue
  FROM ticket_change
  where "time"/100 
  BETWEEN 
extract(epoch from (date 'now' - integer '30')) AND
extract(epoch from (date 'now'));


> 
> Thanks,
> Janiv.
> 

-- 
Adrian Klaver
[email protected]

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread David Johnston

-Original Message-
From: [email protected] [mailto:[email protected]]
On Behalf Of adam_pgsql
Sent: Tuesday, August 16, 2011 7:39 AM
To: pgsql-sql
Subject: [SQL] which is better: using OR clauses or UNION?

Hi,

I have a query hitting a table of 25 million rows. The table has a text
field ('identifier') which i need to query for matching rows. The question
is if i have multiple strings to match against this field I can use multiple
OR sub-statements or multiple statements in a UNION. The UNION seems to run
quicker is this to be expected? or is there anything else I can do
improve the speed of this query? Some query details:

You can also try the following form:

... WHERE column = ANY(  string_to_array ( lower( 'Bug1,Bug2,Bug3' ) , ',' )
);

The main advantage of this is that you can parameterize the input string and
so it will not logically matter how many values you are checking for.

Also, you can always just place the values you want to search for into a
table (temp or otherwise) and perform an Inner Join.

No idea which one is "faster" but the "string_to_array" above requires no
Dynamic SQL which all of the other forms (OR, UNION, IN) need.  The table
form also does not require dynamic SQL but you need additional INSERTS for
each search value.  I doubt it would be any better than the (OR/UNION/JOIN)
form but you could also create a VALUES virtual table - which also requires
Dynamic SQL.

SELECT *
FROM target_table
NATURAL JOIN ( VALUES ('Bug1'),('Bug2'),('Bug3') ) searchtarget (
join_column_name );

David J.

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread Tom Lane

adam_pgsql  writes:
> I have a query hitting a table of 25 million rows. The table has a
> text field ('identifier') which i need to query for matching rows. The
> question is if i have multiple strings to match against this field I
> can use multiple OR sub-statements or multiple statements in a
> UNION. The UNION seems to run quicker is this to be expected?

Your test cases don't seem exactly comparable; in particular I think the
second one is benefiting from the first one having already read and
cached the relevant disk blocks.  Notice how you've got, eg,

>   ->  Bitmap Index Scan on in_dba_data_base_identifier  
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 
> loops=1)
> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 
> (6f24)'::character varying)

versus

>   ->  Bitmap Index Scan on 
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0) (actual 
> time=0.178..0.178 rows=318 loops=1)
> Index Cond: (lower(identifier) ~=~ 
> 'sptigr4-2210 (6f24)'::character varying)

Those are the exact same subplan, so any honest comparison should be
finding them to take the same amount of time.  When the actual readings
are different by a factor of several hundred, there's something wrong
with your measurement process.

In the end this comes down to whether duplicates will be eliminated more
efficiently by a BitmapOr step or by sort/uniq on the resulting rows.
I'd have to bet on the BitmapOr myself, but it's likely that this is
down in the noise compared to the actual disk accesses in any
not-fully-cached scenario.  Also, if you don't expect the sub-statements
to yield any duplicates, or don't care about seeing the same row twice
in the output, you should consider UNION ALL instead of UNION.

regards, tom lane

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread Jasen Betts

On 2011-08-16, adam_pgsql  wrote:
>
> Hi,
>
> I have a query hitting a table of 25 million rows. The table has a
>text field ('identifier') which i need to query for matching rows. The
>question is if i have multiple strings to match against this field I
>can use multiple OR sub-statements or multiple statements in a UNION.
>The UNION seems to run quicker is this to be expected? or is there
>anything else I can do improve the speed of this query? Some query
>details:

>  WHERE
>  ( lower(identifier) LIKE lower('BUGS001884677') OR
>lower(identifier) LIKE lower('BUGS001884678') OR
>lower(identifier) LIKE lower('BUGS001884679') OR
>lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
>  ORDER BY a.identifier;

don't use like use regex.

...

WHERE identifier ~* 
E'^(BUGS001884677|BUGS001884678|BUGS001884679|SpTIGR4-2210
\\(6F24\\))$'

or 

where lower(identifier) ~* 
lower(E'^(BUGS001884677|BUGS001884678|BUGS001884679|SpTIGR4-2210
\\(6F24\\))$')

on the other hand you aren't doing any pattern stuff. - you you could
just use 'in':

WHERE lower(identifier) in
  (lower('BUGS001884677'),
   lower('BUGS001884678'),
   lower('BUGS001884679'),
   lower('SpTIGR4-2210(6F24)') 
  )

or if you need like, use like any:

WHERE lower(identifier) like ANY
  (lower('BUGS001884677'),
   lower('BUGS001884678'),
   lower('BUGS001884679'),
   lower('SpTIGR4-2210(6F24)') 
  )


> Also which should scale better if I add more strings to match? would there be 
> any better design patterns for this problem?

use one of the above: preferably in, else regex, or failing that like any.
"= any" will also work but I don't thing it will ever be better than "in"

-- 
⚂⚃ 100% natural


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread adam_pgsql


Hi Viktor,

thanks for your email, gave that a try, but the 

lower(identifier) LIKE lower('BUGS001884677') OR

still comes in quicker than the IN approach

thanks

adam



On 16 Aug 2011, at 12:56, Viktor Bojović wrote:

> hi Adam,
> im not sure which is faster/slower but, possibly you can speed it up by using 
> "in" operator
> 
> ...where lower(identifier) in (lower('BUGS001884677') , 
> lower('BUGS001884678'),);
> 
> if you create function based index:
> CREATE INDEX idx_table_lower_text ON table(lower(text_field));
> (taken from: http://archives.postgresql.org/pgsql-sql/2003-09/msg00395.php)
> 
> 
> 
> 
> On Tue, Aug 16, 2011 at 1:39 PM, adam_pgsql  wrote:
> 
> Hi,
> 
> I have a query hitting a table of 25 million rows. The table has a text field 
> ('identifier') which i need to query for matching rows. The question is if i 
> have multiple strings to match against this field I can use multiple OR 
> sub-statements or multiple statements in a UNION. The UNION seems to run 
> quicker is this to be expected? or is there anything else I can do 
> improve the speed of this query? Some query details:
> 
> 
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)
> 
> 
> Query 1
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
> p_value
>  FROM
> dba_data_base a
>  WHERE
> ( lower(identifier) LIKE lower('BUGS001884677') OR
>   lower(identifier) LIKE lower('BUGS001884678') OR
>   lower(identifier) LIKE lower('BUGS001884679') OR
>   lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
>  ORDER BY a.identifier;
>   
>QUERY PLAN
> 
> 
> 
> Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual 
> time=3208.466..3208.652 rows=318 loops=1)
>  Sort Key: identifier
>  ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38 rows=3857 
> width=62) (actual time=81.106..3207.721 rows=318 loops=1)
>Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR 
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~ 
> 'bugs001884679'::text) OR (lower(identifier) ~
> ~ 'sptigr4-2210 (6f24)'::text))
>Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR 
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~ 
> 'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
>->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual 
> time=71.397..71.397 rows=0 loops=1)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier  
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
>Index Cond: (lower(identifier) ~=~ 
> 'bugs001884677'::character varying)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier  
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
>Index Cond: (lower(identifier) ~=~ 
> 'bugs001884678'::character varying)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier  
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
>Index Cond: (lower(identifier) ~=~ 
> 'bugs001884679'::character varying)
>  ->  Bitmap Index Scan on in_dba_data_base_identifier  
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 
> loops=1)
>Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
> 
> 
> Query 2
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
> p_value
>  FROM
> dba_data_base a
>  WHERE
> lower(identifier) LIKE lower('BUGS001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
>   lower(identifier) LIKE lower('BUGS001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
>   lower(identifier) LIKE lower('BUGS001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
> dba_data_base a
>  WHERE
>   lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
>  ORDER BY identifier;
> 
> Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886 
> rows=317 loops=1)
>  Sort Key: identifier
>  ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual 
> time=2.663..3.387 rows=317 loops=1)
>->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual 
> time=2.6

Re: [SQL] bigint and unix time

2011-08-16 Thread Janiv Ratson

Hi Adrain and thank you,
Trac 0.12 uses microseconds as time value.
What do you suggest?

Thanks,
Janiv.

-Original Message-
From: Adrian Klaver [mailto:[email protected]] 
Sent: Monday, August 15, 2011 17:14
To: Janiv Ratson
Cc: [email protected]
Subject: Re: [SQL] bigint and unix time

On Sunday, August 14, 2011 11:23:08 pm Janiv Ratson wrote:
> Hi and thanks,
> If my 'time' column is being saved as bigint: 128732389900.
> How do I write a query to check if the 'time' field is greater than now -
> 30 (past 30 days)?

So what you want is not what values are greater than some point 30 days ago 
which is what your previous query asked and answered, but the values between a 
point 30 days ago and today.  The easiest way is to use BETWEEN:

test(5432)aklaver=>\d big_int_test 

 Table "public.big_int_test"
 Column |  Type   | Modifiers 
+-+---
 bint   | bigint  | 
 rint   | integer | 

test(5432)aklaver=>SELECT * from big_int_test ;
   bint   |rint
--+
 128732389900 | 1310799600

test(5432)aklaver=>SELECT 
bint 
FROM 
big_int_test 
WHERE 
bint 
BETWEEN 
extract(epoch from (date 'now' - integer '30')) 
AND
extract(epoch from (date 'now'));

 bint 
--
(0 rows)

That being said, if your time values are the order of magnitude shown they will 
not meet the criteria above. Is the time value supposed to be seconds?

> 
> Thanks,
> Janiv,.

-- 
Adrian Klaver
[email protected]

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] bigint and unix time

2011-08-16 Thread Janiv Ratson

Hi and thanks again.
I use the following query but it returns me 0 rows (it should return more):

SELECT ticket, "time", author, field, oldvalue, newvalue
  FROM ticket_change
  where "time" 
  BETWEEN 
extract(epoch from (date 'now' - integer '30')) AND
extract(epoch from (date 'now'));

Thanks a lot again,
Janiv.

-Original Message-
From: Adrian Klaver [mailto:[email protected]] 
Sent: Monday, August 15, 2011 17:14
To: Janiv Ratson
Cc: [email protected]
Subject: Re: [SQL] bigint and unix time

On Sunday, August 14, 2011 11:23:08 pm Janiv Ratson wrote:
> Hi and thanks,
> If my 'time' column is being saved as bigint: 128732389900.
> How do I write a query to check if the 'time' field is greater than now -
> 30 (past 30 days)?

So what you want is not what values are greater than some point 30 days ago 
which is what your previous query asked and answered, but the values between a 
point 30 days ago and today.  The easiest way is to use BETWEEN:

test(5432)aklaver=>\d big_int_test 

 Table "public.big_int_test"
 Column |  Type   | Modifiers 
+-+---
 bint   | bigint  | 
 rint   | integer | 

test(5432)aklaver=>SELECT * from big_int_test ;
   bint   |rint
--+
 128732389900 | 1310799600

test(5432)aklaver=>SELECT 
bint 
FROM 
big_int_test 
WHERE 
bint 
BETWEEN 
extract(epoch from (date 'now' - integer '30')) 
AND
extract(epoch from (date 'now'));

 bint 
--
(0 rows)


That being said, if your time values are the order of magnitude shown they will 
not meet the criteria above. Is the time value supposed to be seconds?

> 
> Thanks,
> Janiv,.


-- 
Adrian Klaver
[email protected]



-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] bigint and unix time

2011-08-16 Thread Janiv Ratson

Additional information:

If I use the following query:

select "time", extract(epoch from (date 'now' - integer '30')), extract(epoch 
from (date 'now'))
from ticket_change

The first row looks like this:
Bigint, double precision, double precision
128609061500;131085;1313442000

Thanks,
Janiv,

-Original Message-
From: Adrian Klaver [mailto:[email protected]] 
Sent: Monday, August 15, 2011 17:14
To: Janiv Ratson
Cc: [email protected]
Subject: Re: [SQL] bigint and unix time

On Sunday, August 14, 2011 11:23:08 pm Janiv Ratson wrote:
> Hi and thanks,
> If my 'time' column is being saved as bigint: 128732389900.
> How do I write a query to check if the 'time' field is greater than now -
> 30 (past 30 days)?

So what you want is not what values are greater than some point 30 days ago 
which is what your previous query asked and answered, but the values between a 
point 30 days ago and today.  The easiest way is to use BETWEEN:

test(5432)aklaver=>\d big_int_test 

 Table "public.big_int_test"
 Column |  Type   | Modifiers 
+-+---
 bint   | bigint  | 
 rint   | integer | 

test(5432)aklaver=>SELECT * from big_int_test ;
   bint   |rint
--+
 128732389900 | 1310799600

test(5432)aklaver=>SELECT 
bint 
FROM 
big_int_test 
WHERE 
bint 
BETWEEN 
extract(epoch from (date 'now' - integer '30')) 
AND
extract(epoch from (date 'now'));

 bint 
--
(0 rows)


That being said, if your time values are the order of magnitude shown they will 
not meet the criteria above. Is the time value supposed to be seconds?

> 
> Thanks,
> Janiv,.


-- 
Adrian Klaver
[email protected]



-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

[SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread Adam Witney


Hi,

I have a query hitting a table of 25 million rows. The table has a text field 
('identifier') which i need to query for matching rows. The question is if i 
have multiple strings to match against this field I can use multiple OR 
sub-statements or multiple statements in a UNION. The UNION seems to run 
quicker is this to be expected? or is there anything else I can do improve 
the speed of this query? Some query details:


table "dba_data_base", index:
"in_dba_data_base_identifier" btree (lower(identifier) varchar_pattern_ops)


Query 1
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
p_value
  FROM
  dba_data_base a
  WHERE
  ( lower(identifier) LIKE lower('BUGS001884677') OR
lower(identifier) LIKE lower('BUGS001884678') OR
lower(identifier) LIKE lower('BUGS001884679') OR
lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
  ORDER BY a.identifier;

   QUERY PLAN   




 Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual 
time=3208.466..3208.652 rows=318 loops=1)
   Sort Key: identifier
   ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38 rows=3857 
width=62) (actual time=81.106..3207.721 rows=318 loops=1)
 Recheck Cond: ((lower(identifier) ~~ 'bugs001884677'::text) OR 
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~ 
'bugs001884679'::text) OR (lower(identifier) ~
~ 'sptigr4-2210 (6f24)'::text))
 Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR 
(lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier) ~~ 
'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
igr4-2210 (6f24)'::text))
 ->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual 
time=71.397..71.397 rows=0 loops=1)
   ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0 loops=1)
 Index Cond: (lower(identifier) ~=~ 
'bugs001884677'::character varying)
   ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
 Index Cond: (lower(identifier) ~=~ 
'bugs001884678'::character varying)
   ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0 loops=1)
 Index Cond: (lower(identifier) ~=~ 
'bugs001884679'::character varying)
   ->  Bitmap Index Scan on in_dba_data_base_identifier  
(cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 
loops=1)
 Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 
(6f24)'::character varying)
 Total runtime: 3208.904 ms


Query 2
---
datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio, log_ratio, 
p_value
  FROM
  dba_data_base a
  WHERE
  lower(identifier) LIKE lower('BUGS001884677')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
  FROM
  dba_data_base a
  WHERE
lower(identifier) LIKE lower('BUGS001884678')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
  FROM
  dba_data_base a
  WHERE
lower(identifier) LIKE lower('BUGS001884679')
UNION
SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
  FROM
  dba_data_base a
  WHERE
lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
  ORDER BY identifier;

 Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual time=3.688..3.886 
rows=317 loops=1)
   Sort Key: identifier
   ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual 
time=2.663..3.387 rows=317 loops=1)
 ->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual 
time=2.660..2.834 rows=318 loops=1)
   Sort Key: bioassay_id, identifier, ratio, log_ratio, p_value
   ->  Append  (cost=32.88..15185.06 rows=3856 width=62) (actual 
time=0.320..2.131 rows=318 loops=1)
 ->  Bitmap Heap Scan on dba_data_base a  
(cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041 rows=0 
loops=1)
   Filter: (lower(identifier) ~~ 
'bugs001884677'::text)
   ->  Bitmap Index Scan on in_dba_data_base_identifier 
 (cost=0.00..32.64 rows=964 width=0) (actual time=0.036..0.036 rows=0 loops=1)
 Index Cond: (lower(identifier) ~=~ 
'bugs001884677'::character varying)

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread adam_pgsql


On 16 Aug 2011, at 15:09, Tom Lane wrote:

> adam_pgsql  writes:
>> I have a query hitting a table of 25 million rows. The table has a
>> text field ('identifier') which i need to query for matching rows. The
>> question is if i have multiple strings to match against this field I
>> can use multiple OR sub-statements or multiple statements in a
>> UNION. The UNION seems to run quicker is this to be expected?
> 
> Your test cases don't seem exactly comparable; in particular I think the
> second one is benefiting from the first one having already read and
> cached the relevant disk blocks.  Notice how you've got, eg,
> 
>>  ->  Bitmap Index Scan on in_dba_data_base_identifier  
>> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347 rows=318 
>> loops=1)
>>Index Cond: (lower(identifier) ~=~ 'sptigr4-2210 
>> (6f24)'::character varying)
> 
> versus
> 
>>  ->  Bitmap Index Scan on 
>> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0) (actual 
>> time=0.178..0.178 rows=318 loops=1)
>>Index Cond: (lower(identifier) ~=~ 
>> 'sptigr4-2210 (6f24)'::character varying)
> 
> Those are the exact same subplan, so any honest comparison should be
> finding them to take the same amount of time.  When the actual readings
> are different by a factor of several hundred, there's something wrong
> with your measurement process.
> 
> In the end this comes down to whether duplicates will be eliminated more
> efficiently by a BitmapOr step or by sort/uniq on the resulting rows.
> I'd have to bet on the BitmapOr myself, but it's likely that this is
> down in the noise compared to the actual disk accesses in any
> not-fully-cached scenario.  Also, if you don't expect the sub-statements
> to yield any duplicates, or don't care about seeing the same row twice
> in the output, you should consider UNION ALL instead of UNION.


Thanks guys, I'll give some of those options a try and see which ones improve 
performance

(Tom, yes i ran those queries after each other so there was caching going on. 
However, I had noticed a difference in performance when spacing the queries 
before and after a few other big queries to help clear the cache).

adam




-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] which is better: using OR clauses or UNION?

2011-08-16 Thread Igor Neyman

> -Original Message-
> From: adam_pgsql [mailto:[email protected]]
> Sent: Tuesday, August 16, 2011 7:39 AM
> To: pgsql-sql
> Subject: which is better: using OR clauses or UNION?
> 
> 
> Hi,
> 
> I have a query hitting a table of 25 million rows. The table has a
text
> field ('identifier') which i need to query for matching rows. The
> question is if i have multiple strings to match against this field I
> can use multiple OR sub-statements or multiple statements in a UNION.
> The UNION seems to run quicker is this to be expected? or is there
> anything else I can do improve the speed of this query? Some query
> details:
> 
> 
> table "dba_data_base", index:
> "in_dba_data_base_identifier" btree (lower(identifier)
> varchar_pattern_ops)
> 
> 
> Query 1
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
>  FROM
>  dba_data_base a
>  WHERE
>  ( lower(identifier) LIKE lower('BUGS001884677') OR
>lower(identifier) LIKE lower('BUGS001884678') OR
>lower(identifier) LIKE lower('BUGS001884679') OR
>lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)') )
>  ORDER BY a.identifier;
> 
> QUERY PLAN
> 
>
---
>
---
> --
> 
> Sort  (cost=14930.13..14939.77 rows=3857 width=62) (actual
> time=3208.466..3208.652 rows=318 loops=1)
>   Sort Key: identifier
>   ->  Bitmap Heap Scan on dba_data_base a  (cost=134.43..14700.38
> rows=3857 width=62) (actual time=81.106..3207.721 rows=318 loops=1)
> Recheck Cond: ((lower(identifier) ~~
'bugs001884677'::text)
> OR (lower(identifier) ~~ 'bugs001884678'::text) OR
> (lower(identifier) ~~ 'bugs001884679'::text) OR (lower(identifier)
> ~
> ~ 'sptigr4-2210 (6f24)'::text))
> Filter: ((lower(identifier) ~~ 'bugs001884677'::text) OR
> (lower(identifier) ~~ 'bugs001884678'::text) OR (lower(identifier)
> ~~ 'bugs001884679'::text) OR (lower(identifier) ~~ 'spt
> igr4-2210 (6f24)'::text))
> ->  BitmapOr  (cost=134.43..134.43 rows=3857 width=0) (actual
> time=71.397..71.397 rows=0 loops=1)
>   ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.029..0.029 rows=0
> loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs001884677'::character varying)
>   ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs001884678'::character varying)
>   ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=0.008..0.008 rows=0
> loops=1)
> Index Cond: (lower(identifier) ~=~
> 'bugs001884679'::character varying)
>   ->  Bitmap Index Scan on in_dba_data_base_identifier
> (cost=0.00..32.64 rows=964 width=0) (actual time=71.347..71.347
> rows=318 loops=1)
> Index Cond: (lower(identifier) ~=~ 'sptigr4-2210
> (6f24)'::character varying)
> Total runtime: 3208.904 ms
> 
> 
> Query 2
> ---
> datadb=#explain analyse SELECT a.bioassay_id, a.identifier, ratio,
> log_ratio, p_value
>  FROM
>  dba_data_base a
>  WHERE
>  lower(identifier) LIKE lower('BUGS001884677')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>  dba_data_base a
>  WHERE
>lower(identifier) LIKE lower('BUGS001884678')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>  dba_data_base a
>  WHERE
>lower(identifier) LIKE lower('BUGS001884679')
> UNION
> SELECT a.bioassay_id, a.identifier, ratio, log_ratio, p_value
>  FROM
>  dba_data_base a
>  WHERE
>lower(identifier) LIKE lower('SpTIGR4-2210 (6F24)')
>  ORDER BY identifier;
> 
> Sort  (cost=15702.26..15711.90 rows=3856 width=62) (actual
> time=3.688..3.886 rows=317 loops=1)
>   Sort Key: identifier
>   ->  Unique  (cost=15414.74..15472.58 rows=3856 width=62) (actual
> time=2.663..3.387 rows=317 loops=1)
> ->  Sort  (cost=15414.74..15424.38 rows=3856 width=62) (actual
> time=2.660..2.834 rows=318 loops=1)
>   Sort Key: bioassay_id, identifier, ratio, log_ratio,
> p_value
>   ->  Append  (cost=32.88..15185.06 rows=3856 width=62)
> (actual time=0.320..2.131 rows=318 loops=1)
> ->  Bitmap Heap Scan on dba_data_base a
> (cost=32.88..3786.62 rows=964 width=62) (actual time=0.041..0.041
> rows=0 loops=1)
>   Filter: (lower(identifier) ~~
> 'bugs001884677'::text)
>   ->  Bitmap Index Scan on
> in_dba_data_base_identifier  (cost=0.00..32.64 rows=964 width=0)
> (a

[SQL] parsing audit table

2011-08-16 Thread M. D.


Hi everyone,

I'm a bit lazy, or actually in a bit of a crunch.  I added an audit 
recording a few months ago, but never really used it much, but today I'm 
seeing a bunch of suspicious activity by one user.  Does someone have 
any function to quickly parse this data?


I followed this: http://wiki.postgresql.org/wiki/Audit_trigger - so I'm 
hoping someone else has something to parse that.


original data: 
"(B04TaEsAAIG5bEEX5xBVPQ,2,jakew,mag,"",2011-07-05,2011-07-05,,,7,528284,"",2011-07-05,13:20:59,2011-07-05,"",A04CQUUABSxYfxftPQqJlg,,JUfhSzwADKqAFSN1Cbv+mg,BUiZqlIABburW7jqdY9JJQ,HUZPx0gACfCxy1Y34QSTQw,,,0,N,Y,Y,919.9100,0.,0.,2011-07-05,2011-07-05)"

new data:
"(B04TaEsAAIG5bEEX5xBVPQ,3,jakew,mag,"",2011-07-05,2011-07-05,,,7,528284,"",2011-07-05,13:20:59,2011-07-05,"",A04CQUUABSxYfxftPQqJlg,,JUfhSzwADKqAFSN1Cbv+mg,BUiZqlIABburW7jqdY9JJQ,HUZPx0gACfCxy1Y34QSTQw,,,0,N,N,Y,919.9100,919.9100,0.,,2011-08-04)"

those 22 alphanumeric columns are IDs.

Thanks for any help,
Mark

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] parsing audit table

2011-08-16 Thread Jaime Casanova

On Tue, Aug 16, 2011 at 4:02 PM, M. D.  wrote:
> Hi everyone,
>
> I'm a bit lazy, or actually in a bit of a crunch.  I added an audit
> recording a few months ago, but never really used it much, but today I'm
> seeing a bunch of suspicious activity by one user.  Does someone have any
> function to quickly parse this data?
>

that's not exactly auditable information :D

i made this one for that: https://github.com/jcasanov/pg_audit

it has two versions one using hstore in which you will see in old
"column=>old_data" and in new "column=>new_data" seems better and i
guess you can use hstore functions on it
(http://www.postgresql.org/docs/9.0/static/hstore.html)

The other one uses arrays to store column names, old values, new
values, still more parseable

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

[SQL] which is better: using OR clauses or UNION?

Re: [SQL] which is better: using OR clauses or UNION?

Re: [SQL] bigint and unix time

Re: [SQL] which is better: using OR clauses or UNION?

Re: [SQL] which is better: using OR clauses or UNION?

Re: [SQL] which is better: using OR clauses or UNION?

Re: [SQL] which is better: using OR clauses or UNION?

Re: [SQL] bigint and unix time

Re: [SQL] bigint and unix time

Re: [SQL] bigint and unix time

[SQL] which is better: using OR clauses or UNION?

Re: [SQL] which is better: using OR clauses or UNION?

Re: [SQL] which is better: using OR clauses or UNION?

[SQL] parsing audit table

Re: [SQL] parsing audit table

15 matches

Site Navigation

Mail list logo

Footer information