[PERFORM] Data caching

2009-07-09 Thread Martin Chlupac
Hello everybody,
I have a simple query which selects data from not very large table (
434161 rows) and takes far more time than I'd expect. I believe it's
due to a poor disk performance because when I execute the very same
query for a second time I get much better results (caching kicks in?).
Can you please confirm my theory or do you see any other possible
explanation?

Thank you in advance

Martin


# explain analyze select * from
records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac where variable_id=7553
and ts > '2009-07-01 17:00:00' and ts < now() order by ts limit 2;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=3924.13..3928.91 rows=1912 width=206) (actual time=3687.661..3705.546 rows=2161 loops=1)
   ->  Sort  (cost=3924.13..3928.91 rows=1912 width=206) (actual time=3687.654..3693.864 rows=2161 loops=1)
         Sort Key: ts
         Sort Method:  quicksort  Memory: 400kB
         ->  Bitmap Heap Scan on records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac  (cost=76.75..3819.91 rows=1912 width=206) (actual time=329.416..3677.521 rows=2161 loops=1)
               Recheck Cond: ((variable_id = 7553) AND (ts > '2009-07-01 17:00:00'::timestamp without time zone) AND (ts < now()))
               ->  Bitmap Index Scan on pokusny_index  (cost=0.00..76.27 rows=1912 width=0) (actual time=304.160..304.160 rows=2687 loops=1)
                     Index Cond: ((variable_id = 7553) AND (ts > '2009-07-01 17:00:00'::timestamp without time zone) AND (ts < now()))
 Total runtime: 3711.488 ms
(9 rows)

# explain analyze select * from
records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac where variable_id=7553
and ts > '2009-07-01 17:00:00' and ts < now() order by ts limit 2;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=3924.13..3928.91 rows=1912 width=206) (actual time=18.135..35.140 rows=2161 loops=1)
   ->  Sort  (cost=3924.13..3928.91 rows=1912 width=206) (actual time=18.127..24.064 rows=2161 loops=1)
         Sort Key: ts
         Sort Method:  quicksort  Memory: 400kB
         ->  Bitmap Heap Scan on records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac  (cost=76.75..3819.91 rows=1912 width=206) (actual time=1.616..10.369 rows=2161 loops=1)
               Recheck Cond: ((variable_id = 7553) AND (ts > '2009-07-01 17:00:00'::timestamp without time zone) AND (ts < now()))
               ->  Bitmap Index Scan on pokusny_index  (cost=0.00..76.27 rows=1912 width=0) (actual time=1.352..1.352 rows=2687 loops=1)
                     Index Cond: ((variable_id = 7553) AND (ts > '2009-07-01 17:00:00'::timestamp without time zone) AND (ts < now()))
 Total runtime: 40.971 ms
(9 rows)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data caching

2009-07-09 Thread Richard Huxton

Martin Chlupac wrote:

Hello everybody,
I have a simple query which selects data from not very large table (
434161 rows) and takes far more time than I'd expect. I believe it's
due to a poor disk performance because when I execute the very same
query for a second time I get much better results (caching kicks in?).
Can you please confirm my theory or do you see any other possible
explanation?


Yep - it's the difference between fetching from memory and from disk.


 ->  Bitmap Heap Scan on records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac  (cost=76.75..3819.91 rows=1912 width=206) (actual time=329.416..3677.521 rows=2161 loops=1)

 ->  Bitmap Heap Scan on records_f4f23ca0-9c35-43ac-bb0d-1ef3784399ac  (cost=76.75..3819.91 rows=1912 width=206) (actual time=1.616..10.369 rows=2161 loops=1)


The plan scans the index and builds up a bitmap of which disk blocks 
contain (potential) matches. It then has to read those blocks (the heap 
scan above), confirm that the rows match, and return them. If you look at 
the actual times above, you can see that about 90% of the slow query is 
spent doing this.
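Richard's two-phase description can be sketched as a toy model in Python (my own illustration, not Postgres internals; the fake heap and index contents are invented):

```python
# Toy model of Bitmap Index Scan -> Bitmap Heap Scan. Illustration only:
# real Postgres works on 8kB pages and TID bitmaps, not Python dicts.

# Fake "heap": page number -> rows of (variable_id, ts).
heap = {
    0: [(7553, 10), (1111, 20)],
    1: [(2222, 30), (7553, 40)],
    2: [(7553, 50), (7553, 5)],
}

# Fake "index": variable_id -> pages that may hold matching rows.
index = {7553: {0, 1, 2}, 1111: {0}, 2222: {1}}

def bitmap_scan(variable_id, ts_min):
    # Phase 1 (Bitmap Index Scan): collect candidate pages from the
    # index alone -- no heap access yet.
    candidate_pages = index.get(variable_id, set())
    # Phase 2 (Bitmap Heap Scan): read each candidate page once, in
    # page order, and recheck the full condition on the real rows.
    # On a cold cache, these page reads are the expensive disk I/O.
    results = []
    for page in sorted(candidate_pages):
        for vid, ts in heap[page]:
            if vid == variable_id and ts > ts_min:  # Recheck Cond
                results.append((vid, ts))
    return results

print(bitmap_scan(7553, 15))  # -> [(7553, 40), (7553, 50)]
```

On the second execution of Martin's query the same heap pages are already in the OS page cache, so phase 2 costs almost nothing; that is the whole 3.7 s vs. 41 ms difference.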


--
  Richard Huxton
  Archonet Ltd



[PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Craig James

Suppose I have a large table with a small-cardinality CATEGORY column (say, 
categories 1..5).  I need to sort by an arbitrary (i.e. user-specified) mapping 
of CATEGORY, something like this:

 1 = 'z'
 2 = 'a'
 3 = 'b'
 4 = 'w'
 5 = 'h'

So when I get done, the sort order should be 2,3,5,4,1.

I could create a temporary table with the category-to-key mapping, but is there 
any way to do this in a single SQL statement?

Thanks,
Craig



Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Grzegorz Jaśkiewicz
On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_ja...@emolecules.com> wrote:
 Suppose I have a large table with a small-cardinality CATEGORY column (say,
 categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
 mapping of CATEGORY, something like this:

  1 = 'z'
  2 = 'a'
  3 = 'b'
  4 = 'w'
  5 = 'h'

 So when I get done, the sort order should be 2,3,5,4,1.

 I could create a temporary table with the category-to-key mapping, but is
 there any way to do this in a single SQL statement?


You can create a translation table, join it, and sort by its key.


-- 
GJ



[PERFORM] Huge difference in query performance between 8.3 and 8.4 (possibly)

2009-07-09 Thread Robin Houston
[ Attempting to resend, because it didn't seem to get through last time. ]

We have a query that runs very slowly on our 8.3 database. (I can't
tell you exactly how slowly, because it has never successfully run to
completion even when we left it running overnight.) On the 8.4
database on my laptop, it runs in about 90 seconds. Of course there
are several differences between the two instances, but I wonder
whether query planning improvements in 8.4 could essentially account
for it. (In case you're wondering, I ran 'vacuum full analyze' on the
slow one, and it made no discernible difference.) The two instances
have the same schema and data.
The query looks like this:
select first.feature_cvterm_id from feature_cvterm first
join feature_cvterm_dbxref first_fvd
on first.feature_cvterm_id = first_fvd.feature_cvterm_id
join dbxref first_withfrom_dbxref
on first_fvd.dbxref_id = first_withfrom_dbxref.dbxref_id
   , cv
   , cvterm evidence_type join cv evidence_type_cv
        on evidence_type.cv_id = evidence_type_cv.cv_id
   , feature_cvtermprop first_evidence
   , feature_cvterm second
left join feature_cvtermprop second_evidence
    on second_evidence.feature_cvterm_id = second.feature_cvterm_id
join feature_cvterm_dbxref second_fvd
on second.feature_cvterm_id = second_fvd.feature_cvterm_id
join dbxref second_withfrom_dbxref
on second_fvd.dbxref_id = second_withfrom_dbxref.dbxref_id
   , cvterm second_term
   , cvterm first_term
   , feature
where first.cvterm_id = first_term.cvterm_id
and first_evidence.feature_cvterm_id = first.feature_cvterm_id
and second_term.cv_id = cv.cv_id
and first_term.cv_id = cv.cv_id
and cv.name in (
      'biological_process'
    , 'molecular_function'
    , 'cellular_component'
)
and second.feature_id = feature.feature_id
and second.feature_id = first.feature_id
and first.cvterm_id = first_term.cvterm_id
and second.cvterm_id = second_term.cvterm_id
and second.pub_id = first.pub_id
and evidence_type.name = 'evidence'
and evidence_type_cv.name = 'genedb_misc'
and second_evidence.type_id = evidence_type.cvterm_id
and first_evidence.type_id = evidence_type.cvterm_id
and second.feature_cvterm_id > first.feature_cvterm_id
and first_withfrom_dbxref.accession = second_withfrom_dbxref.accession
and upper(first_evidence.value) = upper(second_evidence.value)
and first_term.name = second_term.name
;
(There's some fairly obvious room for improvement in this query as
written, but none of the changes I've tried have changed the overall
performance picture.)
The execution plan on the (slow) 8.3 server is:

 Nested Loop  (cost=44050.86..77140.03 rows=1 width=4)
   Join Filter: (second_term.cv_id = cv.cv_id)
   ->  Nested Loop  (cost=44050.86..77138.61 rows=1 width=12)
         Join Filter: ((first_term.cv_id = second_term.cv_id) AND ((first_term.name)::text = (second_term.name)::text))
         ->  Nested Loop  (cost=44050.86..77130.32 rows=1 width=56)
               ->  Nested Loop  (cost=44050.86..77122.65 rows=1 width=12)
                     Join Filter: (upper(second_evidence.value) = upper(first_evidence.value))
                     ->  Nested Loop  (cost=44050.86..77114.32 rows=1 width=50)
                           Join Filter: ((second.feature_cvterm_id > first.feature_cvterm_id) AND (second.feature_id = first.feature_id) AND (second.pub_id = first.pub_id) AND ((second_withfrom_dbxref.accession)::text = (first_withfrom_dbxref.accession)::text))
                           ->  Nested Loop  (cost=30794.26..42915.70 rows=1 width=69)
                                 ->  Hash Join  (cost=30794.26..42906.88 rows=1 width=65)
                                       Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)
                                       ->  Hash Join  (cost=30784.59..42807.07 rows=24035 width=61)
                                             Hash Cond: (second_fvd.dbxref_id = second_withfrom_dbxref.dbxref_id)
                                             ->  Hash Join  (cost=19044.44..28262.26 rows=24035 width=50)
                                                   Hash Cond: (second_evidence.feature_cvterm_id = second.feature_cvterm_id)
                                                   ->  Seq Scan on feature_cvtermprop second_evidence  (cost=0.00..4370.07 rows=223307 width=34)
                                                   ->  Hash  (cost=18169.19..18169.19 rows=47620 width=24)
                                                         ->  Hash Join  (cost=1516.45..18169.19 rows=47620 width=24)
                                                               Hash Cond: (second.feature_cvterm_id = second_fvd.feature_cvterm_id)
                                                               ->  Seq Scan on feature_cvterm second  (cost=0.00..7243.27 rows=442427 width=16)
                                                               ->  Hash  (cost=734.20..734.20 rows=47620 width=8)
                                                                     ->  Seq Scan on feature_cvterm_dbxref second_fvd  (cost=0.00..734.20 rows=47620 width=8)

Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Tom Lane
Grzegorz Jaśkiewicz <gryz...@gmail.com> writes:
 On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_ja...@emolecules.com> wrote:
 Suppose I have a large table with a small-cardinality CATEGORY column (say,
 categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
 mapping of CATEGORY, something like this:

 you can create translation table, join it, and sort by its key.

Much easier to
ORDER BY CASE category WHEN 'z' THEN 1 WHEN 'a' THEN 2 ... END

Actually, consider putting the CASE into a function and doing
ORDER BY sort_order(category)

regards, tom lane
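Both of Tom's variants map each category to an inline sort key. A minimal sketch against SQLite (illustration only; the table `t` and its contents are made up, and the CASE expression itself works unchanged in Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(c,) for c in (1, 2, 3, 4, 5)])

# Sort by the user-specified key, not by the category value itself.
rows = conn.execute("""
    SELECT category FROM t
    ORDER BY CASE category
        WHEN 1 THEN 'z' WHEN 2 THEN 'a' WHEN 3 THEN 'b'
        WHEN 4 THEN 'w' WHEN 5 THEN 'h'
    END
""").fetchall()

print([r[0] for r in rows])  # -> [2, 3, 5, 4, 1]
```

Wrapping the CASE in a function, as Tom suggests, keeps the mapping in one place if several queries need the same ordering.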



Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Kevin Grittner
Craig James <craig_ja...@emolecules.com> wrote: 
 Suppose I have a large table with a small-cardinality CATEGORY
 column (say, categories 1..5).  I need to sort by an arbitrary
 (i.e. user-specified) mapping of CATEGORY
 
There was a recent thread discussing ways to do that:
 
http://archives.postgresql.org/pgsql-admin/2009-07/msg00016.php
 
-Kevin



Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Alexander Staubo
On Thu, Jul 9, 2009 at 6:26 PM, Craig James <craig_ja...@emolecules.com> wrote:
 Suppose I have a large table with a small-cardinality CATEGORY column (say,
 categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
 mapping of CATEGORY, something like this:

  1 = 'z'
  2 = 'a'
  3 = 'b'
  4 = 'w'
  5 = 'h'

 So when I get done, the sort order should be 2,3,5,4,1.

If the object is to avoid a separate table, you can do it with a
case statement:

  select ... from ...
  order by case category
when 1 then 'z'
when 2 then 'a'
when 3 then 'b'
when 4 then 'w'
when 5 then 'h'
  end

If this sounds slow, you're right. But it might perform well
enough for your use case.

A.



Re: [PERFORM] Huge difference in query performance between 8.3 and 8.4 (possibly)

2009-07-09 Thread Alvaro Herrera
Robin Houston wrote:

 We have a query that runs very slowly on our 8.3 database. (I can't
 tell you exactly how slowly, because it has never successfully run to
 completion even when we left it running overnight.) On the 8.4
 database on my laptop, it runs in about 90 seconds. Of course there
 are several differences between the two instances, but I wonder
 whether query planning improvements in 8.4 could essentially account
 for it.

Of course.  Great news.  Congratulations.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Huge difference in query performance between 8.3 and 8.4 (possibly)

2009-07-09 Thread Tom Lane
Robin Houston <robin.hous...@gmail.com> writes:
 We have a query that runs very slowly on our 8.3 database. (I can't
 tell you exactly how slowly, because it has never successfully run to
 completion even when we left it running overnight.) On the 8.4
 database on my laptop, it runs in about 90 seconds. Of course there
 are several differences between the two instances, but I wonder
 whether query planning improvements in 8.4 could essentially account
 for it.

Well, it's hard to be sure with only EXPLAIN and not EXPLAIN ANALYZE
output to look at; but I think the significant difference in these plans
is that 8.4 has chosen a hash instead of nestloop join for a couple of
the intermediate join levels.  Which is evidently because of a change
in the estimated size of the next join down:

 ->  Nested Loop  (cost=44050.86..77114.32 rows=1 width=50)
       Join Filter: ((second.feature_cvterm_id > first.feature_cvterm_id) AND (second.feature_id = first.feature_id) AND (second.pub_id = first.pub_id) AND ((second_withfrom_dbxref.accession)::text = (first_withfrom_dbxref.accession)::text))
       ->  Nested Loop  (cost=30794.26..42915.70 rows=1 width=69)
             ->  Hash Join  (cost=30794.26..42906.88 rows=1 width=65)
                   Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)

versus

 ->  Hash Join  (cost=63949.73..77732.49 rows=1 width=59)
       Hash Cond: ((second.feature_id = first.feature_id) AND (second.pub_id = first.pub_id) AND ((second_withfrom_dbxref.accession)::text = (first_withfrom_dbxref.accession)::text))
       Join Filter: (second.feature_cvterm_id > first.feature_cvterm_id)
       ->  Hash Join  (cost=30236.57..41303.13 rows=4607 width=66)
             Hash Cond: (second_evidence.type_id = evidence_type.cvterm_id)

If the 8.4 rowcount estimate is accurate then it's not surprising that
the nestloop plan sucks --- it'd be re-executing the other arm of the
join 4600 or so times.

This could reflect improvements in the join size estimation code, or
maybe it's just a consequence of 8.4 using larger statistics targets
by default.  It's hard to be sure with so little information to go on.

regards, tom lane
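The effect of that misestimate can be illustrated with a toy join model (my own sketch, not the planner's actual cost arithmetic): a nested loop re-scans the inner arm once per outer row, while a hash join builds the inner side once and does a single probe per outer row.

```python
# Toy comparison of nested-loop vs. hash join work. Illustration only.

def nested_loop_join(outer, inner, key):
    scans, out = 0, []
    for o in outer:
        for i in inner:              # inner arm re-executed per outer row
            scans += 1
            if o[key] == i[key]:
                out.append((o, i))
    return out, scans

def hash_join(outer, inner, key):
    probes, table, out = 0, {}, []
    for i in inner:                  # build the hash table once
        table.setdefault(i[key], []).append(i)
    for o in outer:                  # one probe per outer row
        probes += 1
        out.extend((o, i) for i in table.get(o[key], []))
    return out, probes

# ~4600 outer rows, as in the 8.4 row estimate above.
outer = [{"k": n % 100} for n in range(4607)]
inner = [{"k": n} for n in range(100)]

_, nl_work = nested_loop_join(outer, inner, "k")
_, hash_work = hash_join(outer, inner, "k")
print(nl_work, hash_work)  # -> 460700 4607
```

With a rows=1 estimate the nested loop looks free; with 4607 actual rows it does two orders of magnitude more work than the hash join.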



[PERFORM] embedded sql regression from 8.2.4 to 8.3.7

2009-07-09 Thread Haszlakiewicz, Eric

I noticed a bit of a performance regression in embedded sql queries when
moving from the client libraries in version 8.2.4 to 8.3.7.  My
application does a whole lot of queries, many of which don't return any
data.  When we moved to the new libraries the time of running a query
(from the application point of view) went from about 550 usec to 800
usec.  In both cases this was against a server running 8.3.7.
I turned on log_statement_stats and noticed that the behaviour is
slightly different, and the 8.3.7 version sends the statement to the
server twice, while 8.2.4 only sends it once.

const char *SQL_text = "select * from foo";  (not always the same query)
exec sql prepare s_1ab from :SQL_text;       [*1]
exec sql declare c_1ab cursor for s_1ab;
exec sql open c_1ab;                         [*2]

At [*1], with the 8.3.7 libraries, I see in the server log:
STATEMENT:  select * from foo

With 8.2.4, nothing is logged.  Both versions send the statement to
declare the cursor:
STATEMENT:  declare c_1ab  cursor  for select * from foo

Suggestions?

eric



Re: [PERFORM] Huge difference in query performance between 8.3 and 8.4 (possibly)

2009-07-09 Thread Hartman, Matthew
 From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-
 ow...@postgresql.org] On Behalf Of Robin Houston
 Sent: Thursday, July 09, 2009 12:35 PM

 We have a query that runs very slowly on our 8.3 database. (I can't
 tell you exactly how slowly, because it has never successfully run to
 completion even when we left it running overnight.) On the 8.4
 database on my laptop, it runs in about 90 seconds. 
 Any insights would be much appreciated.

I doubt this is the insight you're looking for, but that is the worst
query I have ever seen. It is difficult to understand exactly what it
returns. There are so many cross joins, outer joins, and inner joins
mixed up together, ugh.

Rather than trying to puzzle out why it is slow, rewrite it. It will be
faster than before on any version.

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital




Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread hubert depesz lubaczewski
On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
 Suppose I have a large table with a small-cardinality CATEGORY column (say, 
 categories 1..5).  I need to sort by an arbitrary (i.e. user-specified) 
 mapping of CATEGORY, something like this:

  1 = 'z'
  2 = 'a'
  3 = 'b'
  4 = 'w'
  5 = 'h'
 So when I get done, the sort order should be 2,3,5,4,1.
 I could create a temporary table with the category-to-key mapping, but is 
 there any way to do this in a single SQL statement?

You can do it like this:

select c.*
  from categories c
  join ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering)
    on c.id = o.id
 order by o.ordering

depesz
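depesz's inline-VALUES approach, runnable end-to-end against SQLite for illustration (SQLite names the inline columns column1/column2, where Postgres would accept the `as o (id, ordering)` alias used above; the categories table here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER)")
conn.executemany("INSERT INTO categories VALUES (?)",
                 [(c,) for c in (1, 2, 3, 4, 5)])

# Join the inline mapping and sort by its second column.
rows = conn.execute("""
    SELECT c.id
    FROM categories c
    JOIN (VALUES (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h')) AS o
      ON c.id = o.column1
    ORDER BY o.column2
""").fetchall()

print([r[0] for r in rows])  # -> [2, 3, 5, 4, 1]
```

Unlike the CASE variant, the mapping here behaves like a real (if transient) table, so it also works for outer joins or reuse in a CTE.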

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: dep...@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007



Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Hartman, Matthew
 On Thu, Jul 09, 2009 at 09:26:42AM -0700, Craig James wrote:
 You can do it like this:
 select c.*
 from categories c
 join ( values (1, 'z'), (2, 'a'), (3, 'b'), (4, 'w'), (5, 'h') ) as o (id, ordering)
   on c.id = o.id
 order by o.ordering

Another option would be:

select c.*
from categories c
order by case c.category when 1 then 'z' when 2 then 'a' when 3 then 'b'
when 4 then 'w' when 5 then 'h' end;

Matthew Hartman
Programmer/Analyst
Information Management, ICP
Kingston General Hospital




Re: [PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Grzegorz Jaśkiewicz
2009/7/9 Tom Lane <t...@sss.pgh.pa.us>:
 Grzegorz Jaśkiewicz <gryz...@gmail.com> writes:
 On Thu, Jul 9, 2009 at 5:26 PM, Craig James <craig_ja...@emolecules.com> wrote:
 Suppose I have a large table with a small-cardinality CATEGORY column (say,
 categories 1..5).  I need to sort by an arbitrary (i.e. user-specified)
 mapping of CATEGORY, something like this:

 you can create translation table, join it, and sort by its key.

 Much easier to
        ORDER BY CASE category WHEN 'z' THEN 1 WHEN 'a' THEN 2 ... END

 Actually, consider putting the CASE into a function and doing
        ORDER BY sort_order(category)

I suppose a table is handy when you have a lot of items as keys...



-- 
GJ
