Re: [PERFORM] performance with query

2009-06-18 Thread Alberto Dalmaso
P.S.: to understand what the query has to do (and 80% of the views have
to do this): a lot of time is spent pivoting a table with a
structure like
identifier, description_of_value, numeric value
that has to be transformed into
identifier, description_1, description_2, ..., description_n
where n is not a fixed number (it varies with the type of
calculation that was used to generate the rows in the table).

perhaps this information could help.

thanks everybody




Re: [PERFORM] enum for performance?

2009-06-18 Thread Merlin Moncure
On Wed, Jun 17, 2009 at 6:06 PM, Whit Armstrong <armstrong.w...@gmail.com> wrote:
 I have a column which only has six states or values.

 Is there a size advantage to using an enum for this data type?
 Currently I have it defined as a character(1).

 This table has about 600 million rows, so it could wind up making a
 difference in total size.

Here is what enums get you:
*) You can skip a join to a detail table if one char is not enough to
sufficiently describe the value to the user.
*) If you need to order by what's contained in the enum, the gains
can be tremendous because it can be inlined in the index:

create type some_enum_t as enum ('a', 'b', 'c', 'd', 'e', 'f');  -- e.g. six states

create table bigtable
(
  company_id bigint,
  someval some_enum_t,
  sometime timestamptz
);

create index bigindex on bigtable(company_id, someval, sometime);

select * from bigtable order by 1,2,3 limit 50;
-- or
select * from bigtable where company_id = 12345 order by 2,3;

The disadvantage with enums is flexibility.  Sometimes the performance
doesn't matter, or you need that detail table anyway for other
reasons.

Also, if you use "char" instead of char(1), you shave a byte and a tiny bit of speed.
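
(A quick sketch of that distinction - this is general PostgreSQL
behavior, not something specific to this thread. Note that "char" must
be written with the double quotes; unquoted char means character(1):)

create table t_bpchar (state character(1));  -- varlena: header byte + 1 data byte
create table t_qchar  (state "char");        -- internal one-byte type

insert into t_qchar values ('A');
select state from t_qchar;  -- A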

merlin



Re: [PERFORM] very slow selects on a small table

2009-06-18 Thread Brian Cox

Tom Lane [...@sss.pgh.pa.us] wrote:

Yup.  So according to those stats, all ts_id values fall in the range
61 .. 6000250068.  It's no wonder it's not
expecting to find anything between 0 and 10.  I think maybe you
forgot to re-analyze after loading data ... although this being 8.3,
I'd have expected autovacuum to update the stats at some point ...
yes, this is a concern. I may have to do the vacuum analyze myself or 
learn how to make autovacuum run more frequently.
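
(A sketch of those two options; the table name is from this thread, and
the postgresql.conf values are just illustrative tightenings of the 8.3
defaults, not a recommendation from the list:)

-- manual, right after a bulk load:
VACUUM ANALYZE ts_stats_transetgroup_user_daily;

-- or make autovacuum wake up and analyze more eagerly (postgresql.conf):
-- autovacuum_naptime = 15s                  # default 1min
-- autovacuum_analyze_scale_factor = 0.05    # default 0.1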


Recommendation: re-ANALYZE, check that the plan changes to something
with a higher estimate for the number of rows for this table, and then
abort and restart those processes.  Lord knows how long you'll be
waiting for them to finish with their current plans :-(
these queries are still running now, 27.5 hours later... These queries
are generated by some Java code, and in putting it into a test program so
I could capture the queries, I failed to get the id range correct --
sorry for wasting your time with bogus data. Below is the EXPLAIN output
from the 4 correct queries. I can't tell which one is being executed by
PID 7397, but the query plans, except the last, do look very similar. In
any event, as I mentioned, all 4 are still running.


Thanks,
Brian

cemdb=# explain select * from ts_stats_transetgroup_user_daily a where
a.ts_id in (select b.ts_id from ts_stats_transetgroup_user_daily
b, ts_stats_transet_user_interval c, ts_transetgroup_transets_map m where
b.ts_transet_group_id = m.ts_transet_group_id and
m.ts_transet_incarnation_id = c.ts_transet_incarnation_id and
c.ts_user_incarnation_id = b.ts_user_incarnation_id and
c.ts_interval_start_time >= '2009-6-16 01:00' and
c.ts_interval_start_time < '2009-6-16 02:00') and a.ts_id >
61 and a.ts_id < 62 order by a.ts_id;


QUERY PLAN

 Sort  (cost=138722.75..138734.37 rows=9299 width=779)
   Sort Key: a.ts_id
   ->  Hash IN Join  (cost=131710.94..138416.28 rows=9299 width=779)
         Hash Cond: (a.ts_id = b.ts_id)
         ->  Index Scan using ts_stats_transetgroup_user_daily_pkey on ts_stats_transetgroup_user_daily a  (cost=0.00..6602.21 rows=9299 width=779)
               Index Cond: ((ts_id > 61::bigint) AND (ts_id < 62::bigint))
         ->  Hash  (cost=130113.34..130113.34 rows=255616 width=8)
               ->  Hash Join  (cost=82370.45..130113.34 rows=255616 width=8)
                     Hash Cond: ((m.ts_transet_group_id = b.ts_transet_group_id) AND (c.ts_user_incarnation_id = b.ts_user_incarnation_id))
                     ->  Hash Join  (cost=3.32..29255.47 rows=229502 width=16)
                           Hash Cond: (c.ts_transet_incarnation_id = m.ts_transet_incarnation_id)
                           ->  Index Scan using ts_stats_transet_user_interval_starttime on ts_stats_transet_user_interval c  (cost=0.00..27674.33 rows=229502 width=16)
                                 Index Cond: ((ts_interval_start_time >= '2009-06-16 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2009-06-16 02:00:00-07'::timestamp with time zone))
                           ->  Hash  (cost=2.58..2.58 rows=117 width=16)
                                 ->  Seq Scan on ts_transetgroup_transets_map m  (cost=0.00..2.58 rows=117 width=16)
                     ->  Hash  (cost=80511.26..80511.26 rows=247451 width=24)
                           ->  Seq Scan on ts_stats_transetgroup_user_daily b  (cost=0.00..80511.26 rows=247451 width=24)
(17 rows)

cemdb=# explain select * from ts_stats_transetgroup_user_daily a where
a.ts_id in (select b.ts_id from ts_stats_transetgroup_user_daily
b, ts_stats_transet_user_interval c, ts_transetgroup_transets_map m where
b.ts_transet_group_id = m.ts_transet_group_id and
m.ts_transet_incarnation_id = c.ts_transet_incarnation_id and
c.ts_user_incarnation_id = b.ts_user_incarnation_id and
c.ts_interval_start_time >= '2009-6-16 01:00' and
c.ts_interval_start_time < '2009-6-16 02:00') and a.ts_id >
62 and a.ts_id < 63 order by a.ts_id;


QUERY PLAN

 Sort  (cost=139430.64..139443.43 rows=10237 width=779)
   Sort Key: a.ts_id
   ->  Hash IN Join  (cost=131710.94..139089.71 rows=10237 width=779)
         Hash Cond: (a.ts_id = b.ts_id)
         ->  Index Scan using ts_stats_transetgroup_user_daily_pkey on ts_stats_transetgroup_user_daily a  (cost=0.00..7265.23 rows=10237 width=779)
               Index Cond: ((ts_id > 62::bigint) AND (ts_id <

Re: [PERFORM] very slow selects on a small table

2009-06-18 Thread Grzegorz Jaśkiewicz
On Thu, Jun 18, 2009 at 6:06 PM, Brian Cox <brian@ca.com> wrote:

 these queries are still running now, 27.5 hours later... These queries are
 generated by some Java code, and in putting it into a test program so I could
 capture the queries, I failed to get the id range correct -- sorry for
 wasting your time with bogus data. Below is the EXPLAIN output from the 4
 correct queries. I can't tell which one is being executed by PID 7397, but
 the query plans, except the last, do look very similar. In any event, as I
 mentioned, all 4 are still running.
this might be quite a bogus question, just a hint - but what is your
work_mem set to?
Guys, isn't postgresql giving a huge cost when it can't sort in memory?

-- 
GJ



Re: [PERFORM] very slow selects on a small table

2009-06-18 Thread Brian Cox

Grzegorz Jaśkiewicz [gryz...@gmail.com] wrote:

this might be quite a bogus question, just a hint - but what is your
work_mem set to?
Guys, isn't postgresql giving a huge cost when it can't sort in memory?

work_mem = 64MB



Re: [PERFORM] very slow selects on a small table

2009-06-18 Thread Grzegorz Jaśkiewicz
On Thu, Jun 18, 2009 at 6:16 PM, Brian Cox <brian@ca.com> wrote:

 Grzegorz Jaśkiewicz [gryz...@gmail.com] wrote:

 this might be quite a bogus question, just a hint - but what is your
 work_mem set to?
 Guys, isn't postgresql giving a huge cost when it can't sort in memory?

 work_mem = 64MB

try increasing it please, to say 256MB.

8.4's EXPLAIN ANALYZE actually informs you whether sorting was done on
disk or in memory, but you probably don't run stuff on the cutting edge ;)
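
(A sketch of both suggestions in one psql session; the table and column
names here are illustrative:)

SET work_mem = '256MB';
EXPLAIN ANALYZE SELECT * FROM some_big_table ORDER BY some_col;
-- the newer output then includes a line such as
--   Sort Method:  quicksort  Memory: 2048kB
-- versus, when work_mem is too small for the sort:
--   Sort Method:  external merge  Disk: 137000kB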



-- 
GJ


Re: [PERFORM] very slow selects on a small table

2009-06-18 Thread Tom Lane
Brian Cox <brian@ca.com> writes:
 these queries are still running now, 27.5 hours later... These queries 
 are generated by some Java code, and in putting it into a test program so 
 I could capture the queries, I failed to get the id range correct -- 
 sorry for wasting your time with bogus data. Below is the EXPLAIN output 
 from the 4 correct queries. I can't tell which one is being executed by 
 PID 7397, but the query plans, except the last, do look very similar. In 
 any event, as I mentioned, all 4 are still running.

Strange as can be.  Can you do an EXPLAIN ANALYZE on just the IN's
sub-select and confirm that the runtime of that is reasonable?  I'd
be interested to know how many rows it really returns, too.
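
(Spelled out against the first query upthread, that would be something
like:)

explain analyze
select b.ts_id
  from ts_stats_transetgroup_user_daily b,
       ts_stats_transet_user_interval c,
       ts_transetgroup_transets_map m
 where b.ts_transet_group_id = m.ts_transet_group_id
   and m.ts_transet_incarnation_id = c.ts_transet_incarnation_id
   and c.ts_user_incarnation_id = b.ts_user_incarnation_id
   and c.ts_interval_start_time >= '2009-6-16 01:00'
   and c.ts_interval_start_time < '2009-6-16 02:00';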

One thing that strikes me is that the cost estimates seem a bit on the
low side for the rowcounts.  Are you using nondefault planner cost
parameters, and if so what?
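
(One generic way to answer that last question - not Tom's words - is to
ask the server which cost settings have been changed from their
defaults:)

select name, setting, source
  from pg_settings
 where name like '%cost%'
   and source <> 'default';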

regards, tom lane



[PERFORM] Strange performance response for high load times

2009-06-18 Thread Peter Alban
Hi All,

We have a reasonably powerful machine supporting about 20
databases, but in total they're not more than 4GB in size.

The machine has 2 processors (8 cores) and 8 GB of RAM, so I would expect
PG to cache the whole db in memory. Well, actually it doesn't.

What is stranger is that a query that runs in under 100 ms at zero load
can take up to 15 seconds during high-load times!
What on earth can make such a difference?

Here are the key config options that I set up:
# - Memory -

shared_buffers = 17 # min 16 or max_connections*2, 8KB each
temp_buffers = 21000    # min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 1048576  # min 64, size in KB
maintenance_work_mem = 1048576  # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB

# - Free Space Map -

max_fsm_pages = 524298  # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 32768   # min 100, ~70 bytes each

# - Kernel Resource Usage -

max_files_per_process = 4000    # min 25
#preload_libraries = ''

any ideas ?

cheers,
Peter


Re: [PERFORM] Strange performance response for high load times

2009-06-18 Thread Kenneth Marshall
On Thu, Jun 18, 2009 at 08:27:02PM +0200, Peter Alban wrote:
 Hi All,
 
 We are having a reasonably powerful machine for supporting about 20
 databases but in total they're not more then 4GB in size.
 
 The machine is 2 processor 8 core and 8 Gig or ram so I would expect that PG
 should cache the whole db into memory. Well actually it doesn't.
 
 What is more strange that a query that under zero load is running under
 100ms during high load times it can take up to 15 seconds !!
 What on earth can make such difference ?
 
 here are the key config options that I set up :
 # - Memory -
 
 shared_buffers = 17 # min 16 or
 max_connections*2, 8KB each
 temp_buffers = 21000# min 100, 8KB each
 #max_prepared_transactions = 5  # can be 0 or more
 # note: increasing max_prepared_transactions costs ~600 bytes of shared
 memory
 # per transaction slot, plus lock space (see max_locks_per_transaction).
 work_mem = 1048576  # min 64, size in KB
 maintenance_work_mem = 1048576  # min 1024, size in KB

1GB of work_mem is very high if you have more than a couple of
queries that use it.

Ken

 #max_stack_depth = 2048 # min 100, size in KB
 
 # - Free Space Map -
 
 max_fsm_pages = 524298  # min max_fsm_relations*16, 6 bytes
 each
 max_fsm_relations = 32768   # min 100, ~70 bytes each
 
 # - Kernel Resource Usage -
 
 max_files_per_process = 4000# min 25
 #preload_libraries = ''
 
 any ideas ?
 
 cheers,
 Peter



Re: [PERFORM] performance with query

2009-06-18 Thread Kevin Grittner
Alberto Dalmaso <dalm...@clesius.it> wrote: 
 P.S.: to understand what the query has to do (and 80% of the views
 have to do this): a lot of time is spent pivoting a table with
 a structure like
 identifier, description_of_value, numeric value
 that has to be transformed into
 identifier, description_1, description_2, ..., description_n
 where n is not a fixed number (it varies with the type of
 calculation that was used to generate the rows in the table).
 
 perhaps this information could help.
 
What would help more is the actual query, if that can be shared.  It
leaves a lot less to the imagination than descriptions of it.
 
There are a couple things which have been requested which would help
pin down the reason the optimizer is not getting to a good plan, so
that it can be allowed to do a good job.  As Tom said, this would be a
much more productive focus than casting about for ways to force it to
do what you think is the best thing.  (Maybe, given the chance, it can
come up with a plan which runs in seconds, rather than over the 24
minutes you've gotten.)
 
With all the optimizer options on, and the from_collapse_limit and
join_collapse_limit values both set to 100, run an EXPLAIN (no
ANALYZE) on your big problem query.  Let us know how long the EXPLAIN
runs.  If it gets any errors, copy and paste all available
information.  (General descriptions aren't likely to get us very far.)
Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run
it, it should not take long to do this.
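
(As a sketch, the requested session would look something like this in
psql - the geqo setting is per the follow-up messages downthread, and
the query itself is a placeholder for your big problem query:)

SET geqo = off;
SET from_collapse_limit = 100;
SET join_collapse_limit = 100;
\timing on
EXPLAIN SELECT ...;   -- your big problem query; this plans it but does not run it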
 
If there are any views or custom functions involved, showing those
along with the query source would be good.
 
If we get this information, we have a much better chance to find the
real problem and get it fixed.
 
-Kevin



Re: [PERFORM] performance with query

2009-06-18 Thread Tom Lane
Kevin Grittner <kevin.gritt...@wicourts.gov> writes:
 With all the optimizer options on, and the from_collapse_limit and
 join_collapse_limit values both set to 100, run an EXPLAIN (no
 ANALYZE) on your big problem query.  Let us know how long the EXPLAIN
 runs.  If it gets any errors, copy and paste all available
 information.  (General descriptions aren't likely to get us very far.)
 Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run
 it, it should not take long to do this.

One issue here is that with the collapse limits cranked up to more than
geqo_threshold, he's going to be coping with GEQO's partially-random
plan selection; so whatever he reports might or might not be especially
reflective of day-to-day results.  I'm tempted to ask that he also push
up geqo_threshold.  It's possible that that *will* send the planning
time to the moon; but it would certainly be worth trying, to find out
what plan is produced.

regards, tom lane



Re: [PERFORM] Strange performance response for high load times

2009-06-18 Thread Peter Alban
So Ken,

What do you reckon it should be? What is the rule of thumb here?

cheers,
Peter

On Thu, Jun 18, 2009 at 8:30 PM, Kenneth Marshall <k...@rice.edu> wrote:

 On Thu, Jun 18, 2009 at 08:27:02PM +0200, Peter Alban wrote:
  Hi All,
 
  We are having a reasonably powerful machine for supporting about 20
  databases but in total they're not more then 4GB in size.
 
  The machine is 2 processor 8 core and 8 Gig or ram so I would expect that
 PG
  should cache the whole db into memory. Well actually it doesn't.
 
  What is more strange that a query that under zero load is running under
  100ms during high load times it can take up to 15 seconds !!
  What on earth can make such difference ?
 
  here are the key config options that I set up :
  # - Memory -
 
  shared_buffers = 17 # min 16 or
  max_connections*2, 8KB each
  temp_buffers = 21000# min 100, 8KB each
  #max_prepared_transactions = 5  # can be 0 or more
  # note: increasing max_prepared_transactions costs ~600 bytes of shared
  memory
  # per transaction slot, plus lock space (see max_locks_per_transaction).
  work_mem = 1048576  # min 64, size in KB
  maintenance_work_mem = 1048576  # min 1024, size in KB

 1GB of work_mem is very high if you have more than a couple of
 queries that use it.

 Ken

  #max_stack_depth = 2048 # min 100, size in KB
 
  # - Free Space Map -
 
  max_fsm_pages = 524298  # min max_fsm_relations*16, 6
 bytes
  each
  max_fsm_relations = 32768   # min 100, ~70 bytes each
 
  # - Kernel Resource Usage -
 
  max_files_per_process = 4000# min 25
  #preload_libraries = ''
 
  any ideas ?
 
  cheers,
  Peter



Re: [PERFORM] performance with query

2009-06-18 Thread Kevin Grittner
Tom Lane <t...@sss.pgh.pa.us> wrote: 
 Kevin Grittner <kevin.gritt...@wicourts.gov> writes:
 With all the optimizer options on, and the from_collapse_limit and
 join_collapse_limit values both set to 100, run an EXPLAIN (no
 ANALYZE) on your big problem query.  Let us know how long the
 EXPLAIN runs.  If it gets any errors, copy and paste all available
 information.  (General descriptions aren't likely to get us very
 far.)  Since EXPLAIN without ANALYZE only *plans* the query, but
 doesn't run it, it should not take long to do this.
 
 One issue here is that with the collapse limits cranked up to more
 than geqo_threshold, he's going to be coping with GEQO's partially-
 random plan selection; so whatever he reports might or might not be
 especially reflective of day-to-day results.  I'm tempted to ask
 that he also push up geqo_threshold.
 
In an earlier post[1] he said that he had geqo turned off.  It does
pay to be explicit, though; I'd hate to assume it's off if he's been
changing things.
 
Alberto, please ensure that you still have geqo off when you run the
test I suggested.  Also, I see that I didn't explicitly say that you
should send the EXPLAIN output, but that's what would be helpful.
 
 It's possible that that *will* send the planning time to the moon;
 but it would certainly be worth trying, to find out what plan is
 produced.
 
Agreed.  What plan is produced, and how long that takes.  (And whether
he gets an out of memory error.)  I figured it was best to get a clear
answer to those before moving on.
 
-Kevin
 
[1]
http://archives.postgresql.org/pgsql-performance/2009-06/msg00186.php



Re: [PERFORM] Strange performance response for high load times

2009-06-18 Thread Kenneth Marshall
On Thu, Jun 18, 2009 at 09:42:47PM +0200, Peter Alban wrote:
 So Ken,
 
 What do you reckon it should be? What is the rule of thumb here?
 
 cheers,
 Peter
 

It really depends on your query mix. The key thing to remember is that
multiples (possibly many) of the work_mem value can be allocated by
an individual query. You can set it on a per-query basis to
help manage its use, i.e. raise it only for the query that needs it.
On our systems, which run a smaller number of queries, we do use
256MB. I hope that this helps.
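
(A sketch of the per-query approach; the table and column names are
hypothetical:)

BEGIN;
SET LOCAL work_mem = '256MB';  -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM some_big_table ORDER BY some_col;  -- the one query that needs it
COMMIT;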

Regards,
Ken
 On Thu, Jun 18, 2009 at 8:30 PM, Kenneth Marshall k...@rice.edu wrote:
 
  On Thu, Jun 18, 2009 at 08:27:02PM +0200, Peter Alban wrote:
   Hi All,
  
   We are having a reasonably powerful machine for supporting about 20
   databases but in total they're not more then 4GB in size.
  
   The machine is 2 processor 8 core and 8 Gig or ram so I would expect that
  PG
   should cache the whole db into memory. Well actually it doesn't.
  
   What is more strange that a query that under zero load is running under
   100ms during high load times it can take up to 15 seconds !!
   What on earth can make such difference ?
  
   here are the key config options that I set up :
   # - Memory -
  
   shared_buffers = 17 # min 16 or
   max_connections*2, 8KB each
   temp_buffers = 21000# min 100, 8KB each
   #max_prepared_transactions = 5  # can be 0 or more
   # note: increasing max_prepared_transactions costs ~600 bytes of shared
   memory
   # per transaction slot, plus lock space (see max_locks_per_transaction).
   work_mem = 1048576  # min 64, size in KB
   maintenance_work_mem = 1048576  # min 1024, size in KB
 
  1GB of work_mem is very high if you have more than a couple of
  queries that use it.
 
  Ken
 
   #max_stack_depth = 2048 # min 100, size in KB
  
   # - Free Space Map -
  
   max_fsm_pages = 524298  # min max_fsm_relations*16, 6
  bytes
   each
   max_fsm_relations = 32768   # min 100, ~70 bytes each
  
   # - Kernel Resource Usage -
  
   max_files_per_process = 4000# min 25
   #preload_libraries = ''
  
   any ideas ?
  
   cheers,
   Peter
 



Re: [PERFORM] Strange performance response for high load times

2009-06-18 Thread Peter Alban
What's still badgering me is the performance when there is no load, or
load significantly lower than peak times.

Why is there such a big difference?

I.e., off peak a simple select with a where (on an indexed column) and a
limit takes *40 ms*; during peak times it took *2 seconds* - 50 times slower!

cheers,
Peter


On Thu, Jun 18, 2009 at 10:01 PM, Kenneth Marshall <k...@rice.edu> wrote:

 On Thu, Jun 18, 2009 at 09:42:47PM +0200, Peter Alban wrote:
  So Ken,
 
  What do you reckon it should be? What is the rule of thumb here?
 
  cheers,
  Peter
 

 It really depends on your query mix. The key thing to remember is that
 multiples (possibly many) of the work_mem value can be allocated by
 an individual query. You can set it on a per-query basis to
 help manage its use, i.e. raise it only for the query that needs it.
 On our systems, which run a smaller number of queries, we do use
 256MB. I hope that this helps.

 Regards,
 Ken
  On Thu, Jun 18, 2009 at 8:30 PM, Kenneth Marshall k...@rice.edu wrote:
 
   On Thu, Jun 18, 2009 at 08:27:02PM +0200, Peter Alban wrote:
Hi All,
   
We are having a reasonably powerful machine for supporting about 20
databases but in total they're not more then 4GB in size.
   
The machine is 2 processor 8 core and 8 Gig or ram so I would expect
 that
   PG
should cache the whole db into memory. Well actually it doesn't.
   
What is more strange that a query that under zero load is running
 under
100ms during high load times it can take up to 15 seconds !!
What on earth can make such difference ?
   
here are the key config options that I set up :
# - Memory -
   
shared_buffers = 17 # min 16 or
max_connections*2, 8KB each
temp_buffers = 21000# min 100, 8KB each
#max_prepared_transactions = 5  # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of
 shared
memory
# per transaction slot, plus lock space (see
 max_locks_per_transaction).
work_mem = 1048576  # min 64, size in KB
maintenance_work_mem = 1048576  # min 1024, size in KB
  
   1GB of work_mem is very high if you have more than a couple of
   queries that use it.
  
   Ken
  
#max_stack_depth = 2048 # min 100, size in KB
   
# - Free Space Map -
   
max_fsm_pages = 524298  # min max_fsm_relations*16, 6
   bytes
each
max_fsm_relations = 32768   # min 100, ~70 bytes each
   
# - Kernel Resource Usage -
   
max_files_per_process = 4000# min 25
#preload_libraries = ''
   
any ideas ?
   
cheers,
Peter
  



Re: [PERFORM] Strange performance response for high load times

2009-06-18 Thread Kevin Grittner
Peter Alban <peter.alb...@gmail.com> wrote: 
 
 Why is there such a big difference?
 
 I.e., off peak a simple select with a where (on an indexed column)
 and a limit takes *40 ms*; during peak times it took *2 seconds* -
 50 times slower!
 
With your high work_mem setting you may have been causing the OS to
discard cached data, causing disk reads where you normally get cache
hits, or even triggering swapping.  Either of those can easily cause a
difference of that magnitude, or more.
 
-Kevin



Re: [PERFORM] Index Scan taking long time

2009-06-18 Thread Bryce Ewing

Hi Tom,

We have managed to improve significantly on the speed of this query.
The way that we did this was by clustering the table on the domain
index, which significantly reduced the page reads required to perform
the query.

To find this, we turned on log_statement_stats to see what the query
was doing.

This was on a table of roughly 600MB where the domains were randomly
dispersed.
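
(A sketch of that approach; the table and index names are illustrative,
since the real ones weren't posted:)

-- rewrite the table in index order (takes an exclusive lock):
CLUSTER events USING events_domain_idx;  -- 8.3 syntax
ANALYZE events;                          -- refresh statistics afterwards

Note that CLUSTER is a one-time reordering; new and updated rows are not
kept in index order, so it may need to be repeated periodically.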


Cheers
Bryce

Tom Lane wrote:

Bryce Ewing <br...@smx.co.nz> writes:
  
So it seems to me that once the index is in memory everything is fine 
with the world, but the loading of the index into memory is horrendous.



So it would seem.  What's the disk hardware on this machine?

It's possible that part of the problem is table bloat, leading to the
indexscan having to fetch many more pages than it would if the table
were more compact.

regards, tom lane

  





[PERFORM] select max() much slower than select min()

2009-06-18 Thread Brian Cox
ts_stats_transet_user_interval has ~48M rows. ts_id is the PK and there 
is an index on ts_interval_start_time. I reindexed it and ran vacuum 
analyze. Only SELECTs have been done since these operations.


cemdb=# explain select min(ts_id) from ts_stats_transet_user_interval a
where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and
a.ts_interval_start_time < '2009-6-16 02:00';


  QUERY PLAN

 Result  (cost=12.19..12.20 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..12.19 rows=1 width=8)
           ->  Index Scan using ts_stats_transet_user_interval_pkey on ts_stats_transet_user_interval a  (cost=0.00..5496152.30 rows=450799 width=8)
                 Filter: ((ts_id IS NOT NULL) AND (ts_interval_start_time >= '2009-06-16 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2009-06-16 02:00:00-07'::timestamp with time zone))
(5 rows)
cemdb=# explain select max(ts_id) from ts_stats_transet_user_interval a
where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and
a.ts_interval_start_time < '2009-6-16 02:00';

  QUERY PLAN

 Result  (cost=12.19..12.20 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..12.19 rows=1 width=8)
           ->  Index Scan Backward using ts_stats_transet_user_interval_pkey on ts_stats_transet_user_interval a  (cost=0.00..5496152.30 rows=450799 width=8)
                 Filter: ((ts_id IS NOT NULL) AND (ts_interval_start_time >= '2009-06-16 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2009-06-16 02:00:00-07'::timestamp with time zone))
(5 rows)
[r...@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U admin -d cemdb -c "select min(ts_id) from ts_stats_transet_user_interval a where a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00'"
  min
--------
 603210
(1 row)

real    1m32.025s
user    0m0.000s
sys     0m0.003s

[r...@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U admin -d cemdb -c "select max(ts_id) from ts_stats_transet_user_interval a where a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00'"
  max
--------
 603239
(1 row)

real    16m39.412s
user    0m0.002s
sys     0m0.002s


seems like max() shouldn't take any longer than min() and certainly not 
10 times as long. Any ideas on how to determine the max more quickly?


Thanks,
Brian



Re: [PERFORM] select max() much slower than select min()

2009-06-18 Thread Kevin Grittner
Brian Cox <brian@ca.com> wrote: 
 
 cemdb=# explain select min(ts_id) from
 ts_stats_transet_user_interval a 
 where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and 
 a.ts_interval_start_time < '2009-6-16 02:00';
 
 seems like max() shouldn't take any longer than min() and certainly
 not 10 times as long. Any ideas on how to determine the max more
 quickly?
 
Is there any correlation between ts_id and ts_interval_start_time? 
Perhaps if you tried min and max with different time ranges it would
find a row on a backward scan faster.  It'll take ten times as long if
it has to scan through ten times as many rows to find a match.
 
I don't suppose you have an index on ts_interval_start_time?
If not, what happens if you run these queries after adding one?
 
-Kevin



Re: [PERFORM] select max() much slower than select min()

2009-06-18 Thread Brian Cox

Kevin Grittner [kevin.gritt...@wicourts.gov] wrote:

Is there any correlation between ts_id and ts_interval_start_time?
only vaguely: increasing ts_interval_start_time implies increasing ts_id,
but there may be many rows (100,000s) with the same ts_interval_start_time.



Perhaps if you tried min and max with different time ranges it would
find a row on a backward scan faster.  It'll take ten times as long if
it has to scan through ten times as many rows to find a match.

it looks like there are fewer rows backwards than forwards:

cemdb=> select count(*) from ts_stats_transet_user_interval where
ts_interval_start_time < '2009-6-16 01:00';

 count
-------
  3210
(1 row)

cemdb=> select count(*) from ts_stats_transet_user_interval where
ts_interval_start_time >= '2009-6-16 02:00';

 count
-------
  1350
(1 row)



I don't suppose you have an index on ts_interval_start_time?

there is an index. I mentioned this in my original posting.

Thanks,
Brian





Re: [PERFORM] select max() much slower than select min()

2009-06-18 Thread Kevin Grittner
Brian Cox <brian@ca.com> wrote: 
 Kevin Grittner [kevin.gritt...@wicourts.gov] wrote:
 Is there any correlation between ts_id and ts_interval_start_time?
 only vaguely: increasing ts_interval_start_time implies increasing
 ts_id but there may be many rows (100,000's) with the same
 ts_interval_start_time
 
 Perhaps if you tried min and max with different time ranges it
 would find a row on a backward scan faster.  It'll take ten times
 as long if it has to scan through ten times as many rows to find a
 match.
 it looks like there are fewer rows backwards than forwards:
 
Hmmm...  I was going to suggest possible bloat near the end of the
table, but the vacuum and reindex should have kept that from being
a problem.
 
This might be an issue where disks are laid out so that the pages can
be read from start to end quickly; reading backwards might cause a lot
more rotational delay.
 
 I don't suppose you have an index on ts_interval_start_time?
 there is an index. I mentioned this in my orginal posting.
 
Sorry I missed that.  I was afraid that it might not use it because
PostgreSQL doesn't yet recognize correlations between columns.  If it
did, it might determine that the other index was better for this query
(which may or may not be the case).
 
Could you provide the EXPLAIN ANALYZE output for these queries, so
we can compare expected to actual?  Also, what is your statistics
target for these (default_statistics_target if you haven't overridden
the specific columns involved)?
 
I guess you could try something like this, too:
 
select max(ts_id) from (select ts_id from
ts_stats_transet_user_interval a 
where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and 
a.ts_interval_start_time < '2009-6-16 02:00') x;
 
(Untested, so you might need to fix some typo or oversight.)
 
The EXPLAIN ANALYZE of that might yield interesting information.
 
If that doesn't get you closer to something acceptable, you could
consider a functional index on the inverse of the ts_id column, and
search for the negative of the min of that.  Kinda ugly, but it might
work because the disk would be spinning in the right direction for
you.
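 
(A sketch of that idea; the index name is made up, and whether the 8.3
planner will actually use an expression index for min() of that
expression is something to verify with EXPLAIN:)

create index ts_stats_tui_neg_ts_id
    on ts_stats_transet_user_interval ((-ts_id));

-- min of the negated column is the negated max, found at the "front"
-- of the new index, i.e. via a forward scan:
select -min(-ts_id) from ts_stats_transet_user_interval a
where a.ts_interval_start_time >= '2009-6-16 01:00'
  and a.ts_interval_start_time < '2009-6-16 02:00';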
 
-Kevin
