Re: [PERFORM] performance with query
P.S.: To understand what the query has to do (and 80% of the views have the same work to do): a lot of time is spent pivoting a table with a structure like (identifier, description_of_value, numeric_value), which has to be transformed into (identifier, description_1, description_2, ..., description_n), where n is not a fixed number (it changes depending on the type of calculation that was used to generate the rows in the table). Perhaps this information could help. Thanks everybody -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
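For readers unfamiliar with this kind of pivot: when the set of description values is fixed, it can be written as conditional aggregation. This is only a sketch against a hypothetical table with made-up labels (the poster's schema isn't shown); when n isn't known in advance, contrib's tablefunc crosstab() or dynamically generated SQL are the usual routes.

```sql
-- Hypothetical narrow table: (identifier, description_of_value, numeric_value).
-- Pivot to one row per identifier with one column per known description.
SELECT identifier,
       max(CASE WHEN description_of_value = 'desc_1' THEN numeric_value END) AS description_1,
       max(CASE WHEN description_of_value = 'desc_2' THEN numeric_value END) AS description_2
FROM narrow_table
GROUP BY identifier;
```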
Re: [PERFORM] enum for performance?
On Wed, Jun 17, 2009 at 6:06 PM, Whit Armstrong armstrong.w...@gmail.com wrote: I have a column which only has six states or values. Is there a size advantage to using an enum for this data type? Currently I have it defined as a character(1). This table has about 600 million rows, so it could wind up making a difference in total size. Here is what enums get you: *) You can skip a join to a detail table if one char is not enough to sufficiently describe the value to the user. *) If you need to order by what's contained in the enum, the gains can be tremendous because it can be inlined in the index:

create table bigtable
(
  company_id bigint,
  someval some_enum_t,
  sometime timestamptz
);

create index bigindex on bigtable(company_id, someval, sometime);

select * from bigtable order by 1,2,3 limit 50;
-- or
select * from bigtable where company_id = 12345 order by 2,3;

The disadvantage with enums is flexibility. Sometimes the performance doesn't matter, or you need that detail table anyway for other reasons. Also, if you use "char" vs char(1), you shave a byte and gain a tiny bit of speed. merlin
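For reference, the enum type in Merlin's example would be declared roughly like this (the six labels are invented; the original post doesn't name the actual values). Enum values sort in declaration order rather than alphabetically, which is exactly what makes the inlined-in-the-index ordering trick work:

```sql
CREATE TYPE some_enum_t AS ENUM
    ('new', 'open', 'pending', 'resolved', 'closed', 'archived');

CREATE TABLE bigtable (
    company_id bigint,
    someval    some_enum_t,   -- 4 bytes on disk, sorts in declaration order
    sometime   timestamptz
);

CREATE INDEX bigindex ON bigtable (company_id, someval, sometime);
```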
Re: [PERFORM] very slow selects on a small table
Tom Lane [...@sss.pgh.pa.us] wrote: Yup. So according to those stats, all ts_id values fall in the range 61 .. 6000250068. It's no wonder it's not expecting to find anything between 0 and 10. I think maybe you forgot to re-analyze after loading data ... although this being 8.3, I'd have expected autovacuum to update the stats at some point ... yes, this is a concern. I may have to do the vacuum analyze myself or learn how to make autovacuum run more frequently. Recommendation: re-ANALYZE, check that the plan changes to something with a higher estimate for the number of rows for this table, and then abort and restart those processes. Lord knows how long you'll be waiting for them to finish with their current plans :-( these queries are still running now 27.5 hours later... These queries are generated by some java code and in putting it into a test program so I could capture the queries, I failed to get the id range correct -- sorry for wasting your time with bogus data. Below is the EXPLAIN output from the 4 correct queries. I can't tell which one is being executed by PID 7397, but the query plans, except the last, do look very similar. In any event, as I mentioned, all 4 are still running. 
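The re-ANALYZE that Tom recommends is a one-liner; checking pg_stats afterwards confirms what range the planner now sees for ts_id (table name taken from the thread):

```sql
ANALYZE ts_stats_transetgroup_user_daily;

-- Verify the statistics the planner will use for ts_id:
SELECT attname, n_distinct, histogram_bounds
FROM pg_stats
WHERE tablename = 'ts_stats_transetgroup_user_daily'
  AND attname = 'ts_id';
```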
Thanks, Brian

cemdb=# explain select * from ts_stats_transetgroup_user_daily a where a.ts_id in (select b.ts_id from ts_stats_transetgroup_user_daily b, ts_stats_transet_user_interval c, ts_transetgroup_transets_map m where b.ts_transet_group_id = m.ts_transet_group_id and m.ts_transet_incarnation_id = c.ts_transet_incarnation_id and c.ts_user_incarnation_id = b.ts_user_incarnation_id and c.ts_interval_start_time >= '2009-6-16 01:00' and c.ts_interval_start_time < '2009-6-16 02:00') and a.ts_id > 61 and a.ts_id < 62 order by a.ts_id;

QUERY PLAN
Sort  (cost=138722.75..138734.37 rows=9299 width=779)
  Sort Key: a.ts_id
  ->  Hash IN Join  (cost=131710.94..138416.28 rows=9299 width=779)
        Hash Cond: (a.ts_id = b.ts_id)
        ->  Index Scan using ts_stats_transetgroup_user_daily_pkey on ts_stats_transetgroup_user_daily a  (cost=0.00..6602.21 rows=9299 width=779)
              Index Cond: ((ts_id > 61::bigint) AND (ts_id < 62::bigint))
        ->  Hash  (cost=130113.34..130113.34 rows=255616 width=8)
              ->  Hash Join  (cost=82370.45..130113.34 rows=255616 width=8)
                    Hash Cond: ((m.ts_transet_group_id = b.ts_transet_group_id) AND (c.ts_user_incarnation_id = b.ts_user_incarnation_id))
                    ->  Hash Join  (cost=3.32..29255.47 rows=229502 width=16)
                          Hash Cond: (c.ts_transet_incarnation_id = m.ts_transet_incarnation_id)
                          ->  Index Scan using ts_stats_transet_user_interval_starttime on ts_stats_transet_user_interval c  (cost=0.00..27674.33 rows=229502 width=16)
                                Index Cond: ((ts_interval_start_time >= '2009-06-16 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2009-06-16 02:00:00-07'::timestamp with time zone))
                          ->  Hash  (cost=2.58..2.58 rows=117 width=16)
                                ->  Seq Scan on ts_transetgroup_transets_map m  (cost=0.00..2.58 rows=117 width=16)
                    ->  Hash  (cost=80511.26..80511.26 rows=247451 width=24)
                          ->  Seq Scan on ts_stats_transetgroup_user_daily b  (cost=0.00..80511.26 rows=247451 width=24)
(17 rows)

cemdb=# explain select * from ts_stats_transetgroup_user_daily a where a.ts_id in (select b.ts_id from ts_stats_transetgroup_user_daily b, ts_stats_transet_user_interval c, ts_transetgroup_transets_map m where b.ts_transet_group_id = m.ts_transet_group_id and m.ts_transet_incarnation_id = c.ts_transet_incarnation_id and c.ts_user_incarnation_id = b.ts_user_incarnation_id and c.ts_interval_start_time >= '2009-6-16 01:00' and c.ts_interval_start_time < '2009-6-16 02:00') and a.ts_id > 62 and a.ts_id < 63 order by a.ts_id;

QUERY PLAN
Sort  (cost=139430.64..139443.43 rows=10237 width=779)
  Sort Key: a.ts_id
  ->  Hash IN Join  (cost=131710.94..139089.71 rows=10237 width=779)
        Hash Cond: (a.ts_id = b.ts_id)
        ->  Index Scan using ts_stats_transetgroup_user_daily_pkey on ts_stats_transetgroup_user_daily a  (cost=0.00..7265.23 rows=10237 width=779)
              Index Cond: ((ts_id > 62::bigint) AND (ts_id
Re: [PERFORM] very slow selects on a small table
On Thu, Jun 18, 2009 at 6:06 PM, Brian Cox brian@ca.com wrote: these queries are still running now 27.5 hours later... These queries are generated by some java code and in putting it into a test program so I could capture the queries, I failed to get the id range correct -- sorry for wasting your time with bogus data. Below is the EXPLAIN output from the 4 correct queries. I can't tell which one is being executed by PID 7397, but the query plans, except the last, do look very similar. In any event, as I mentioned, all 4 are still running. This might be quite a bogus question, just a hint - but what is your work_mem set to? Guys, isn't postgresql giving a huge cost when it can't sort in memory? -- GJ
Re: [PERFORM] very slow selects on a small table
Grzegorz Jakiewicz [gryz...@gmail.com] wrote: this might be quite a bogus question, just a hint - but what is your work_mem set to? Guys, isn't postgresql giving a huge cost when it can't sort in memory? work_mem = 64MB
Re: [PERFORM] very slow selects on a small table
On Thu, Jun 18, 2009 at 6:16 PM, Brian Cox brian@ca.com wrote: Grzegorz Jakiewicz [gryz...@gmail.com] wrote: this might be quite a bogus question, just a hint - but what is your work_mem set to? Guys, isn't postgresql giving a huge cost when it can't sort in memory? work_mem = 64MB Try increasing it please, to say 256MB. 8.4's EXPLAIN ANALYZE actually informs you whether sorting was done on disc or in memory, but you probably don't run stuff on the cutting edge ;) -- GJ
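For anyone trying this suggestion: the change can be made per session, so nothing in postgresql.conf needs to be touched for a test:

```sql
SET work_mem = '256MB';   -- session-local; reverts when the connection closes

-- On 8.4, EXPLAIN ANALYZE of a sorting query then reports a line such as
--   Sort Method:  quicksort  Memory: 2048kB        (sorted in memory)
-- or
--   Sort Method:  external merge  Disk: 102400kB   (spilled to disk)
```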
Re: [PERFORM] very slow selects on a small table
Brian Cox brian@ca.com writes: these queries are still running now 27.5 hours later... These queries are generated by some java code and in putting it into a test program so I could capture the queries, I failed to get the id range correct -- sorry for wasting your time with bogus data. Below is the EXPLAIN output from the 4 correct queries. I can't tell which one is being executed by PID 7397, but the query plans, except the last, do look very similar. In any event, as I mentioned, all 4 are still running. Strange as can be. Can you do an EXPLAIN ANALYZE on just the IN's sub-select and confirm that the runtime of that is reasonable? I'd be interested to know how many rows it really returns, too. One thing that strikes me is that the cost estimates seem a bit on the low side for the rowcounts. Are you using nondefault planner cost parameters, and if so what? regards, tom lane
[PERFORM] Strange performance response for high load times
Hi All, We are having a reasonably powerful machine supporting about 20 databases, but in total they're not more than 4GB in size. The machine is 2 processor, 8 core, with 8 GB of ram, so I would expect that PG should cache the whole db into memory. Well, actually it doesn't. What is more strange, a query that under zero load runs in under 100ms can take up to 15 seconds during high load times !! What on earth can make such a difference ? Here are the key config options that I set up :

# - Memory -
shared_buffers = 17 # min 16 or max_connections*2, 8KB each
temp_buffers = 21000 # min 100, 8KB each
#max_prepared_transactions = 5 # can be 0 or more
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 1048576 # min 64, size in KB
maintenance_work_mem = 1048576 # min 1024, size in KB
#max_stack_depth = 2048 # min 100, size in KB

# - Free Space Map -
max_fsm_pages = 524298 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 32768 # min 100, ~70 bytes each

# - Kernel Resource Usage -
max_files_per_process = 4000 # min 25
#preload_libraries = ''

any ideas ? cheers, Peter
Re: [PERFORM] Strange performance response for high load times
On Thu, Jun 18, 2009 at 08:27:02PM +0200, Peter Alban wrote: We are having a reasonably powerful machine for supporting about 20 databases but in total they're not more than 4GB in size. The machine is 2 processor 8 core and 8 GB of ram so I would expect that PG should cache the whole db into memory. Well actually it doesn't. What is more strange that a query that under zero load is running under 100ms during high load times it can take up to 15 seconds !! [...] work_mem = 1048576 # min 64, size in KB maintenance_work_mem = 1048576 # min 1024, size in KB

1GB of work_mem is very high if you have more than a couple of queries that use it.

Ken
Re: [PERFORM] performance with query
Alberto Dalmaso dalm...@clesius.it wrote: P.S.: to understand what the query has to do (and 80% of the views have the same work to do): a lot of time is spent pivoting a table with a structure like (identifier, description_of_value, numeric_value) that has to be transformed into (identifier, description_1, description_2, ..., description_n), where n is not a fixed number (it changes depending on the type of calculation that was used to generate the rows in the table). Perhaps this information could help. What would help more is the actual query, if that can be shared. It leaves a lot less to the imagination than descriptions of it. There are a couple things which have been requested which would help pin down the reason the optimizer is not getting to a good plan, so that it can be allowed to do a good job. As Tom said, this would be a much more productive focus than casting about for ways to force it to do what you think is the best thing. (Maybe, given the chance, it can come up with a plan which runs in seconds, rather than over the 24 minutes you've gotten.) With all the optimizer options on, and the from_collapse_limit and join_collapse_limit values both set to 100, run an EXPLAIN (no ANALYZE) on your big problem query. Let us know how long the EXPLAIN runs. If it gets any errors, copy and paste all available information. (General descriptions aren't likely to get us very far.) Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run it, it should not take long to do this. If there are any views or custom functions involved, showing those along with the query source would be good. If we get this information, we have a much better chance to find the real problem and get it fixed. -Kevin
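Concretely, the test Kevin is asking for looks like this; the SET commands are session-local, and the SELECT is a placeholder for the poster's actual view query:

```sql
SET from_collapse_limit = 100;
SET join_collapse_limit = 100;
-- leave all enable_* planner options on, then:
EXPLAIN SELECT ...;   -- plans the query but does not execute it
```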
Re: [PERFORM] performance with query
Kevin Grittner kevin.gritt...@wicourts.gov writes: With all the optimizer options on, and the from_collapse_limit and join_collapse_limit values both set to 100, run an EXPLAIN (no ANALYZE) on your big problem query. Let us know how long the EXPLAIN runs. If it gets any errors, copy and paste all available information. (General descriptions aren't likely to get us very far.) Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run it, it should not take long to do this. One issue here is that with the collapse limits cranked up to more than geqo_threshold, he's going to be coping with GEQO's partially-random plan selection; so whatever he reports might or might not be especially reflective of day-to-day results. I'm tempted to ask that he also push up geqo_threshold. It's possible that that *will* send the planning time to the moon; but it would certainly be worth trying, to find out what plan is produced. regards, tom lane
Re: [PERFORM] Strange performance response for high load times
So Ken, what do you reckon it should be? What is the rule of thumb here? cheers, Peter

On Thu, Jun 18, 2009 at 8:30 PM, Kenneth Marshall k...@rice.edu wrote: [...] 1GB of work_mem is very high if you have more than a couple of queries that use it. Ken
Re: [PERFORM] performance with query
Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: With all the optimizer options on, and the from_collapse_limit and join_collapse_limit values both set to 100, run an EXPLAIN (no ANALYZE) on your big problem query. Let us know how long the EXPLAIN runs. If it gets any errors, copy and paste all available information. (General descriptions aren't likely to get us very far.) Since EXPLAIN without ANALYZE only *plans* the query, but doesn't run it, it should not take long to do this. One issue here is that with the collapse limits cranked up to more than geqo_threshold, he's going to be coping with GEQO's partially-random plan selection; so whatever he reports might or might not be especially reflective of day-to-day results. I'm tempted to ask that he also push up geqo_threshold. In an earlier post[1] he said that he had geqo turned off. It does pay to be explicit, though; I'd hate to assume it's off if he's been changing things. Alberto, please ensure that you still have geqo off when you run the test I suggested. Also, I see that I didn't explicitly say that you should send the EXPLAIN output, but that's what would be helpful. It's possible that that *will* send the planning time to the moon; but it would certainly be worth trying, to find out what plan is produced. Agreed. What plan is produced, and how long that takes. (And whether he gets an out of memory error.) I figured it was best to get a clear answer to those before moving on. -Kevin [1] http://archives.postgresql.org/pgsql-performance/2009-06/msg00186.php
Re: [PERFORM] Strange performance response for high load times
On Thu, Jun 18, 2009 at 09:42:47PM +0200, Peter Alban wrote: So Ken, what do you reckon it should be? What is the rule of thumb here? cheers, Peter

It really depends on your query mix. The key thing to remember is that multiples (possibly many) of the work_mem value can be allocated in an individual query. You can set it on a per-query basis to help manage its use, i.e. up it for only the query that needs it. With our systems, which run a smaller number of queries, we do use 256MB. I hope that this helps.

Regards, Ken
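Ken's per-query approach can be wrapped in a transaction with SET LOCAL so the setting reverts automatically whatever happens (a sketch, not from the thread):

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- in effect only until COMMIT/ROLLBACK
SELECT ...;                    -- the one big-sort query that needs it
COMMIT;                        -- work_mem falls back to the server default
```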
Re: [PERFORM] Strange performance response for high load times
What's still badgering me is the performance when there is no load, or significantly lower than peak times. Why is there such a big difference? I.e. off-peak times a simple select with where (on an indexed column) and limit takes *40 ms*; during peak times it took *2 seconds* - 50 times slower! cheers, Peter

On Thu, Jun 18, 2009 at 10:01 PM, Kenneth Marshall k...@rice.edu wrote: [...] It really depends on your query mix. The key thing to remember is that multiples (possibly many) of the work_mem value can be allocated in an individual query. You can set it on a per-query basis to help manage its use, i.e. up it for only the query that needs it. With our systems, which run a smaller number of queries, we do use 256MB. I hope that this helps. Regards, Ken
Re: [PERFORM] Strange performance response for high load times
Peter Alban peter.alb...@gmail.com wrote: Why is there such a big difference? I.e. off-peak times a simple select with where (on an indexed column) and limit takes *40 ms*; during peak times it took *2 seconds* - 50 times slower! With your high work_mem setting you may have been causing the OS to discard cached data, causing disk reads where you normally get cache hits, or even triggering swapping. Either of those can easily cause a difference of that magnitude, or more. -Kevin
Re: [PERFORM] Index Scan taking long time
Hi Tom, We have managed to improve significantly on the speed of this query. The way that we did this was through clustering the table based on the domain index, which significantly reduced the page reads that were required in order to perform the query. Also, to find this we turned on log_statement_stats to see what it was doing. This was on a table of roughly 600MB where the domains were randomly dispersed. Cheers Bryce Tom Lane wrote: Bryce Ewing br...@smx.co.nz writes: So it seems to me that once the index is in memory everything is fine with the world, but the loading of the index into memory is horrendous. So it would seem. What's the disk hardware on this machine? It's possible that part of the problem is table bloat, leading to the indexscan having to fetch many more pages than it would if the table were more compact. regards, tom lane
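The clustering step Bryce describes would look something like the following (table and index names are made up, since the post doesn't give them). Note that CLUSTER takes an exclusive lock and is a one-time physical reorder; rows inserted later are not kept in index order, so it may need repeating:

```sql
CLUSTER messages USING messages_domain_idx;  -- rewrite the table in domain-index order
ANALYZE messages;                            -- refresh statistics after the rewrite
```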
[PERFORM] select max() much slower than select min()
ts_stats_transet_user_interval has ~48M rows. ts_id is the PK and there is an index on ts_interval_start_time. I reindexed it and ran vacuum analyze. Only SELECTs have been done since these operations.

cemdb=# explain select min(ts_id) from ts_stats_transet_user_interval a where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00';

QUERY PLAN
Result  (cost=12.19..12.20 rows=1 width=0)
  InitPlan
    ->  Limit  (cost=0.00..12.19 rows=1 width=8)
          ->  Index Scan using ts_stats_transet_user_interval_pkey on ts_stats_transet_user_interval a  (cost=0.00..5496152.30 rows=450799 width=8)
                Filter: ((ts_id IS NOT NULL) AND (ts_interval_start_time >= '2009-06-16 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2009-06-16 02:00:00-07'::timestamp with time zone))
(5 rows)

cemdb=# explain select max(ts_id) from ts_stats_transet_user_interval a where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00';

QUERY PLAN
Result  (cost=12.19..12.20 rows=1 width=0)
  InitPlan
    ->  Limit  (cost=0.00..12.19 rows=1 width=8)
          ->  Index Scan Backward using ts_stats_transet_user_interval_pkey on ts_stats_transet_user_interval a  (cost=0.00..5496152.30 rows=450799 width=8)
                Filter: ((ts_id IS NOT NULL) AND (ts_interval_start_time >= '2009-06-16 01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time < '2009-06-16 02:00:00-07'::timestamp with time zone))
(5 rows)

[r...@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U admin -d cemdb -c "select min(ts_id) from ts_stats_transet_user_interval a where a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00'"
  min
--------
 603210
(1 row)

real    1m32.025s
user    0m0.000s
sys     0m0.003s

[r...@rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U admin -d cemdb -c "select max(ts_id) from ts_stats_transet_user_interval a where a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00'"
  max
--------
 603239
(1 row)

real    16m39.412s
user    0m0.002s
sys     0m0.002s

seems like max() shouldn't take any longer than min() and certainly not 10 times as long. Any ideas on how to determine the max more quickly?

Thanks, Brian
Re: [PERFORM] select max() much slower than select min()
Brian Cox brian@ca.com wrote: cemdb=# explain select min(ts_id) from ts_stats_transet_user_interval a where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00'; seems like max() shouldn't take any longer than min() and certainly not 10 times as long. Any ideas on how to determine the max more quickly? Is there any correlation between ts_id and ts_interval_start_time? Perhaps if you tried min and max with different time ranges it would find a row on a backward scan faster. It'll take ten times as long if it has to scan through ten times as many rows to find a match. I don't suppose you have an index on ts_interval_start_time? If not, what happens if you run these queries after adding one? -Kevin
Re: [PERFORM] select max() much slower than select min()
Kevin Grittner [kevin.gritt...@wicourts.gov] wrote: Is there any correlation between ts_id and ts_interval_start_time? only vaguely: increasing ts_interval_start_time implies increasing ts_id but there may be many rows (100,000's) with the same ts_interval_start_time Perhaps if you tried min and max with different time ranges it would find a row on a backward scan faster. It'll take ten times as long if it has to scan through ten times as many rows to find a match. it looks like there are fewer rows backwards than forwards:

cemdb=> select count(*) from ts_stats_transet_user_interval where ts_interval_start_time < '2009-6-16 01:00';
 count
-------
 3210
(1 row)

cemdb=> select count(*) from ts_stats_transet_user_interval where ts_interval_start_time >= '2009-6-16 02:00';
 count
-------
 1350
(1 row)

I don't suppose you have an index on ts_interval_start_time? there is an index. I mentioned this in my original posting. Thanks, Brian
Re: [PERFORM] select max() much slower than select min()
Brian Cox brian@ca.com wrote: Kevin Grittner [kevin.gritt...@wicourts.gov] wrote: Is there any correlation between ts_id and ts_interval_start_time? only vaguely: increasing ts_interval_start_time implies increasing ts_id but there may be many rows (100,000's) with the same ts_interval_start_time Perhaps if you tried min and max with different time ranges it would find a row on a backward scan faster. It'll take ten times as long if it has to scan through ten times as many rows to find a match. it looks like there are fewer rows backwards than forwards: Hmmm. I was going to suggest possible bloat near the end of the table, but the vacuum and reindex should have kept that from being a problem. This might be an issue where disks are laid out so that the pages can be read from start to end quickly; reading backwards might cause a lot more rotational delay. I don't suppose you have an index on ts_interval_start_time? there is an index. I mentioned this in my original posting. Sorry I missed that. I was afraid that it might not use it because PostgreSQL doesn't yet recognize correlations between columns. If it did, it might determine that the other index was better for this query (which may or may not be the case). Could you provide the EXPLAIN ANALYZE output for these queries, so we can compare expected to actual? Also, what is your statistics target for these (default_statistics_target if you haven't overridden the specific columns involved)? I guess you could try something like this, too:

select max(ts_id) from (select ts_id from ts_stats_transet_user_interval a where 0=0 and a.ts_interval_start_time >= '2009-6-16 01:00' and a.ts_interval_start_time < '2009-6-16 02:00') x;

(Untested, so you might need to fix some typo or oversight.) The EXPLAIN ANALYZE of that might yield interesting information.
If that doesn't get you closer to something acceptable, you could consider a functional index on the inverse of the ts_id column, and search for the negative of the min of that. Kinda ugly, but it might work because the disk would be spinning in the right direction for you. -Kevin
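Kevin's inverse-index idea would look something like the following (a sketch, untested, as he says; whether the planner applies the min() optimization depends on the query expression matching the indexed expression exactly):

```sql
CREATE INDEX ts_stats_transet_user_interval_neg_id
    ON ts_stats_transet_user_interval ((-ts_id));

-- max(ts_id) becomes the negated min of -ts_id, which can be found with a
-- forward scan of the new index instead of a backward scan of the PK:
SELECT -min(-ts_id)
FROM ts_stats_transet_user_interval a
WHERE a.ts_interval_start_time >= '2009-6-16 01:00'
  AND a.ts_interval_start_time <  '2009-6-16 02:00';
```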