Re: [PERFORM] Using Between
The question is how can we make it faster. If there's just one region ID for any given postal code, you might try adding a column to vehicleused and storing the region ID there. You could populate that column using a trigger; presumably it doesn't change unless the postal code changes. Then you could index that column and query against it directly, rather than joining to PostalCodeRegionCountyCity.

Short of that, I don't see any obvious way to avoid reading most of the vehicleused table. There may or may not be an index that can speed that up slightly, and of course you can always throw hardware at the problem, but fundamentally reading half a million or more rows isn't going to be instantaneous.

Incidentally, it would probably simplify things to store postal codes in the same case throughout the system. If you can avoid the need to write lower(x) = lower(y) and just write x = y, you may get better plans. I'm not sure that's the case in this particular example, but it's something to think about.

Something else you might test is bumping the read-ahead value. Most Linux installs have this at 256; you might try bumping the value to ~8 MB and tune from there. This may help you slightly with seq scan performance. As always: YMMV. It's not going to magically fix low-performing I/O subsystems, and it won't help many applications of PG, but there are a few outlying instances where this change can help a little bit. I am sure someone will step in and tell you it is a bad idea - and they will probably have perfectly valid reasons for why it is - so you will need to consider the ramifications. If at all possible, test and tune to see.
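The denormalized region column could be maintained along these lines (a sketch only - the column names and the lookup table's layout are guesses, since the actual schema isn't shown in the thread):

```sql
-- Hypothetical: assumes vehicleused(postalcode) and a lookup table
-- PostalCodeRegionCountyCity(postalcode, regionid).
ALTER TABLE vehicleused ADD COLUMN regionid integer;

CREATE OR REPLACE FUNCTION vehicleused_set_region() RETURNS trigger AS $$
BEGIN
    -- Only re-resolve the region when the postal code is new or has changed.
    IF TG_OP = 'INSERT' OR NEW.postalcode IS DISTINCT FROM OLD.postalcode THEN
        SELECT p.regionid INTO NEW.regionid
          FROM PostalCodeRegionCountyCity p
         WHERE p.postalcode = NEW.postalcode;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER vehicleused_region_trg
    BEFORE INSERT OR UPDATE ON vehicleused
    FOR EACH ROW EXECUTE PROCEDURE vehicleused_set_region();

-- With the column in place, queries can hit this index directly
-- instead of joining to the postal-code table.
CREATE INDEX vehicleused_regionid_idx ON vehicleused (regionid);
```

Existing rows would still need a one-off UPDATE to backfill regionid before the index is useful.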
-- Mark

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Useless sort by
I can't tell if you meant for this to be insulting or my reading it that way is wrong, but it certainly wasn't put in a helpful tone. Let me summarize for you. You've been told that putting ORDER BY into a view is a generally poor idea anyway, and that it's better to find ways to avoid this class of concern altogether. There are significant non-obvious technical challenges behind actually implementing the behavior you'd like to see; the concerns raised by Tom and Maciek make your idea impractical even if it were desired. And for every person like yourself who'd see the benefit you're looking for, there are far more who would find a change in this area a major problem. The concerns around breakage due to assumed but not required aspects of the relational model are ones the users of the software will be confused by, not the developers of it. You have the classification wrong; the feedback you've gotten here is from the developers being user oriented, not theory oriented or code oriented.

-- Greg Smith, 2ndQuadrant US
g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support: www.2ndQuadrant.us
Author, PostgreSQL 9.0 High Performance
Pre-ordering at: https://www.packtpub.com/postgresql-9-0-high-performance/book

Not insulting, just amused bemusement. PG portrays itself as the best OS database, which it may well be. But it does so by stressing the row-by-agonizing-row approach to data. In other words, as just a record-paradigm filestore for COBOL/java/C coders. I was expecting more Relational oomph. As Dr. Codd says: A Relational Model of Data for Large Shared Data Banks. Less code, more data.

robert
[PERFORM] how to enforce index sub-select over filter+seqscan
Hello! I have this table:

create table test (
    s1 varchar(255),
    s2 varchar(255),
    i1 integer,
    i2 integer,
    ... over 100 other fields
);

The table contains over 8 million records. There's these indexes:

create index is1 on test (s1);
create index is2 on test (s2);
create index ii1 on test (i1);
create index ii2 on test (i2);
create index ii3 on test (i1, i2);

and then I run this query:

select * from (
    select * from test where is1 = 'aa' or is2 = 'aa'
) where is1 = 1
    or (is1 = 1 and is2 = 1)
    or (is1 = 2 and is2 = 2)
    or (is1 = 3 and is2 = 3)

The where part of the outer query can have a varying count of "or (is1 = N and is2 = M)" expressions; let's name this number X. When X is low the planner chooses an index scan using is1 and is2, then BitmapAnds that with an index scan using ii1, ii2 or ii3. But when X is big enough (> 15) the planner chooses a seqscan and a filter on i1, i2, s1, s2. The seqscan is very slow and I want to avoid it. The subquery is very fast and I don't know why postgres chooses that plan. I know I can set enable_seqscan = off. Are there other ways to enforce index usage? postgres's pg_class has the right estimate of the rowcount.

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
Re: [PERFORM] how to enforce index sub-select over filter+seqscan
Dmitry Teslenko dtesle...@gmail.com wrote:

> Seqscan is very slow and I want to avoid it. Subquery is very fast
> and i don't know why postgres chooses that plan. I know I can set
> enable_seqscan = off. Is there other ways to enforce index usage?

If you come at it from that angle, you probably won't get the best resolution. PostgreSQL can see the alternative plans, and develops estimated costs of running each. It uses the one that it thinks will be fastest. If it's wrong, there's probably something wrong with the statistics it uses for estimating, or with the costing information. (There are some cases where it's not able to accurately estimate costs even if these are right, but let's check the more common cases first.)

Please provide a little more information, like PostgreSQL version, the postgresql.conf contents (excluding comments), OS, hardware, and the EXPLAIN ANALYZE output of the query with and without enable_seqscan = off. Other useful ideas here: http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin
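The plan comparison Kevin asks for can be captured in a psql session along these lines (illustrative only - the SELECT shown is a stand-in for the real query):

```sql
-- Plan and time the query with the planner's default choice.
EXPLAIN ANALYZE
SELECT * FROM test WHERE s1 = 'aa' OR s2 = 'aa';

-- Penalize seq scans for this session only, and re-plan.
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM test WHERE s1 = 'aa' OR s2 = 'aa';

-- Restore the default before doing anything else in the session.
RESET enable_seqscan;
```

Comparing the estimated cost and actual time lines of the two plans usually points at which statistic or cost parameter is off.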
Re: [PERFORM] Useless sort by
On Thu, Sep 23, 2010 at 7:51 AM, gnuo...@rcn.com wrote:

> Not insulting, just amused bemusement. PG portrays itself as the best
> OS database, which it may well be. But it does so by stressing the
> row-by-agonizing-row approach to data. In other words, as just a
> record paradigm filestore for COBOL/java/C coders. I was expecting
> more Relational oomph. As Dr. Codd says: A Relational Model of Data
> for Large Shared Data Banks. Less code, more data.

So what, exactly, would give pgsql more relational oomph? Your assertion feels pretty hand-wavy right now.

--
To understand recursion, one must first understand recursion.
Re: [PERFORM] how to enforce index sub-select over filter+seqscan
On Thu, Sep 23, 2010 at 10:26 AM, Dmitry Teslenko dtesle...@gmail.com wrote:

> Hello! I have this table: create table test ( s1 varchar(255),
> s2 varchar(255), i1 integer, i2 integer, ... over 100 other fields );
> table contains over 8 million records [...] and then i run this query:
> select * from ( select * from test where is1 = 'aa' or is2 = 'aa' )
> where is1 = 1 or (is1 = 1 and is2 = 1) or (is1 = 2 and is2 = 2)
> or (is1 = 3 and is2 = 3)

hm, I think you meant to say: s1 = 'aa' or s2 = 'aa', i1 = 1 ... etc. details are important! Consider taking the combination of 'correct' pairs of i1 and i2 and building a table with 'values' and joining to that:

select *
  from test
  join ( values (2,2), (3,3), ... ) q(i1, i2) using (i1, i2)
 where s1 = 'aa' or s2 = 'aa' or i1 = 1

merlin
Re: [PERFORM] locking issue on simple selects?
On 15 September 2010 12:05, Tobias Brox tobi...@gmail.com wrote:

> Recently we've frequently encountered issues where some simple selects
> (meaning, selects doing an index lookup and fetching one row) have
> become stuck for several minutes. Apparently all requests on one exact
> table get stuck; all requests not related to said table go through
> without any problems.

Now I've set up all kinds of logging regarding locks, and it seems like we're having issues that aren't lock-related. I just did a bit of research into one situation today.

All while having this problem, there was one heavy query running in parallel ... not sure if that's relevant. Then came one query that required a seq scan on the problem table (that won't happen again - I just added a new index). Four seconds later came another query requiring a simple index lookup. Still more queries came in, most of them simple index lookups, but on different indexes. After one minute there were 25 queries in the pg_stat_activity view towards this table. It's not a particularly huge table. Moments later all 25 queries had been executed.
Re: [PERFORM] locking issue on simple selects?
Tobias Brox tobi...@gmail.com wrote:

> All while having this problem, there was one heavy query running in
> parallel ... not sure if that's relevant.

Have you turned on checkpoint logging? You might want to see if these are happening at some particular point in the checkpoint processing. If so, look through the archives for posts from Greg Smith on how to tune that -- he's worked out a nice methodology for iteratively improving your configuration in this regard.

-Kevin
[PERFORM] Memory usage - indexes
We've come to a tipping point with one of our database servers. It's generally quite loaded, but up until recently it was handling the load well - now we're seeing that it struggles to process all the selects fast enough. Sometimes we're observing some weird lock-like behaviour (see my other post on that), but most of the time the database server is just not capable of handling the load fast enough (causing the queries to pile up in the pg_stat_activity view).

My main hypothesis is that all the important indexes would fit snugly into memory before, and now they don't. We'll eventually get the server moved over to new and improved hardware, but while waiting for that to happen we need to focus on reducing the memory footprint of the database. I have some general questions now ...

1) Are there any good ways to verify my hypothesis? Some months ago I thought of running some small memory-gobbling program on the database server just to see how much memory I could remove before we would see indications of the database being overloaded. It seems a bit radical, but I think the information learned from such an experiment would be very useful ... and we never managed to set up any testing environment that faithfully replicates production traffic. Anyway, it's sort of too late now that we're already observing performance problems even without the memory-gobbling script running.

2) I've seen it discussed earlier on this list ... shared_buffers vs OS caches. Some claim that it has very little effect to adjust the size of the shared buffers. Anyway, isn't there a risk that memory is wasted because important data is stored both in the OS cache and the shared buffers? What would happen if we used almost all the available memory for shared buffers? Or turned it down to a bare minimum and let the OS do almost all the cache handling?

3) We're discussing dropping some overlapping indexes, i.e. dropping one out of two indexes looking like this:

some_table(a)
some_table(a,b)

Would the query select * from some_table where a=? run slower if we drop the first index? Significantly? (In our situation I found that the number of distinct b's for each a is low and that the usage stats on the second index are quite low compared with the first one, so I think we'll drop the second index.)

4) We're discussing dropping other indexes. Does it make sense at all as long as we're not experiencing problems with inserts/updates? I suppose that if an index isn't used it will remain on disk and won't affect memory usage ... but what if the index is rarely used ... wouldn't it be better to do a seqscan on a table that is frequently accessed and mostly in memory than to consult an index that is stored on disk?

Sorry for all the stupid questions ;-)
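Questions 3) and 4) can be grounded in the per-index usage counters PostgreSQL keeps in the pg_stat_user_indexes view. Something like the following (a sketch; run against the live database) lists the least-used and largest indexes first among dropping candidates:

```sql
-- Indexes that are rarely or never scanned, biggest first,
-- are the most promising candidates for dropping.
SELECT schemaname, relname, indexrelname,
       idx_scan,                                     -- number of index scans
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
```

Note that the counters accumulate since the last statistics reset, so a recently reset or recently restarted cluster can make an occasionally-used index look dead.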
Re: [PERFORM] Memory usage - indexes
On 24/09/10 09:50, Tobias Brox wrote:

> We've come to a tipping point with one of our database servers, it's
> generally quite loaded but up until recently it was handling the load
> well [...] Sorry for all the stupid questions ;-)

All good questions! Before (or maybe as well as) looking at index sizes vs memory I'd check to see if any of your commonly run queries have suddenly started to use different plans due to data growth, e.g:

- index scan to seq scan (perhaps because effective_cache_size is too small now)
- hash agg to sort (work_mem too small now)

We had a case of the 1st point happen here a while ago; the symptoms looked very like what you are describing.

Re index size, you could try indexes like:

some_table(a)
some_table(b)

which may occupy less space, and the optimizer can bitmap and/or them to work like the compound index some_table(a,b).

regards

Mark
Re: [PERFORM] locking issue on simple selects?
On 23 September 2010 22:55, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

> Have you turned on checkpoint logging?

Yes ... it seems so:

13:19:13.840 - LOG: checkpoint complete: wrote 3849 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 5 recycled; write=269.551 s, sync=0.103 s, total=269.953 s
13:19:13.841 - LOG: checkpoint starting: xlog
13:19:33 - the seq scan query towards the affected table started
13:20:31.454 - one of the index lookup queries towards the affected table was finished
13:20:43.176 - LOG: checkpoint complete: wrote 108199 buffers (6.9%); 0 transaction log file(s) added, 0 removed, 16 recycled; write=11.521 s, sync=77.533 s, total=89.335 s

> You might want to see if these are happening at some particular point
> in the checkpoint processing. If so, look through the archives for
> posts from Greg Smith on how to tune that -- he's worked out a nice
> methodology to iteratively improve your configuration in this regard.

Thank you, I will ... hmm ... I found this blog post: http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html

Of course I'm doing it my own way:

select *, now() as snapshot
  into tmp_pg_stat_bgwriter
  from pg_stat_bgwriter;

create view tmp_delta_pg_stat_bgwriter as
select a.checkpoints_timed - b.checkpoints_timed as checkpoints_timed,
       a.checkpoints_req - b.checkpoints_req as checkpoints_req,
       a.buffers_checkpoint - b.buffers_checkpoint as buffers_checkpoint,
       a.buffers_clean - b.buffers_clean as buffers_clean,
       a.maxwritten_clean - b.maxwritten_clean as maxwritten_clean,
       a.buffers_backend - b.buffers_backend as buffers_backend,
       a.buffers_alloc - b.buffers_alloc as buffers_alloc,
       now() - b.snapshot as interval
  from pg_stat_bgwriter a,
       (select * from tmp_pg_stat_bgwriter order by snapshot desc limit 1) as b;

Checkpoint timeout is set to 5 minutes. Right now we're having relatively low activity. I'm not sure how to read the stats below, but they look OK to me:

select * from tmp_delta_pg_stat_bgwriter;

 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |    interval
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-----------------
                 3 |               0 |               8277 |            15 |                0 |             185 |         18691 | 00:12:02.988842
(1 row)
Re: [PERFORM] locking issue on simple selects?
Tobias Brox tobi...@gmail.com wrote:

> 13:19:13.840 - LOG: checkpoint complete
> 13:19:13.841 - LOG: checkpoint starting
> 13:20:43.176 - LOG: checkpoint complete

There wasn't a lot of time between the completion of one checkpoint and the start of the next, and the two checkpoints finished a minute and a half apart. Perhaps you need to boost your checkpoint_segments setting? What is it now?

-Kevin
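For context, checkpoint spacing is governed by a handful of postgresql.conf settings. The values below are illustrative starting points for the 8.3-9.4 era (when checkpoint_segments still existed), not recommendations for this particular system:

```
# postgresql.conf - checkpoint-related settings (checkpoint_segments era)
checkpoint_segments = 32            # default 3; more segments means fewer
                                    # xlog-triggered ("checkpoint starting: xlog")
                                    # checkpoints between timed ones
checkpoint_timeout = 5min           # interval between time-triggered checkpoints
checkpoint_completion_target = 0.5  # raising toward 0.9 spreads checkpoint
                                    # writes over more of the interval
log_checkpoints = on                # produces the log lines quoted above
```

A checkpoint starting almost immediately after the previous one completes, as in the quoted log, is the classic sign that checkpoint_segments is too low for the write volume.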