Re: [HACKERS] random_page_cost vs seq_page_cost
On Tue, Feb 07, 2012 at 07:58:28PM -0500, Bruce Momjian wrote: I was initially concerned that tuning advice in this part of the docs would look out of place, but now see the 25% shared_buffers recommendation, and it looks fine, so we are OK. (Should we caution against more than 8GB of shared buffers? I don't see that in the docs.) I agree we are overdue for a better explanation of random page cost, so I agree with your direction. I did a little word-smithing to tighten up your text; feel free to discard what you don't like: Random access to mechanical disk storage is normally much more expensive than four times the cost of sequential access. However, a lower default is used (4.0) because the majority of random accesses to disk, such as indexed reads, are assumed to be in cache. The default value can be thought of as modeling random access as 40 times slower than sequential, while expecting 90% of random reads to be cached. If you believe a 90% cache rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads. Correspondingly, if your data is likely to be completely in cache, such as when the database is smaller than the total server memory, decreasing random_page_cost can be appropriate. Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost. Patch applied for random_page_cost docs. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
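The "40 times slower, 90% cached" framing above can be checked with a couple of lines of arithmetic. A minimal sketch (the function and parameter names are illustrative, not PostgreSQL settings):

```python
def effective_random_page_cost(raw_slowdown, cache_hit_rate):
    """Expected cost of a random page read, in units of one sequential read,
    when a fraction of random reads is served from cache at ~zero cost."""
    return raw_slowdown * (1 - cache_hit_rate)

# The model behind the 4.0 default: 40x slower raw, 90% of reads cached.
print(effective_random_page_cost(40, 0.90))  # ~4.0
# Greg's alternative framing later in the thread (80x, 95% cached)
# collapses to the same default value.
print(effective_random_page_cost(80, 0.95))  # ~4.0
```

Note that many (slowdown, hit-rate) pairs collapse to the same effective value, which is part of why the setting is hard to derive from hardware measurements alone.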
Re: [HACKERS] random_page_cost vs seq_page_cost
On 02/11/2012 07:53 PM, Jeff Janes wrote: Has it ever been well-characterized what the problem is with 8GB? I've used shared buffers above that size for testing purposes and could never provoke a problem with it. If anyone ever manages to characterize it well, we might actually make progress on isolating and fixing it. All we have so far are a couple of application-level test results suggesting a higher value caused performance to drop. The first public one I remember was from Jignesh; http://archives.postgresql.org/pgsql-performance/2008-02/msg00184.php quotes him on where he found the Solaris roll-off. What we really need to stomp this one down is someone to find the same thing, then show profiler output in each case. Note that Jignesh's report included a significant amount of filesystem-level tuning, using things like more direct I/O, and that might be a necessary requirement to run into the exact variant of this limitation he mentioned. I haven't spent a lot of time looking for this problem myself. What I've heard second-hand from more than one person now is a) settings larger than 8GB can still be an improvement for some people, and b) simple benchmarks don't always have this problem. I have noted that the few public and private reports I've gotten all suggest problems show up on benchmarks of more complicated workloads. I think Jignesh mentioned this being obvious in the more complicated TPC-derived benchmarks, not in simple things like pgbench. I may be misquoting him though. And given that one of the possible causes for this was an excess of some lock contention, it's quite possible this one is already gone from 9.2, given the large number of lock-related issues that have been squashed so far in this release. All of those disclaimers are why I think no one has pushed to put a note about this in the official docs. Right now the only suggested limit is this one: The useful range for shared_buffers on Windows systems is generally from 64MB to 512MB.
The most common practical limit I've run into with large shared_buffers settings hits earlier than 8GB: running into checkpoint spike issues. I have installs that started with shared_buffers in the 4 to 8GB range, where we saw badly spiking I/O at checkpoint sync time. Lowering the database's cache can result in smarter writing decisions within the OS, improving latency--even though total writes are actually higher if you measure what flows from the database to the OS. That latency vs. throughput trade-off is one of the main reasons I haven't gone chasing after problems with really large shared_buffers settings. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] random_page_cost vs seq_page_cost
On Tue, Feb 7, 2012 at 2:06 PM, Greg Smith g...@2ndquadrant.com wrote: On 02/07/2012 03:23 PM, Bruce Momjian wrote: Where did you see that there will be an improvement in the 9.2 documentation? I don't see an improvement. I commented that I'm hoping for an improvement in the documentation of how much timing overhead impacts attempts to measure this area better. That's from the add timing of buffer I/O requests feature submission. I'm not sure if Bene read too much into that or not; I didn't mean to imply that the docs around random_page_cost have gotten better. This particular complaint is extremely common though; it seems to pop up on one of the lists a few times each year. Your suggested doc fix is fine as a quick one, but I think it might be worth expanding further on this topic. Something discussing SSDs seems due here too. Here's a first draft of a longer discussion, to be inserted just after where it states the default value is 4.0: True random access to mechanical disk storage will normally be more expensive than this default suggests. The value used is lower to reflect caching effects. Some common random accesses to disk, such as indexed reads, are considered likely to be in cache. The default value can be thought of as modeling random access as 40 times as expensive as sequential, while expecting that 90% of random reads will actually be cached. For these numbers to work out to 4, we must also be assuming that virtually none of the sequentially read pages are cached. Is that a realistic assumption? If the table is accessed only by seq scans, the ring buffer may prevent it from getting cached (although even then it could very well be in the OS cache, as that doesn't respect the ring buffer), but it would be pretty common for other parts of the application to access the same table via index scan, and so cause substantial parts of it to be cached. But I can see that that would rapidly get too complicated to discuss in the documentation.
Cheers, Jeff
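Jeff's objection — that the 40x/90% math implicitly assumes sequential reads are never cached — can be made concrete by extending the toy model with a sequential cache hit rate (illustrative arithmetic only, not the planner's actual cost formula):

```python
def page_cost_ratio(raw_slowdown, random_hit_rate, seq_hit_rate):
    """Ratio of expected random to expected sequential page-read cost,
    treating cache hits as free. A toy model for the doc discussion."""
    random_cost = raw_slowdown * (1 - random_hit_rate)
    seq_cost = 1.0 * (1 - seq_hit_rate)
    return random_cost / seq_cost

# The default's implicit assumption: sequential pages essentially uncached.
print(page_cost_ratio(40, 0.90, 0.0))  # ~4
# If half the sequentially read pages are cached too, the ratio doubles,
# showing why the assumption matters.
print(page_cost_ratio(40, 0.90, 0.5))  # ~8
```

This also illustrates why the discussion "rapidly gets too complicated" for the docs: a second free parameter makes the single default impossible to justify from first principles.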
Re: [HACKERS] random_page_cost vs seq_page_cost
On Tue, Feb 7, 2012 at 4:58 PM, Bruce Momjian br...@momjian.us wrote: I was initially concerned that tuning advice in this part of the docs would look out of place, but now see the 25% shared_buffers recommendation, and it looks fine, so we are OK. (Should we caution against more than 8GB of shared buffers? I don't see that in the docs.) Has it ever been well-characterized what the problem is with 8GB? I've used shared buffers above that size for testing purposes and could never provoke a problem with it. Cheers, Jeff
Re: [HACKERS] random_page_cost vs seq_page_cost
On 07/02/12 19:58, Bruce Momjian wrote: On Tue, Feb 07, 2012 at 05:06:18PM -0500, Greg Smith wrote: On 02/07/2012 03:23 PM, Bruce Momjian wrote: Where did you see that there will be an improvement in the 9.2 documentation? I don't see an improvement. I commented that I'm hoping for an improvement in the documentation of how much timing overhead impacts attempts to measure this area better. That's from the add timing of buffer I/O requests feature submission. I'm not sure if Bene read too much into that or not; I didn't mean to imply that the docs around random_page_cost have gotten better. I guess I did. But I'm very glad that as a side effect Bruce and Greg have improved it ;-)
Re: [HACKERS] random_page_cost vs seq_page_cost
On Wed, Jan 11, 2012 at 08:26:52AM +, Benedikt Grundmann wrote: (replying just to you) On 10/01/12 15:22, Greg Smith wrote: On 1/5/12 5:04 AM, Benedikt Grundmann wrote: That sort of thing is one reason why all attempts so far to set random_page_cost based on physical characteristics haven't gone anywhere useful. The setting is sort of overloaded right now, it's a fuzzy mix of true random seek cost blended with some notion of cache percentage. Trying to bring some measurements to bear on it is a less effective approach than what people actually do here. Monitor the profile of query execution, change the value, see what happens. Use that as feedback for what direction to keep going; repeat until you're just spinning with no improvements. Thank you very much for the reply, it is very interesting. I'm excited to hear that documentation in that area will improve in 9.2. It's interesting: postgres has remarkably good documentation, but it is a sufficiently complex system that to actually tune the knobs provided sensibly you have to understand quite a lot about what is going on. A colleague of mine likes to say all abstractions leak, which seems very appropriate in this case. Where did you see that there will be an improvement in the 9.2 documentation? I don't see an improvement. I looked over the random_page_cost documentation and remembered I was always concerned about how vague it was about caching effects, so I wrote the attached doc patch to explicitly state it. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
new file mode 100644
index 3a84321..19e3e36
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** SET ENABLE_SEQSCAN TO OFF;
*** 2590,2595
--- 2590,2597
  para Sets the planner's estimate of the cost of a non-sequentially-fetched disk page. The default is 4.0.
+ (The default is lower than the typical difference between random
+ and sequential storage access speed because of caching effects.)
  This value can be overridden for tables and indexes in a particular tablespace by setting the tablespace parameter of the same name (see xref linkend=sql-altertablespace).
Re: [HACKERS] random_page_cost vs seq_page_cost
On 02/07/2012 03:23 PM, Bruce Momjian wrote: Where did you see that there will be an improvement in the 9.2 documentation? I don't see an improvement. I commented that I'm hoping for an improvement in the documentation of how much timing overhead impacts attempts to measure this area better. That's from the add timing of buffer I/O requests feature submission. I'm not sure if Bene read too much into that or not; I didn't mean to imply that the docs around random_page_cost have gotten better. This particular complaint is extremely common though; it seems to pop up on one of the lists a few times each year. Your suggested doc fix is fine as a quick one, but I think it might be worth expanding further on this topic. Something discussing SSDs seems due here too. Here's a first draft of a longer discussion, to be inserted just after where it states the default value is 4.0: True random access to mechanical disk storage will normally be more expensive than this default suggests. The value used is lower to reflect caching effects. Some common random accesses to disk, such as indexed reads, are considered likely to be in cache. The default value can be thought of as modeling random access as 40 times as expensive as sequential, while expecting that 90% of random reads will actually be cached. If you believe a high cache rate is an incorrect assumption for your workload, you might increase random_page_cost to more closely reflect the true cost of random reads against your storage. Correspondingly, if your data is likely to be completely cached, such as when the database is smaller than the total memory in the server, decreasing random_page_cost can be appropriate. For storage where the true cost of random reads is low, such as solid-state drives and similar memory-based devices, lower values of random_page_cost may better reflect the real-world cost of that operation.
=== I think of the value as being more like 80 times as expensive with a 95% hit rate, but the above seems more likely to turn into understandable math for a first-time reader of this section. I stopped just short of recommending a value for the completely cached case. I normally use 1.01 there; I know others prefer going fully to 1.0 instead. That argument seems like it could rage on for some time. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] random_page_cost vs seq_page_cost
On Tue, Feb 07, 2012 at 05:06:18PM -0500, Greg Smith wrote: On 02/07/2012 03:23 PM, Bruce Momjian wrote: Where did you see that there will be an improvement in the 9.2 documentation? I don't see an improvement. I commented that I'm hoping for an improvement in the documentation of how much timing overhead impacts attempts to measure this area better. That's from the add timing of buffer I/O requests feature submission. I'm not sure if Bene read too much into that or not; I didn't mean to imply that the docs around random_page_cost have gotten better. This particular complaint is extremely common though; it seems to pop up on one of the lists a few times each year. Your suggested doc fix is fine as a quick one, but I think it might be worth expanding further on this topic. Something discussing SSDs seems due here too. Here's a first draft of a longer discussion, to be inserted just after where it states the default value is 4.0: I was initially concerned that tuning advice in this part of the docs would look out of place, but now see the 25% shared_buffers recommendation, and it looks fine, so we are OK. (Should we caution against more than 8GB of shared buffers? I don't see that in the docs.) I agree we are overdue for a better explanation of random page cost, so I agree with your direction. I did a little word-smithing to tighten up your text; feel free to discard what you don't like: Random access to mechanical disk storage is normally much more expensive than four times the cost of sequential access. However, a lower default is used (4.0) because the majority of random accesses to disk, such as indexed reads, are assumed to be in cache. The default value can be thought of as modeling random access as 40 times slower than sequential, while expecting 90% of random reads to be cached. If you believe a 90% cache rate is an incorrect assumption for your workload, you can increase random_page_cost to better reflect the true cost of random storage reads.
Correspondingly, if your data is likely to be completely in cache, such as when the database is smaller than the total server memory, decreasing random_page_cost can be appropriate. Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] random_page_cost vs seq_page_cost
(replying just to you) On 10/01/12 15:22, Greg Smith wrote: On 1/5/12 5:04 AM, Benedikt Grundmann wrote: That sort of thing is one reason why all attempts so far to set random_page_cost based on physical characteristics haven't gone anywhere useful. The setting is sort of overloaded right now, it's a fuzzy mix of true random seek cost blended with some notion of cache percentage. Trying to bring some measurements to bear on it is a less effective approach than what people actually do here. Monitor the profile of query execution, change the value, see what happens. Use that as feedback for what direction to keep going; repeat until you're just spinning with no improvements. Thank you very much for the reply, it is very interesting. I'm excited to hear that documentation in that area will improve in 9.2. It's interesting: postgres has remarkably good documentation, but it is a sufficiently complex system that to actually tune the knobs provided sensibly you have to understand quite a lot about what is going on. A colleague of mine likes to say all abstractions leak, which seems very appropriate in this case. We are not sure if the database used to choose differently before the move to the new hardware and the hardware is performing worse for random seeks. Or if the planner is now making different choices. I don't recommend ever deploying new hardware without first doing some low-level benchmarks to validate its performance. Once stuff goes into production, you can't do that anymore. See http://www.2ndquadrant.com/en/talks/ for my hardware benchmarking talks if you'd like some ideas on what to collect. We had actually done lots of tests on the sequential read performance. But you are right, we could have done better (and I'll definitely read through your talks). Did you see my follow-up? Based on the feedback we did further tests, and it is now clear that neither the hardware nor the database version is at fault.
A different plan is chosen by both the new and the old database version if spun up on the database as it is right now. Our best guess is that the CLUSTER operations we ran after moving to the new hardware (its greater disk space and faster sequential I/O made that possible) changed the planner's perception of how the joins will perform in relation to each other. Cheers, Bene
Re: [HACKERS] random_page_cost vs seq_page_cost
On 11/01/12 08:26, Benedikt Grundmann wrote: (replying just to you) Clearly I didn't. Sigh. Getting myself a coffee now.
Re: [HACKERS] random_page_cost vs seq_page_cost
On 1/5/12 5:04 AM, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. Emails in this mailing list's archive seem to indicate that 1.0 vs 3.0 - 4.0 are appropriate values on modern hardware. Which surprised me a bit, as I had thought that on actual hard drives (ignoring SSDs) random_page_cost is higher. I guess the number tries to reflect caching of the relevant pages in memory, and on modern hardware you have more of that? That sort of thing is one reason why all attempts so far to set random_page_cost based on physical characteristics haven't gone anywhere useful. The setting is sort of overloaded right now, it's a fuzzy mix of true random seek cost blended with some notion of cache percentage. Trying to bring some measurements to bear on it is a less effective approach than what people actually do here. Monitor the profile of query execution, change the value, see what happens. Use that as feedback for what direction to keep going; repeat until you're just spinning with no improvements. It's easy to measure the actual read times and set the value based on that instead. But that doesn't actually work out so well. There are at least three problems in that area: -Timing information is sometimes very expensive to collect. I expect at least to usefully document and quantify why that is as part of a 9.2 feature. -Basing query execution decisions on what is already in the cache leads to all sorts of nasty feedback situations where you optimize for the short term, for example using an index already in cache, while never reading in what would be a superior long-term choice because it seems too expensive. -Making a major adjustment to the query planning model like this would require a large performance regression testing framework to evaluate the results in.
We are not sure if the database used to choose differently before the move to the new hardware and the hardware is performing worse for random seeks. Or if the planner is now making different choices. I don't recommend ever deploying new hardware without first doing some low-level benchmarks to validate its performance. Once stuff goes into production, you can't do that anymore. See http://www.2ndquadrant.com/en/talks/ for my hardware benchmarking talks if you'd like some ideas on what to collect. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
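The low-level benchmarking Greg recommends is best done with dedicated tools, but the basic idea — compare sequential and random page-read times before trusting new hardware — can be sketched crudely. This is a toy illustration only: on a real system the test file must be several times larger than RAM (or opened with O_DIRECT) to defeat the OS cache, or the result is meaningless.

```python
import os
import random
import time

PAGE = 8192  # PostgreSQL's default page size

def seq_vs_random_ratio(path, n_pages=256):
    """Time n_pages sequential reads vs n_pages random reads of one file
    and return the random/sequential elapsed-time ratio."""
    last = os.path.getsize(path) // PAGE - 1
    with open(path, "rb") as f:
        start = time.perf_counter()
        for i in range(n_pages):                  # sequential pass
            f.seek(i * PAGE)
            f.read(PAGE)
        t_seq = time.perf_counter() - start
        start = time.perf_counter()
        for _ in range(n_pages):                  # random pass
            f.seek(random.randint(0, last) * PAGE)
            f.read(PAGE)
        t_rand = time.perf_counter() - start
    return t_rand / t_seq

# Build a small scratch file and compare (far too small to be meaningful;
# it only demonstrates the mechanics of the measurement).
with open("scratch.bin", "wb") as f:
    f.write(os.urandom(PAGE * 1024))
print(seq_vs_random_ratio("scratch.bin"))
```

On a cold mechanical disk the ratio is large; on a fully cached file, as here, it hovers near 1 — which is precisely the caching effect that makes random_page_cost hard to set from measurements.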
Re: [HACKERS] random_page_cost vs seq_page_cost
On 07/01/12 23:01, Peter Eisentraut wrote: On tor, 2012-01-05 at 10:04 +, Benedikt Grundmann wrote: We have recently upgraded two of our biggest postgres databases to new hardware, along with a minor version number bump (8.4.5 to 8.4.9). We are experiencing a big performance regression in some queries. In those cases the planner seems to choose a nested loop index scan instead of hashing the index once and then joining. There was a planner regression introduced in version 8.4.8, which was thought to be fixed in 8.4.9. Maybe you got caught by that. See Message-Id: 760c0206-b5f4-4dc6-9296-b7a730b7f...@silentmedia.com for some information. Check if your queries match that pattern. Good idea, but that is not it. We checked by using 8.4.5 on the new hardware (and the new database), which produced the same (bad) plans as 8.4.10 (with both the old and the new postgres config). We are again speculating that it might be this: for some of those tables we have also recently (as part of the move) clustered for the first time in ages, and it was speculated that that might have changed statistics (such as correlation) and increased the attractiveness of the index scan to the planner. Is that possible? If so, what is the best way to prove / disprove this theory? And ideally, if true, what knobs are available to tune this? Thanks, Bene
Re: [HACKERS] random_page_cost vs seq_page_cost
On tor, 2012-01-05 at 10:04 +, Benedikt Grundmann wrote: We have recently upgraded two of our biggest postgres databases to new hardware, along with a minor version number bump (8.4.5 to 8.4.9). We are experiencing a big performance regression in some queries. In those cases the planner seems to choose a nested loop index scan instead of hashing the index once and then joining. There was a planner regression introduced in version 8.4.8, which was thought to be fixed in 8.4.9. Maybe you got caught by that. See Message-Id: 760c0206-b5f4-4dc6-9296-b7a730b7f...@silentmedia.com for some information. Check if your queries match that pattern.
Re: [HACKERS] random_page_cost vs seq_page_cost
On Thu, Jan 5, 2012 at 5:04 AM, Benedikt Grundmann bgrundm...@janestreet.com wrote: We are experiencing a big performance regression in some queries. In those cases the planner seems to choose a nested loop index scan instead of hashing the index once and then joining. I think you probably need to post EXPLAIN ANALYZE output from the actual queries to get useful advice, probably to pgsql-performance, rather than here. It's hard to believe that enable_nestloop=off is doing anything other than masking whatever the real problem is, but it's hard to tell what that problem is based on the information provided. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] random_page_cost vs seq_page_cost
On 05/01/12 10:04, Benedikt Grundmann wrote: As a counter measure we are experimenting with enable_nestloop = off and random_page_cost = 20 (instead of the previous 4). For what it is worth, we had to revert the enable_nestloop = off change. It just moved the pain around by making other queries perform much worse than before.
Re: [HACKERS] random_page_cost vs seq_page_cost
On 2012-01-05 10:04, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. It'd be really nice if the DBMS measured actual experienced values.. -- Jeremy
Re: [HACKERS] random_page_cost vs seq_page_cost
On 1/5/12 3:00 PM, Jeremy Harris wrote: On 2012-01-05 10:04, Benedikt Grundmann wrote: I have a question of how to benchmark hardware to determine the appropriate ratio of seq_page_cost vs random_page_cost. It'd be really nice if the DBMS measured actual experienced values.. Certainly it would. It would also require a whole lot of instrumentation. Feel free to write some ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com