Re: [PERFORM] a heavy duty operation on an "unused" table kills my server
On Fri, 15 Jan 2010, Greg Smith wrote:
>> It seems to me that CFQ is simply bandwidth limited by the extra
>> processing it has to perform.
> I'm curious what you are doing when you see this.

A 16-disc 15kRPM RAID 0, when using fadvise with more than 100 simultaneous 8kB random requests. I sent an email to the mailing list on 29 Jan 2008, but it got thrown away by the mailing list spam filter because it had an image in it (the graph showing the interesting information). Gregory Stark replied to it in http://archives.postgresql.org/pgsql-performance/2008-01/msg00285.php and I was using his synthetic test case program.

> My theory has been that the "extra processing it has to perform" you
> describe just doesn't matter in the context of a fast system where
> physical I/O is always the bottleneck.

Basically, to an extent, that's right. However, once you get 16 drives or more into a system, it starts being an issue.

Matthew

--
For every complex problem, there is a solution that is simple, neat, and wrong. -- H. L. Mencken

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Change query join order
Thank you. I changed random_page_cost to 2, and the query plan has changed and the query speeds up. I will check the other queries, but I think I will leave it at this value. Thank you again.

Kaloyan Iliev

Robert Haas wrote:
> On Fri, Jan 8, 2010 at 2:23 PM, Tom Lane wrote:
>>> If the other plan does turn out to be faster (and I agree with Tom
>>> that there is no guarantee of that), then one thing to check is
>>> whether seq_page_cost and random_page_cost are set too high. If the
>>> data is all cached, the default values of 4 and 1 are three orders
>>> of magnitude too large, and they should also be set to equal rather
>>> than unequal values.
>> Tweaking the cost parameters to suit your local situation is the
>> recommended cure for planner misjudgments; but I'd recommend against
>> changing them on the basis of only one example. You could easily find
>> yourself making other cases worse. Get a collection of common queries
>> for your app and look at the overall effects.
> No argument, and well said -- just trying to point out that the
> default values really are FAR too high for people with databases that
> fit in OS cache.
>
> ...Robert
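(Editorial aside, not from the thread: the per-session experiment described above can be sketched as below. Table and query names are hypothetical; SET affects only the current session, so nothing is changed globally until you edit postgresql.conf.)

```sql
-- Try the cheaper random-I/O assumption in one session first.
SET random_page_cost = 2;

-- Re-check the plan for a representative query (hypothetical example):
EXPLAIN ANALYZE
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'BG';

-- If a broad sample of the app's queries improves, persist the value by
-- setting "random_page_cost = 2" in postgresql.conf and reloading.
```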
Re: [PERFORM] New server to improve performance on our large and busy DB - advice?
>> yeah, the values are at the end.
> Sounds like your vacuum settings are too non-aggressive. Generally
> this is the vacuum cost delay being too high.

Of course, I have to ask: what's the down side?

> Yes! You can run vacuum verbose against the regular old postgres
> database (or just create one for testing with nothing in it) and
> you'll still get the fsm usage numbers from that! So, no need to run
> it against the big db. However, if regular vacuum verbose couldn't
> finish in a week, then you've likely got vacuum and autovacuum set to
> be too timid in their operation, and may be getting pretty bloated as
> we speak. Once the fsm gets too blown out of the water, it's quicker
> to dump and reload the whole DB than to try and fix it.

My client reports this is what they actually do on a monthly basis.

And the numbers are in:

NOTICE: number of page slots needed (4090224) exceeds max_fsm_pages (204800)
HINT: Consider increasing the configuration parameter "max_fsm_pages" to a value over 4090224.

Gee, only off by a factor of 20. What happens if I go for this number (once again, what's the down side)?

Carlo
Re: [PERFORM] a heavy duty operation on an "unused" table kills my server
Matthew Wakeling wrote:
> On Fri, 15 Jan 2010, Greg Smith wrote:
>> My theory has been that the "extra processing it has to perform" you
>> describe just doesn't matter in the context of a fast system where
>> physical I/O is always the bottleneck.
> Basically, to an extent, that's right. However, when you get 16 drives
> or more into a system, then it starts being an issue.

I guess if I test a system with *only* 16 drives in it one day, maybe I'll find out. Seriously though, there is some difference between a completely synthetic test like the one you noted issues with here and anything you can see when running the database. I was commenting more on the state of things from the perspective of a database app, where I just haven't seen any of the CFQ issues I hear reports of in other contexts. I'm sure there are plenty of low-level tests where the differences between the schedulers are completely obvious and CFQ doesn't look as good anymore, and I'll take a look at whether I can replicate the test case you saw a specific concern with here.

--
Greg Smith, 2ndQuadrant
Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com
Re: [PERFORM] New server to improve performance on our large and busy DB - advice?
"Carlo Stonebanks" wrote:
>> yeah, the values are at the end. Sounds like your vacuum settings
>> are too non-aggressive. Generally this is the vacuum cost delay
>> being too high.
>
> Of course, I have to ask: what's the down side?

If you make it too aggressive, it could impact throughput or response time. Odds are that the bloat from having it not aggressive enough is currently having a worse impact.

>> Once the fsm gets too blown out of the water, it's quicker to dump
>> and reload the whole DB than to try and fix it.
>
> My client reports this is what they actually do on a monthly basis.

They probably won't need to do that with proper configuration and vacuum policies.

>>> NOTICE: number of page slots needed (4090224) exceeds
>>> max_fsm_pages (204800)
>>> HINT: Consider increasing the configuration parameter
>>> "max_fsm_pages" to a value over 4090224.
>
> Gee, only off by a factor of 20. What happens if I go for this
> number (once again, what's the down side)?

It costs six bytes of shared memory per entry.

http://www.postgresql.org/docs/8.3/interactive/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-FSM

-Kevin
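(Editorial aside, not from the thread: to put "six bytes per entry" in perspective, the back-of-the-envelope cost of following the HINT is small. This is an 8.3-era sketch; from 8.4 onward the free space map moved out of shared memory and max_fsm_pages no longer exists.)

```
# postgresql.conf sketch -- round the reported need up for some headroom
max_fsm_pages = 4500000    # 4,500,000 slots * 6 bytes ~= 26 MB shared memory
```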
Re: [PERFORM] ext4 finally doing the right thing
Jeff Davis wrote:
>> On one side, we might finally be able to use regular drives with
>> their caches turned on safely, taking advantage of the cache for
>> other writes while doing the right thing with the database writes.
> That could be good news. What's your opinion on the practical
> performance impact? If it doesn't need to be fsync'd, the kernel
> probably shouldn't have written it to the disk yet anyway, right (I'm
> assuming here that the OS buffer cache is much larger than the disk
> write cache)?

I know they just tweaked this area recently so this may be a bit out of date, but kernels starting with 2.6.22 allow you to get up to 10% of memory dirty before getting really aggressive about writing things out, with writes starting to go heavily at 5%. So even with a 1GB server, you could easily find 100MB of data sitting in the kernel buffer cache ahead of a database write that needs to hit disc. Once you start considering the case with modern hardware, where even my desktop has 8GB of RAM and most serious servers I see have 32GB, you can easily have gigabytes of such data queued in front of the write that now needs to hit the platter. The dream is that a proper barrier implementation will then shuffle your important write to the front of that queue, without waiting for everything else to clear first.

The exact performance impact depends on how many non-database writes happen. But even on a dedicated database disk, it should still help, because there are plenty of non-sync'd writes coming out of the background writer via its routine work and the checkpoint writes. And the ability to fully utilize the write cache on the individual drives, on commodity hardware, without risking database corruption would make life a lot easier.

--
Greg Smith, 2ndQuadrant
Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com
Re: [PERFORM] Re: New server to improve performance on our large and busy DB - advice? (v2)
> * A database that is of small to medium size (5 - 10 GB)?
> * Around 10 clients that perform constant write operations to the database (UPDATE/INSERT)
> * Around 10 clients that occasionally read from the database
> * Around 6000 tables in your database
> * A problem with tuning it all
> * Migration to new hardware and/or OS
>
> Is this all correct?

Actually, the tablespace is very large, over 500GB. However, the actual production DB is 200GB.

> First thing that is noticeable is that you seem to have way too few
> drives in the server - not because of disk space required but because
> of speed. You didn't say what type of drives you have and you didn't
> say what you would consider desirable performance levels, but off hand
> (because of the "10 clients perform constant writes" part) you will
> probably want at least 2x-4x more drives. With only 4 drives, RAID 10
> is the only thing usable here.

What would be the optimum RAID level and number of disks?

>> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
> Would not recommend Windows OS.

We may be stuck, as my client is only considering Red Hat Linux (still waiting to find out which version). If it turns out that this doesn't give better than a marginal improvement, then there is no incentive to create more complications in what is basically a Windows shop (although the project manager is a Linux advocate).

> Most importantly, you didn't say what you would consider desirable
> performance. The hardware and the setup you described will work, but
> not necessarily fast enough.

Once again, it seems as though we are down to the number of drives...

> Have you tried decreasing random_page_cost in postgresql.conf? Or
> setting (as a last resort) enable_seqscan = off?

In critical code sections we do - we have stored procedures and code segments which save the current enable_seqscan value, set it to off (local to the transaction), then restore it after the code has run. Our current "planner cost" values are all default.
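(Editorial aside, not from the thread: the save/set/restore dance described above can be avoided. A minimal sketch, assuming the critical section already runs inside its own transaction:)

```sql
BEGIN;
-- SET LOCAL scopes the change to this transaction only; it reverts
-- automatically at COMMIT or ROLLBACK, so no manual save/restore is needed.
SET LOCAL enable_seqscan = off;

-- ... run the critical queries here ...

COMMIT;
```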
Is this what you would choose for an Intel Core 2 Quad with 48 GB RAM?

# - Planner Cost Constants -
#seq_page_cost = 1.0            # measured on an arbitrary scale
#random_page_cost = 4.0         # same scale as above
#cpu_tuple_cost = 0.01          # same scale as above
#cpu_index_tuple_cost = 0.005   # same scale as above
#cpu_operator_cost = 0.0025    # same scale as above
#effective_cache_size = 128MB

Thanks for the help,

Carlo
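(Editorial aside, not from the thread: common rule-of-thumb starting points for a machine of that size, assuming it is dedicated to PostgreSQL. These are sketches to be validated against real workloads, not measured recommendations.)

```
# sketch only: rules of thumb for a dedicated 48 GB server
effective_cache_size = 32GB    # ~2/3 of RAM; a planner hint, allocates nothing
random_page_cost = 2.0         # lower than the default 4.0 when the working
                               # set is mostly cached
```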
Re: [PERFORM] New server to improve performance on our large and busy DB - advice?
On Wed, Jan 20, 2010 at 3:03 PM, Carlo Stonebanks wrote:
>> Yes! You can run vacuum verbose against the regular old postgres
>> database (or just create one for testing with nothing in it) and
>> you'll still get the fsm usage numbers from that! So, no need to run
>> it against the big db. However, if regular vacuum verbose couldn't
>> finish in a week, then you've likely got vacuum and autovacuum set to
>> be too timid in their operation, and may be getting pretty bloated as
>> we speak. Once the fsm gets too blown out of the water, it's quicker
>> to dump and reload the whole DB than to try and fix it.
>
> My client reports this is what they actually do on a monthly basis.

Something is deeply wrong with your client's vacuuming policies.

...Robert
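(Editorial aside, not from the thread: a hedged sketch of what "less timid" vacuum settings might look like in the 8.3 era. These are starting points only; the right values depend on how much I/O headroom the system has.)

```
# postgresql.conf sketch -- make cost-based vacuum delay less of a brake
vacuum_cost_delay = 0                  # manual VACUUM runs at full speed
autovacuum_vacuum_cost_delay = 10ms    # down from the 8.3 default of 20ms
autovacuum_max_workers = 3             # 8.3 default; raise only if many large
                                       # tables need vacuuming concurrently
```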
Re: [PERFORM] ext4 finally doing the right thing
That doesn't sound right. The kernel having 10% of memory dirty doesn't mean there's a queue you have to jump at all. You don't get into any queue until the kernel initiates write-out, which will be based on the usage counters -- basically an LRU. fsync and cousins like sync_file_range and posix_fadvise(DONT_NEED) initiate write-out right away.

How many pending write-out requests for how much data the kernel should keep active is another question, but I imagine it has more to do with the storage hardware than with how much memory your system has. And for most hardware it's probably on the order of megabytes or less.

greg

On 20 Jan 2010 21:19, "Greg Smith" wrote:
> Jeff Davis wrote:
>>> On one side, we might finally be able to use regular drives with their ...
> [rest of quoted message trimmed]
Re: [PERFORM] ext4 finally doing the right thing
Greg Stark wrote:
> That doesn't sound right. The kernel having 10% of memory dirty
> doesn't mean there's a queue you have to jump at all. You don't get
> into any queue until the kernel initiates write-out, which will be
> based on the usage counters -- basically an LRU. fsync and cousins
> like sync_file_range and posix_fadvise(DONT_NEED) initiate write-out
> right away.

Most safe ways ext3 knows to initiate a write-out on something that must go (because it's gotten an fsync on data there) require flushing every outstanding write to that filesystem along with it. So as soon as a single WAL write shows up, bam! The whole cache is emptied (or at least everything associated with that filesystem), and the caller who asked for that little write is stuck waiting for everything to clear before their fsync returns success.

This particular issue absolutely killed Firefox when they switched to using SQLite not too long ago; there is a high-level discussion at http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/ and confirmation/discussion of the issue on lkml at https://kerneltrap.org/mailarchive/linux-fsdevel/2008/5/26/1941354 . Note the comment from the first article saying "those delays can be 30 seconds or more". On multiple occasions, I've measured systems with dozens of disks in a high-performance RAID 1+0 with a battery-backed controller that could grind to a halt for 10, 20, or more seconds in this situation, when running pgbench on a big database. As was the case on the latest one I saw: if you've got 32GB of RAM and have let 3.2GB of random I/O from background writer/checkpoint writes back up because Linux has been lazy about getting to them, that takes a while to clear no matter how good the underlying hardware is.

Write barriers were supposed to improve all this when added to ext3, but they just never seemed to work right for many people. After reading that lkml thread, among others, I was left not trusting anything beyond the simplest path through this area of the filesystem. Slow is better than corrupted. So the good news I was relaying is that this finally appears to work on ext4, giving it the behavior you described and expected -- but that behavior hasn't actually been there until now.

I was hoping someone with more free time than me might be interested to go investigate further if I pointed the advance out. I'm stuck with too many production systems to play with new kernels at the moment, but am quite curious.

--
Greg Smith, 2ndQuadrant
Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com