Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On 24/07/10 01:28, Robert Haas wrote: Well, if we could change the backends so that they could fully reinitialize themselves (disconnect from a database to which they are bound, etc.), I don't see why we couldn't use the Apache approach. This would offer the bonus on the side that it'd be more practical to implement database changes for a connection, akin to MySQL's USE. Inefficient, sure, but possible. I don't care about that current limitation very much. I think anyone changing databases all the time probably has the wrong design and should be using schemas. I'm sure there are times it'd be good to be able to switch databases on one connection, though.

My question with all this remains: is it worth the effort when external poolers already solve the problem? Can PostgreSQL offer tools and interfaces to permit external poolers to solve the problems they have, rather than trying to bring them in-core? For example, with auth, can the Pg server offer features to help poolers implement passthrough authentication against the real Pg server? Perhaps Pg could expose auth features over SQL, permitting appropriately privileged users to verify credentials with SQL-level calls. Poolers could pass supplied user credentials through to the real Pg server for verification. For bonus points, an SQL interface could be provided that lets the super-privileged auth managing connection be used to change the login role of another running backend/connection, so the pooler could hand out connections with different login user ids without having to maintain a pool per user id. ( That'd probably also permit implementation of a CHANGE USER command, where the client changed login roles on the fly by passing the credentials of the new role. That'd be *awesome* for application server connection pools. ) -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On 24/07/10 13:23, Greg Smith wrote: Joshua Tolley wrote: Relatively minor, but it would be convenient to avoid having to query $external_pooler to determine the client_addr of an incoming connection. You suggest this as a minor concern, but I consider it to be one of the most compelling arguments in favor of in-core pooling. A constant pain with external poolers is the need to then combine two sources of data in order to track connections fully, which is something that everyone runs into eventually and finds annoying. It's one of the few things that doesn't go away no matter how much fiddling you do with pgBouncer; it's always getting in the way a bit. And it seems to seriously bother systems administrators and developers, not just the DBAs.

Putting a pooler in core won't inherently fix this, and won't remove the need to solve it for cases where the pooler can't be on the same machine. 9.0 has application_name to let apps identify themselves. Perhaps a pooled_client_ip, to be set by a pooler rather than the app, could be added to address this problem in a way that can be used by all poolers new and existing, not just any new in-core pooling system. If a privileged set of pooler functions was considered, as per my other recent mail, the pooler could use a management connection to set the client ip before handing the connection to the client, so the client couldn't change pooled_client_ip itself by accident or through malice. But even without that, it'd be awfully handy. -- Craig Ringer
[PERFORM] Testing Sandforce SSD
Hello list, Probably like many others I've wondered why no SSD manufacturer puts a small BBU on an SSD drive. Triggered by Greg Smith's mail http://archives.postgresql.org/pgsql-performance/2010-02/msg00291.php here, and also anandtech's review at http://www.anandtech.com/show/2899/1 (see page 6 for pictures of the capacitor) I ordered a SandForce drive and this week it finally arrived. And now I have to test it, and was wondering about some things like

* How to test for power failure? I thought by running on the same machine a parallel pgbench setup on two clusters, where one runs with data and WAL on a rotating disk, the other on the SSD, both without a BBU controller. Then turn off power. Do that a few times. The problem in this scenario is that even when the SSD showed no data loss and the rotating disk did for a few times, a dozen tests without failure isn't actually proof that the drive can write its complete buffer to disk after power failure.

* How long should the power be turned off? A minute? 15 minutes?

* What filesystem to use on the SSD? To minimize writes and maximize the chance of seeing errors I'd choose ext2 here. For the sake of not comparing apples with pears I'd have to go with ext2 on the rotating data disk as well.

Do you guys have any more ideas to properly 'feel this disk at its teeth'? regards, Yeb Havinga
Re: [PERFORM] Testing Sandforce SSD
Do you guys have any more ideas to properly 'feel this disk at its teeth'? While an 'end-to-end' test using PG is fine, I think it would be easier to determine if the drive is behaving correctly by using a simple test program that emulates the storage semantics the WAL expects. Have it write a constant stream of records, fsync'ing after each write. Record the highest record number flushed so far in some place that won't be lost with the drive under test (e.g. send it over the network to another machine). Kill the power, bring the system back up again and examine what's at the tail end of that file. I think this will give you the worst case test with the easiest result discrimination. If you want to, you could add concurrent random writes to another file for extra realism. Someone here may already have a suitable test program. I know I've written several over the years in order to test I/O performance, prove the existence of kernel bugs, and so on. I doubt it matters much how long the power is turned off. A second should be plenty of time to flush pending writes if the drive is going to do so.
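The test program described above can be sketched in a few lines. This is a hypothetical illustration in Python (the record format, file path, and the report callback are all made up for the example, not a specific existing tool): the writer acknowledges a sequence number only after fsync() returns, so after a power cycle the tail of the file must contain at least the last acknowledged record.

```python
"""Sketch of a WAL-style durability test: append sequenced records,
fsync after each one, and only then report the sequence number to a
witness that survives the crash (e.g. another machine on the network)."""
import os
import struct

RECORD = struct.Struct("<Q")  # one 8-byte little-endian sequence number


def run_writer(path, count, report):
    """Append `count` records to `path`, fsync'ing after each write.
    `report(seq)` is called only after the fsync returns, so the witness
    never sees a number the drive hasn't claimed to have made durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(1, count + 1):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)   # the drive must preserve this across power loss
            report(seq)    # e.g. send over the network to another machine
    finally:
        os.close(fd)


def tail_seq(path):
    """After the crash: return the last complete record in the file."""
    size = os.path.getsize(path)
    whole = size - size % RECORD.size  # ignore a torn partial record
    with open(path, "rb") as f:
        f.seek(whole - RECORD.size)
        return RECORD.unpack(f.read(RECORD.size))[0]
```

After power-cycling the machine under test, tail_seq() on the surviving file must be greater than or equal to the last value the witness recorded; anything less means the drive acknowledged an fsync it hadn't made durable.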
Re: [PERFORM] Testing Sandforce SSD
On Sat, 24 Jul 2010, David Boreham wrote: Do you guys have any more ideas to properly 'feel this disk at its teeth'? While an 'end-to-end' test using PG is fine, I think it would be easier to determine if the drive is behaving correctly by using a simple test program that emulates the storage semantics the WAL expects. Have it write a constant stream of records, fsync'ing after each write. Record the highest record number flushed so far in some place that won't be lost with the drive under test (e.g. send it over the network to another machine). Kill the power, bring the system back up again and examine what's at the tail end of that file. I think this will give you the worst case test with the easiest result discrimination. If you want to, you could add concurrent random writes to another file for extra realism. Someone here may already have a suitable test program. I know I've written several over the years in order to test I/O performance, prove the existence of kernel bugs, and so on. I doubt it matters much how long the power is turned off. A second should be plenty of time to flush pending writes if the drive is going to do so.

Remember that SATA is designed to be hot-plugged, so you don't have to kill the entire system to kill power to the drive. This is a little more abrupt than the system losing power, but in terms of losing data this is about the worst case (while at the same time, it eliminates the possibility that the OS continues to perform writes to the drive as power dies, which is a completely different class of problems, independent of the drive type). David Lang
Re: [PERFORM] Using more tha one index per table
Craig James schrieb: The problem is that Google ranks pages based on inbound links, so older versions of Postgres *always* come up before the latest version in page ranking. Since 2009 you can deal with this by defining the canonical version. (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html) This is a really cool feature, but it's not what we need. The canonical refers to the URL, not the web page. It's only supposed to be used if you have multiple URLs that are actually the *same* page; the canonical URL tells Google to use only this URL for this page. But in our case, the Postgres manuals for each release have different URLs *and* different content, so the canonical URL isn't the right solution.

This is true, but the content is allowed to change a little. Of course, there is no fixed percentage of allowed change, but it can be quite a lot. I've used this feature for some clients who publish their content across very different websites, and it does work. Most of the content of the documentation doesn't change much between releases; in most cases the canonical will work the way I suggest. In case of big changes, even the recommendation of using a current version won't work. It's true that Google ranks pages based on inbound links, but there are more than 200 other factors which influence the rankings. Most people do not know that changing most of a site's content makes the inbound links useless for a long time. After big changes in the documentation, the current entry will be dropped for some months and the old entries will appear. But note that every single page of the documentation is ranked on its own. From my experience I would expect better results from the canonical approach than from the current-version approach. But the canonical is not the best solution in my opinion. I often edit the URLs of some documentation pages, because I need them for a specific PostgreSQL version. The documentation clearly misses a version switch. Combined with a big note that the displayed documentation is not the one for the current PostgreSQL version, this would be the best compromise in my opinion. Greetings from Germany, Torsten
Re: [PERFORM] Using more tha one index per table
On 7/24/10 5:57 AM, Torsten Zühlsdorff wrote: Craig James schrieb: The problem is that Google ranks pages based on inbound links, so older versions of Postgres *always* come up before the latest version in page ranking. Since 2009 you can deal with this by defining the canonical version. (http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html) This is a really cool feature, but it's not what we need. The canonical refers to the URL, not the web page. It's only supposed to be used if you have multiple URLs that are actually the *same* page; the canonical URL tells Google to use only this URL for this page. But in our case, the Postgres manuals for each release have different URLs *and* different content, so the canonical URL isn't the right solution. This is true, but the content is allowed to change a little. Of course, there is no fixed percentage of allowed change, but it can be quite a lot. I've used this feature for some clients who publish their content across very different websites, and it does work. Most of the content of the documentation doesn't change much between releases; in most cases the canonical will work the way I suggest. In case of big changes, even the recommendation of using a current version won't work. It's true that Google ranks pages based on inbound links, but there are more than 200 other factors which influence the rankings. Most people do not know that changing most of a site's content makes the inbound links useless for a long time. After big changes in the documentation, the current entry will be dropped for some months and the old entries will appear. But note that every single page of the documentation is ranked on its own. From my experience I would expect better results from the canonical approach than from the current-version approach. But the canonical is not the best solution in my opinion. I often edit the URLs of some documentation pages, because I need them for a specific PostgreSQL version. The documentation clearly misses a version switch. Combined with a big note that the displayed documentation is not the one for the current PostgreSQL version, this would be the best compromise in my opinion.

Here's an idea: Use a current URL, plus a JavaScript embedded in every page that compares its own URL to the current URL and, if it doesn't match, does a document.write() indicating how to find the most-current version. That would solve three problems: 1. There would be a current version that people could link to. 2. If someone found an old version, they would know it and could instantly be directed to the current version. 3. It wouldn't be any burden on the web site maintainers, because the JavaScript wouldn't have to be changed. Craig
Re: [PERFORM] Testing Sandforce SSD
On Jul 24, 2010, at 12:20 AM, Yeb Havinga wrote: The problem in this scenario is that even when the SSD showed no data loss and the rotating disk did for a few times, a dozen tests without failure isn't actually proof that the drive can write its complete buffer to disk after power failure. Yes, this is always going to be the case with testing like this - you'll never be able to prove that it will always be safe.
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote: Probably like many others I've wondered why no SSD manufacturer puts a small BBU on an SSD drive. Triggered by Greg Smith's mail http://archives.postgresql.org/pgsql-performance/2010-02/msg00291.php here, and also anandtech's review at http://www.anandtech.com/show/2899/1 (see page 6 for pictures of the capacitor) I ordered a SandForce drive and this week it finally arrived. Note that not all of the Sandforce drives include a capacitor; I hope you got one that does! I wasn't aware any of the SF drives with a capacitor on them were even shipping yet; all of the ones I'd seen were still the chipset that doesn't include one. Haven't checked in a few weeks though.

* How to test for power failure? I've had good results using one of the early programs used to investigate this class of problems: http://brad.livejournal.com/2116715.html?page=2 You really need a second witness server to do this sort of thing reliably, which that provides.

* What filesystem to use on the SSD? To minimize writes and maximize chance for seeing errors I'd choose ext2 here. I don't consider there to be any reason to deploy any part of a PostgreSQL database on ext2. The potential for downtime if the fsck doesn't happen automatically far outweighs the minimal performance advantage you'll actually see in real applications. All of the benchmarks showing large gains for ext2 over ext3 I have seen have been synthetic, not real database performance; the internal ones I've run using things like pgbench do not show a significant improvement. (Yes, I'm already working on finding time to publicly release those findings) Put it on ext3, toggle on noatime, and move on to testing. The overhead of the metadata writes is the least of the problems when doing write-heavy stuff on Linux.
-- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
On Sat, Jul 24, 2010 at 3:20 AM, Yeb Havinga yebhavi...@gmail.com wrote: Hello list, Probably like many others I've wondered why no SSD manufacturer puts a small BBU on an SSD drive. Triggered by Greg Smith's mail http://archives.postgresql.org/pgsql-performance/2010-02/msg00291.php here, and also anandtech's review at http://www.anandtech.com/show/2899/1 (see page 6 for pictures of the capacitor) I ordered a SandForce drive and this week it finally arrived. And now I have to test it and was wondering about some things like * How to test for power failure?

I test like this: write a small program that sends an endless series of inserts like this: *) on the server: create table foo (id serial); *) from the client: insert into foo default values returning id; On the client side, print the inserted value to the terminal after the query is reported as complete to the client. Run the program, wait a bit, then pull the plug on the server. The database should recover clean, and the last reported insert on the client should be there when it restarts. Try restarting immediately a few times; then, if that works, let it simmer overnight. If it makes it at least 24-48 hours that's a very promising sign. merlin
Re: [PERFORM] Testing Sandforce SSD
Greg Smith wrote: Note that not all of the Sandforce drives include a capacitor; I hope you got one that does! I wasn't aware any of the SF drives with a capacitor on them were even shipping yet, all of the ones I'd seen were the chipset that doesn't include one still. Haven't checked in a few weeks though. I think I did; it was expensive enough, though while ordering it's very easy to order the wrong one, since all names on the product category page look the same. (OCZ Vertex 2 Pro)

* How to test for power failure? I've had good results using one of the early programs used to investigate this class of problems: http://brad.livejournal.com/2116715.html?page=2 A great tool, thanks for the link!

diskchecker: running 34 sec, 4.10% coverage of 500 MB (1342 writes; 39/s)
diskchecker: running 35 sec, 4.24% coverage of 500 MB (1390 writes; 39/s)
diskchecker: running 36 sec, 4.35% coverage of 500 MB (1427 writes; 39/s)
diskchecker: running 37 sec, 4.47% coverage of 500 MB (1468 writes; 39/s)
didn't get 'ok' from server (11387 316950), msg=[] = Connection reset by peer at ./diskchecker.pl line 132.

Here's where I removed the power and left it off for about a minute. Then started again and did the verify:

y...@a:~$ ./diskchecker.pl -s client45.eemnes verify test_file
verifying: 0.00%
Total errors: 0

:-) this was on ext2

* What filesystem to use on the SSD? To minimize writes and maximize chance for seeing errors I'd choose ext2 here. I don't consider there to be any reason to deploy any part of a PostgreSQL database on ext2. The potential for downtime if the fsck doesn't happen automatically far outweighs the minimal performance advantage you'll actually see in real applications. Hmm.. wouldn't that apply to other filesystems as well? I know that JFS also won't mount if booted unclean; it somehow needs a marker from the fsck. Don't know for ext3, xfs etc.
All of the benchmarks showing large gains for ext2 over ext3 I have seen have been synthetic, not real database performance; the internal ones I've run using things like pgbench do not show a significant improvement. (Yes, I'm already working on finding time to publicly release those findings) The reason I'd choose ext2 on the SSD was mainly to decrease the number of writes, not for performance. Maybe I should ultimately do tests for both journalled and ext2 filesystems and compare the amount of data per x pgbench transactions. Put it on ext3, toggle on noatime, and move on to testing. The overhead of the metadata writes is the least of the problems when doing write-heavy stuff on Linux. Will surely do and post the results. thanks, Yeb Havinga
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote: diskchecker: running 37 sec, 4.47% coverage of 500 MB (1468 writes; 39/s) Total errors: 0 :-) OTOH, I now notice the 39 writes/s .. If that means ~ 39 tps... bummer.
Re: [PERFORM] Testing Sandforce SSD
Greg Smith wrote: Note that not all of the Sandforce drives include a capacitor; I hope you got one that does! I wasn't aware any of the SF drives with a capacitor on them were even shipping yet, all of the ones I'd seen were the chipset that doesn't include one still. Haven't checked in a few weeks though. Answering my own question here: the drive Yeb got was the brand spanking new OCZ Vertex 2 Pro, selling for $649 at Newegg for example: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with the supercapacitor listed right in the main product specifications there. This is officially the first inexpensive (relatively) SSD with a battery-backed write cache built into it. If Yeb's test results prove it works as it's supposed to under PostgreSQL, I'll be happy to finally have a moderately priced SSD I can recommend to people for database use. And I fear I'll be out of excuses to avoid buying one as a toy for my home system. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
On Sat, 2010-07-24 at 16:21 -0400, Greg Smith wrote: Greg Smith wrote: Note that not all of the Sandforce drives include a capacitor; I hope you got one that does! I wasn't aware any of the SF drives with a capacitor on them were even shipping yet, all of the ones I'd seen were the chipset that doesn't include one still. Haven't checked in a few weeks though. Answering my own question here: the drive Yeb got was the brand spanking new OCZ Vertex 2 Pro, selling for $649 at Newegg for example: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with the supercapacitor listed right in the main product specifications there. This is officially the first inexpensive (relatively) SSD with a battery-backed write cache built into it. If Yeb's test results prove it works as it's supposed to under PostgreSQL, I'll be happy to finally have a moderately priced SSD I can recommend to people for database use. And I fear I'll be out of excuses to avoid buying one as a toy for my home system.

That is quite the toy. I can get 4 SATA-II with RAID Controller, with battery backed cache, for the same price or less :P Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] Testing Sandforce SSD
Joshua D. Drake wrote: That is quite the toy. I can get 4 SATA-II with RAID Controller, with battery backed cache, for the same price or less :P True, but if you look at tests like http://www.anandtech.com/show/2899/12 it suggests there's probably at least a 6:1 performance speedup for workloads with a lot of random I/O to them. And I'm really getting sick of the power/noise/heat that the 6 drives in my home server produce. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote: Yeb Havinga wrote: diskchecker: running 37 sec, 4.47% coverage of 500 MB (1468 writes; 39/s) Total errors: 0 :-) OTOH, I now notice the 39 writes/s .. If that means ~ 39 tps... bummer. When playing with it a bit more, I couldn't get the test_file to be created in the right place on the test system. It turns out I had the diskchecker config switched, and 39 writes/s was the speed of the not-rebooted system, sorry.

I did several diskchecker.pl tests this time with the testfile on the SSD; none of the tests have returned an error :-) Writes/s start low but quickly converge to a number in the range of 1200 to 1800. The writes diskchecker does are 16kB writes. Making this 4kB writes does not increase writes/s. 32kB seems a little less, 64kB is about two thirds of the initial writes/s, and 128kB is half. So no BBU speeds here for writes, but still ~ a factor 10 improvement in iops over a rotating SATA disk. regards, Yeb Havinga

PS: hdparm showed the write cache was on. I did tests with both ext2 and xfs, where the xfs tests I did with both barrier and nobarrier.
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote: Writes/s start low but quickly converge to a number in the range of 1200 to 1800. The writes diskchecker does are 16kB writes. Making this 4kB writes does not increase writes/s. 32kB seems a little less, 64kB is about two thirds of the initial writes/s, and 128kB is half. Let's turn that into MB/s numbers:

4kB * 1200 = 4.7 MB/s
8kB * 1200 = 9.4 MB/s
16kB * 1200 = 18.75 MB/s
64kB * 1200 * 2/3 [800] = 50 MB/s
128kB * 1200 / 2 [600] = 75 MB/s

For comparison's sake, a 7200 RPM drive running PostgreSQL will do 120 commits/second without a BBWC, so at an 8K block size that's about 1 MB/s. If you put a cache in the middle, I'm used to seeing about 5000 8K commits/second, which is around 40 MB/s. So this is sitting right in the middle of those two. Sequential writes with a commit after each one like this are basically the worst case for the SSD, so if it can provide reasonable performance on that I'd be happy. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us
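Those conversions can be checked mechanically. A trivial sketch (using 1 MB = 1024 kB; the 2/3 and 1/2 factors are the observed falloff in write rate at the larger block sizes, and the table of rates is taken straight from the numbers quoted above):

```python
def throughput_mb_s(block_kb, writes_per_s):
    """Convert a block size in kB and a commit rate into MB/s."""
    return block_kb * writes_per_s / 1024

# Observed rates from the diskchecker runs: ~1200 writes/s up to 16 kB,
# about 2/3 of that at 64 kB and half of it at 128 kB.
rates = {4: 1200, 8: 1200, 16: 1200, 64: 800, 128: 600}
for kb, wps in rates.items():
    print(f"{kb:>3} kB * {wps}/s = {throughput_mb_s(kb, wps):.2f} MB/s")
```

The same function also reproduces the comparison points: 120 commits/s at 8 kB without a BBWC is just under 1 MB/s, and 5000 commits/s at 8 kB behind a cache is about 39 MB/s.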