Re: [PERFORM] further testing on IDE drives
Bruce Momjian wrote:
> Yes. If you were doing multiple WAL writes before transaction fsync, you
> would be fsyncing every write, rather than doing two writes and fsync'ing
> them both. I wonder if larger transactions would find open_sync slower?

No hard numbers, but I remember testing fsync vs open_sync some time ago on 7.3.x. open_sync was blazingly fast for pgbench, but when we switched our development database over to open_sync, things slowed to a crawl. This was some months ago, and I might be wrong, so take it with a grain of salt. It was on Red Hat 8's Linux kernel 2.4.18, I think. YMMV. Will be testing it again tonight, if possible.
Re: [PERFORM] further testing on IDE drives
On Tue, 14 Oct 2003, Tom Lane wrote:
> scott.marlowe [EMAIL PROTECTED] writes:
>> open_sync was WAY faster at this than the other two methods.
>
> Do you not have open_datasync? That's the preferred method if available.

Nope. When I try to start postgresql with it set to that, I get this error message:

FATAL: invalid value for wal_sync_method: open_datasync

This is on RedHat 9, but I have the same problem on a RH 7.2 box as well.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] further testing on IDE drives
>>>>> "BM" == Bruce Momjian [EMAIL PROTECTED] writes:

BM> COPY only does fsync on COPY completion, so I am not sure there are
BM> enough fsync's there to make a difference.

Perhaps then it is part of the indexing that takes so much time with the WAL. When I applied Marc's WAL-disabling patch, it shaved nearly 50 minutes off of a 4-hour restore. I sent Tom the logs from the restores, since he was interested in figuring out where the time was saved.
Re: [PERFORM] further testing on IDE drives
On Thu, 9 Oct 2003, Bruce Momjian wrote:
> scott.marlowe wrote:
>> I was testing to get some idea of how to speed up pgbench with IDE drives
>> and the write caching turned off in Linux (i.e. hdparm -W0 /dev/hdx). The
>> only parameter that seemed to make a noticeable difference was setting
>> wal_sync_method = open_sync. With it set to either fsync or fdatasync, the
>> speed with pgbench -c 5 -t 1000 ran from 11 to 17 tps. With open_sync it
>> jumped to the range of 45 to 52 tps. With the write cache on I was getting
>> 280 to 320 tps. So, now instead of being 20 to 30 times slower, I'm only
>> about 5 times slower, much better. Now I'm off to start a pgbench -c 10
>> -t 1 and pull the power cord and see if the data gets corrupted with
>> write caching turned on, i.e. do my hard drives have the ability to write
>> at least some of their cache during spin down.
>
> Is this a reason we should switch to open_sync as a default, if it is
> available, rather than fsync? I think we are doing a single write before
> fsync a lot more often than we are doing multiple writes before fsync.

Sounds reasonable to me. Are there many / any scenarios where a plain fsync would be faster than open_sync?
Re: [PERFORM] further testing on IDE drives
scott.marlowe wrote:
> On Thu, 9 Oct 2003, Bruce Momjian wrote:
>> Is this a reason we should switch to open_sync as a default, if it is
>> available, rather than fsync? I think we are doing a single write before
>> fsync a lot more often than we are doing multiple writes before fsync.
>
> Sounds reasonable to me. Are there many / any scenarios where a plain
> fsync would be faster than open_sync?

Yes. If you were doing multiple WAL writes before transaction fsync, you would be fsyncing every write, rather than doing two writes and fsync'ing them both. I wonder if larger transactions would find open_sync slower?

-- 
  Bruce Momjian                  |  http://candle.pha.pa.us
  [EMAIL PROTECTED]              |  (610) 359-1001
  + If your life is a hard drive,|  13 Roberts Road
  + Christ can be your backup.   |  Newtown Square, Pennsylvania 19073
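The tradeoff Bruce describes maps directly onto two syscall patterns. Below is a minimal sketch (Python for illustration; PostgreSQL itself does this in C): with open_sync every write() is individually forced to disk, while with plain fsync a batch of buffered writes is flushed once at commit. The 8 kB block size and the function names are arbitrary assumptions for the sketch.

```python
import os

BLOCK = b"x" * 8192  # arbitrary 8 kB "WAL page" for the sketch

def write_open_sync(path, nblocks):
    """O_SYNC semantics: every write() waits for the disk, one flush per write."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, BLOCK)  # each call is synchronous
    finally:
        os.close(fd)

def write_then_fsync(path, nblocks):
    """Plain fsync semantics: buffered writes, then a single flush for the batch."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for _ in range(nblocks):
            os.write(fd, BLOCK)  # queued in the OS page cache
        os.fsync(fd)             # one flush covers all nblocks writes
    finally:
        os.close(fd)
```

With nblocks == 1 (the common single-write-then-commit case) the two patterns cost about the same, one flush either way; with many writes per transaction the O_SYNC variant pays a flush per write, which is exactly the scenario where plain fsync can win.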
Re: [PERFORM] further testing on IDE drives
On Fri, 10 Oct 2003, Josh Berkus wrote:
> Bruce,
>
>> Yes. If you were doing multiple WAL writes before transaction fsync, you
>> would be fsyncing every write, rather than doing two writes and fsync'ing
>> them both. I wonder if larger transactions would find open_sync slower?
>
> Want me to test? I've got an ide-based test machine here, and the TPCC
> databases.

Just make sure the drive's write cache is disabled.
Re: [PERFORM] further testing on IDE drives
Josh Berkus wrote:
> Bruce,
>
>> Yes. If you were doing multiple WAL writes before transaction fsync, you
>> would be fsyncing every write, rather than doing two writes and fsync'ing
>> them both. I wonder if larger transactions would find open_sync slower?
>
> Want me to test? I've got an ide-based test machine here, and the TPCC
> databases.

I would be interested to see if wal_sync_method = fsync is slower than wal_sync_method = open_sync. How often are we doing more than one write before an fsync anyway?
Re: [PERFORM] further testing on IDE drives
Bruce,

> I would be interested to see if wal_sync_method = fsync is slower than
> wal_sync_method = open_sync. How often are we doing more than one write
> before an fsync anyway?

OK. I'll see if I can get to it around the other stuff I have to do this weekend.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] further testing on IDE drives
>>>>> "BM" == Bruce Momjian [EMAIL PROTECTED] writes:

> Sounds reasonable to me. Are there many / any scenarios where a plain
> fsync would be faster than open_sync?

BM> Yes. If you were doing multiple WAL writes before transaction fsync,
BM> you would be fsyncing every write, rather than doing two writes and
BM> fsync'ing them both. I wonder if larger transactions would find
BM> open_sync slower?

Consider loading a large database from a backup dump: one big transaction during the COPY. I don't know the implications it has on this scenario, though.

-- 
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: [EMAIL PROTECTED]       Rockville, MD  +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
Re: [PERFORM] further testing on IDE drives
Vivek Khera wrote:
> Consider loading a large database from a backup dump: one big transaction
> during the COPY. I don't know the implications it has on this scenario,
> though.

COPY only does fsync on COPY completion, so I am not sure there are enough fsync's there to make a difference.
Re: [PERFORM] further testing on IDE drives
On Fri, 10 Oct 2003, Josh Berkus wrote:
> Want me to test? I've got an ide-based test machine here, and the TPCC
> databases.

OK, I decided to do a quick and dirty test of something that is one big transaction, in each mode my kernel supports. I did this:

createdb dbname
time pg_dump -O -h otherserver dbname | psql dbname

Then I would drop the db, edit postgresql.conf, and restart the server. open_sync was WAY faster at this than the other two methods.

open_sync:
1st run:  real 11m27.107s  user 0m26.570s  sys 0m1.150s
2nd run:  real  6m5.712s   user 0m26.700s  sys 0m1.700s

fsync:
1st run:  real 15m8.127s   user 0m26.710s  sys 0m0.990s
2nd run:  real 15m8.396s   user 0m26.990s  sys 0m1.870s

fdatasync:
1st run:  real 15m47.878s  user 0m26.570s  sys 0m1.480s
2nd run:  real 15m9.402s   user 0m27.000s  sys 0m1.660s

I did the first runs in order, then started over, i.e. open_sync run 1, fsync run 1, fdatasync run 1, open_sync run 2, etc. The machine I was restoring to was under no other load. The machine I was reading from had little or no load, but it is a production server, so it's possible the load there had a small effect, but probably not this big of a one.

The machine this is on is set up so that the data partition is on a drive with the write cache enabled, but the pg_xlog and pg_clog directories are on a drive with the write cache disabled. Same drive models as listed in my previous test: Seagate generic 80 gig IDE drives, model ST380023A.
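As a quick sanity check on the restore numbers above, converting the second-run (warm cache) wall-clock times to seconds shows fsync and fdatasync each taking roughly 2.5x as long as open_sync:

```python
# Second-run "real" times from the restore test, converted to seconds.
def to_seconds(minutes, seconds):
    return 60 * minutes + seconds

open_sync = to_seconds(6, 5.712)     # 6m5.712s
fsync     = to_seconds(15, 8.396)    # 15m8.396s
fdatasync = to_seconds(15, 9.402)    # 15m9.402s

print(round(fsync / open_sync, 2))      # ~2.48: fsync takes ~2.5x as long
print(round(fdatasync / open_sync, 2))  # ~2.49
```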
Re: [PERFORM] further testing on IDE drives
How did this drive come configured by default? With the write cache disabled?

---------------------------------------------------------------------------

scott.marlowe wrote:
> On Thu, 2 Oct 2003, scott.marlowe wrote:
>> Now I'm off to start a pgbench -c 10 -t 1 and pull the power cord
>> and see if the data gets corrupted with write caching turned on, i.e. do
>> my hard drives have the ability to write at least some of their cache
>> during spin down.
>
> OK, back from testing.
>
> Information: Dual PIV system with a pair of 80 gig IDE drives, model
> number ST380023A (Seagate). The file system is ext3 and is on a separate
> drive from the OS. These drives DO NOT write out their cache when they
> lose power.
>
> Testing was done by issuing an 'hdparm -W0/1 /dev/hdx' command, where x
> is the real drive letter and 0 or 1 was chosen in place of 0/1. Then I'd
> issue a 'pgbench -c 50 -t 1' command, wait for a few minutes, then
> pull the power cord. I'm running a stock RH Linux 9.0 install, kernel
> 2.4.20-8smp.
>
> Three times pulling the plug with 'hdparm -W0 /dev/hdx' resulted in a
> machine that would boot up, recover with the journal, and a database
> that came up within about 30 seconds, with all the accounts still intact.
>
> Switching the caching back on with 'hdparm -W1 /dev/hdx' and doing the
> same 'pgbench -c 50 -t 1' resulted in a corrupted database each time.
>
> Also, I tried each of the following sync methods with write caching
> turned off: fsync, fdatasync, and open_sync. Each survived a power-off
> test with no corruption of the database.
>
> fsync and fdatasync result in 11 to 17 tps with 'pgbench -c 5 -t 500',
> while open_sync resulted in 45 to 55 tps, as mentioned in the previous
> post.
>
> I'd be interested in hearing from other folks which sync method works
> for them, and whether or not there are any IDE drives out there that can
> write their cache to the platters on power off when caching is enabled.
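The slowdown factors quoted in the thread can be checked directly from the reported tps ranges. A small worked example (the range arithmetic below is mine, the tps figures are from the posts):

```python
# pgbench tps ranges reported in the thread (write cache on vs. off).
cached    = (280, 320)  # hdparm -W1: fast, but corrupted on power loss
open_sync = (45, 52)    # hdparm -W0, wal_sync_method = open_sync
fsync     = (11, 17)    # hdparm -W0, wal_sync_method = fsync/fdatasync

def slowdown(fast, slow):
    """Best-case and worst-case slowdown factors between two tps ranges."""
    return (fast[0] / slow[1], fast[1] / slow[0])

print(slowdown(cached, fsync))      # roughly (16.5, 29.1): the "20 to 30 times slower"
print(slowdown(cached, open_sync))  # roughly (5.4, 7.1): "only about 5 times slower"
```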