Re: [PERFORM] x206-x225
Jim C. Nasby wrote: I think you mean this... http://www.postgresql.org/docs/8.1/static/runtime-config-wal.html commit_delay (integer) No, that's not what I mean at all. On a system doing a large number of WAL-generating transactions per second, it's certainly possible for multiple transactions to commit in the period of time it takes for the platter to rotate back into position to allow for writing of the WAL data. What I don't know is if those multiple transactions would actually make it to the platter on that rotation, or if they'd serialize, resulting in one commit per revolution. I do know that there's no theoretical reason that they couldn't, it's just a matter of putting enough intelligence in the drive. Perhaps this is something that SCSI supports and (S)ATA doesn't, since SCSI allows multiple transactions to be 'in flight' on the bus at once. SCSI Command queueing: http://www.storagereview.com/guide2000/ref/hdd/if/scsi/protCQR.html SATA native command queuing: http://www.tomshardware.com/2004/11/16/can_command_queuing_turbo_charge_sata/ But since you mention commit_delay, this does lead to an interesting possible use: set it equal to the effective rotational period of the drive. If you know your transaction load well enough, you could possibly gain some benefit here. But of course a RAID controller with a BBU would be a better bet... I suppose as long as you always have several transactions trying to commit, have a separate spindle(s) for the WAL then you could improve throughput at the cost of the shortest transaction times. Of course, it might be that the increase in lock duration etc. might outweigh any benefits. I'd suspect the cost/gain would be highly variable with changes in workload, and as you say write-cache+BBU seems more sensible. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] x206-x225
On Fri, Mar 10, 2006 at 11:57:16PM -0800, David Lang wrote: On Sat, 11 Mar 2006, Joost Kraaijeveld wrote: On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote: Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Could you explain the calculation? Why should the number of transactions be related to the rotational speed of the disk, without saying anything about the number of bytes per rotation? each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. But shouldn't it be possible to batch up WAL writes and syncs? In other words, if you have 5 transactions that all COMMIT at exactly the same time, it should be possible to get all 5 WAL pages (I'll assume each one generated a small enough change so as not to require multiple WAL pages) to the drive before the platter comes around to the right position. The drive should then be able to write all 5 at once. At least, theoretically... -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] x206-x225
Jim C. Nasby wrote: On Fri, Mar 10, 2006 at 11:57:16PM -0800, David Lang wrote: On Sat, 11 Mar 2006, Joost Kraaijeveld wrote: On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote: Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Could you explain the calculation? Why should the number of transactions be related to the rotational speed of the disk, without saying anything about the number of bytes per rotation? each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. But shouldn't it be possible to batch up WAL writes and syncs? In other words, if you have 5 transactions that all COMMIT at exactly the same time, it should be possible to get all 5 WAL pages (I'll assume each one generated a small enough change so as not to require multiple WAL pages) to the drive before the platter comes around to the right position. The drive should then be able to write all 5 at once. At least, theoretically... I think you mean this... http://www.postgresql.org/docs/8.1/static/runtime-config-wal.html commit_delay (integer) Time delay between writing a commit record to the WAL buffer and flushing the buffer out to disk, in microseconds. A nonzero delay can allow multiple transactions to be committed with only one fsync() system call, if system load is high enough that additional transactions become ready to commit within the given interval. But the delay is just wasted if no other transactions become ready to commit. Therefore, the delay is only performed if at least commit_siblings other transactions are active at the instant that a server process has written its commit record. The default is zero (no delay). commit_siblings (integer) Minimum number of concurrent open transactions to require before performing the commit_delay delay. A larger value makes it more probable that at least one other transaction will become ready to commit during the delay interval. The default is five. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] x206-x225
On Fri, 2006-03-10 at 23:57 -0800, David Lang wrote: On Sat, 11 Mar 2006, Joost Kraaijeveld wrote: On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote: Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Could you explain the calculation? Why should the number of transactions be related to the rotational speed of the disk, without saying anything about the number of bytes per rotation? each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. Not according to a conversation I had with Western Digital about the write performance of my own SATA disks. What I understand from their explanation their disk are limited by the MB/sec and not by the number of writes/second, e.g. I could write 50 MB/sec *in 1 bit/write* on my disk. This would suggest that the maximum transactions of my disk (overhead of OS and PostgreSQL ignored) would be 50MB / (transaction size in MB) per second. Or am I missing something (what would not surprise me, as I do not understand the perforance of my system at all ;-))? -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] x206-x225
each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. Not according to a conversation I had with Western Digital about the It depends if you consider that written to the disk means data is somewhere between the OS cache and the platter or data is writter on the platter and will survive a power loss. Postgres wants the second option, of course. For that, the data has to be on the disk. Thus, the disk has to seek, wait till the desired sector arrives in front of the head, write, and tell the OS it's done. Your disk just stores data in its embedded RAM buffer and tells the OS it's written, but if you lose power, you lose anything that's in the disk embedded RAM cache... Advanced RAID cards have battery backed up RAM cache precisely for that purpose. Your harddisk doesn't. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] x206-x225
On Sat, 2006-03-11 at 12:33 +0100, PFC wrote: each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. Not according to a conversation I had with Western Digital about the It depends if you consider that written to the disk means data is somewhere between the OS cache and the platter or data is writter on the platter and will survive a power loss. Postgres wants the second option, of course. I assume that for PostgreSQL written to disk is after fsync returned successfully. In practice that could very well mean that the data is still in a cache somewhere (controller or harddisk, not in the OS anymore, see also man page of fsync) -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] x206-x225
Joost Kraaijeveld [EMAIL PROTECTED] writes: I assume that for PostgreSQL written to disk is after fsync returned successfully. In practice that could very well mean that the data is still in a cache somewhere (controller or harddisk, not in the OS anymore, see also man page of fsync) What it had better mean, if you want your database to be reliable, is that the data is stored someplace that will survive a system crash (power outage, kernel panic, etc). A battery-backed RAM cache is OK, assuming that total failure of the RAID controller is not one of the events you consider likely enough to need protection against. The description of your SATA drive makes it sound like the drive does not put data on the platter before reporting write complete, but only stores it in on-board RAM cache. It is highly unlikely that there is any battery backing for that cache, and therefore that drive is not to be trusted. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] x206-x225
On Sat, 11 Mar 2006, Joost Kraaijeveld wrote: Date: Sat, 11 Mar 2006 09:17:09 +0100 From: Joost Kraaijeveld [EMAIL PROTECTED] To: David Lang [EMAIL PROTECTED] Cc: Richard Huxton dev@archonet.com, pgsql-performance@postgresql.org Subject: Re: [PERFORM] x206-x225 On Fri, 2006-03-10 at 23:57 -0800, David Lang wrote: On Sat, 11 Mar 2006, Joost Kraaijeveld wrote: On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote: Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Could you explain the calculation? Why should the number of transactions be related to the rotational speed of the disk, without saying anything about the number of bytes per rotation? each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. Not according to a conversation I had with Western Digital about the write performance of my own SATA disks. What I understand from their explanation their disk are limited by the MB/sec and not by the number of writes/second, e.g. I could write 50 MB/sec *in 1 bit/write* on my disk. This would suggest that the maximum transactions of my disk (overhead of OS and PostgreSQL ignored) would be 50MB / (transaction size in MB) per second. Or am I missing something (what would not surprise me, as I do not understand the perforance of my system at all ;-))? but if you do a 1 bit write, and wait for it to complete, and then do another 1 bit write that belongs on disk immediatly after the first one (and wait for it to complete) you have to wait until the disk rotates to the point that it can make the write before it's really safe on disk. so you can do one transaction in less then one rotation, but if you do 50 transactions you must wait at least 49 (and a fraction) roatations. if the disk cache is turned on then you don't have to wait for this, but you also will loose the data if you loose power so it's really not safe. David Lang ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] x206-x225
H.J. Sanders wrote: X206 IBM X226 ----- processorPentium 4 3.2 GhzXeon 3.0 Ghz main memory1.25 GB4 GB discs 2 x SCSI RAID1 1RPM 1 x ATA 7200 RPM Noting that the SCSI discs are on the *slower* machine. Time at X206Time at X226 -- insert record (1 to 1) 6 sec.41 sec. select record (1 to 1) 4 4 delete record (1 to 1) 6 41 This is ofcourse a totally unexpected results (you should think off the opposite). Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Funny is that the select time is the same for both machines. Because you're limited by the speed to read from RAM. By the way - these sort of tests are pretty much meaningless in any practical terms. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] x206-x225
The primary slow down is probably between your system bus from main memory to your disk storage. If you notice from your statistics that the select statements are very close. This is because all the data you need is already in system memory. The primary bottle neck is probably disk I/O. Scsi will always be faster than ATA. Scsi devices have dedicated hardware for getting data to and from the disc to the main system bus without requiring a trip through the CPU. You may be able to speed up the ata disc by enabling DMA by using hdparm. hdparm -d1 /dev/hda (or whatever your device is) -Daniel -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.2 (GNU/Linux) iD8DBQBEEYzX9SJ2nhowvKERAoiFAKCLR+7a7ReZ2mjjPjpONHLGIQD1SgCeNNON V1kbyATIFVPWuf1W6Ji0IFg= =5Msr -END PGP SIGNATURE- On 3/10/06, Richard Huxton dev@archonet.com wrote: H.J. Sanders wrote: X206 IBM X226 - processorPentium 4 3.2 GhzXeon 3.0 Ghz main memory1.25 GB4 GB discs2 x SCSI RAID11RPM 1 x ATA 7200 RPMNoting that the SCSI discs are on the *slower* machine. Time at X206Time at X226 -- insert record (1 to 1)6 sec.41 sec. select record (1 to 1)44 delete record (1 to 1)6 41 This is ofcourse a totally unexpected results (you should think off the opposite).Your ATA disk is lying about disk caching being turned off. Assumingeach insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotationalspeed. Funny is that the select time is the same for both machines.Because you're limited by the speed to read from RAM. By the way - these sort of tests are pretty much meaningless in anypractical terms.-- Richard Huxton Archonet Ltd---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] x206-x225
On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote: Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Could you explain the calculation? Why should the number of transactions be related to the rotational speed of the disk, without saying anything about the number of bytes per rotation? -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 e-mail: [EMAIL PROTECTED] web: www.askesis.nl ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] x206-x225
On Sat, 11 Mar 2006, Joost Kraaijeveld wrote: On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote: Your ATA disk is lying about disk caching being turned off. Assuming each insert is in a separate transaction, then it's not going to do 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational speed. Could you explain the calculation? Why should the number of transactions be related to the rotational speed of the disk, without saying anything about the number of bytes per rotation? each transaction requires a sync to the disk, a sync requires a real write (which you then wait for), so you can only do one transaction per rotation. David Lang ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq