Re: [PERFORM] x206-x225

2006-03-15 Thread Richard Huxton

Jim C. Nasby wrote:

I think you mean this...

http://www.postgresql.org/docs/8.1/static/runtime-config-wal.html

commit_delay (integer)


No, that's not what I mean at all. On a system doing a large number of
WAL-generating transactions per second, it's certainly possible for
multiple transactions to commit in the period of time it takes for the
platter to rotate back into position to allow for writing of the WAL
data. What I don't know is if those multiple transactions would actually
make it to the platter on that rotation, or if they'd serialize,
resulting in one commit per revolution. I do know that there's no
theoretical reason that they couldn't, it's just a matter of putting
enough intelligence in the drive.

Perhaps this is something that SCSI supports and (S)ATA doesn't, since
SCSI allows multiple transactions to be 'in flight' on the bus at once.


SCSI Command queueing:
http://www.storagereview.com/guide2000/ref/hdd/if/scsi/protCQR.html

SATA native command queuing:
http://www.tomshardware.com/2004/11/16/can_command_queuing_turbo_charge_sata/


But since you mention commit_delay, this does lead to an interesting
possible use: set it equal to the effective rotational period of the
drive. If you know your transaction load well enough, you could possibly
gain some benefit here. But of course a RAID controller with a BBU would
be a better bet...


I suppose as long as you always have several transactions trying to 
commit, have a separate spindle(s) for the WAL then you could improve 
throughput at the cost of the shortest transaction times. Of course, it 
might be that the increase in lock duration etc. might outweigh any 
benefits. I'd suspect the cost/gain would be highly variable with 
changes in workload, and as you say write-cache+BBU seems more sensible.



--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] x206-x225

2006-03-14 Thread Jim C. Nasby
On Fri, Mar 10, 2006 at 11:57:16PM -0800, David Lang wrote:
 On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:
 
 On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:
 Your ATA disk is lying about disk caching being turned off. Assuming
 each insert is in a separate transaction, then it's not going to do
 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational
 speed.
 Could you explain the calculation? Why should the number of transactions
 be related to the rotational speed of the disk, without saying anything
 about the number of bytes per rotation?
 
 each transaction requires a sync to the disk, a sync requires a real 
 write (which you then wait for), so you can only do one transaction per 
 rotation.

But shouldn't it be possible to batch up WAL writes and syncs? In other
words, if you have 5 transactions that all COMMIT at exactly the same
time, it should be possible to get all 5 WAL pages (I'll assume each
one generated a small enough change so as not to require multiple WAL
pages) to the drive before the platter comes around to the right
position. The drive should then be able to write all 5 at once. At
least, theoretically...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] x206-x225

2006-03-14 Thread Richard Huxton

Jim C. Nasby wrote:

On Fri, Mar 10, 2006 at 11:57:16PM -0800, David Lang wrote:

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:

Your ATA disk is lying about disk caching being turned off. Assuming
each insert is in a separate transaction, then it's not going to do
10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational
speed.

Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?
each transaction requires a sync to the disk, a sync requires a real 
write (which you then wait for), so you can only do one transaction per 
rotation.


But shouldn't it be possible to batch up WAL writes and syncs? In other
words, if you have 5 transactions that all COMMIT at exactly the same
time, it should be possible to get all 5 WAL pages (I'll assume each
one generated a small enough change so as not to require multiple WAL
pages) to the drive before the platter comes around to the right
position. The drive should then be able to write all 5 at once. At
least, theoretically...


I think you mean this...

http://www.postgresql.org/docs/8.1/static/runtime-config-wal.html

commit_delay (integer)

Time delay between writing a commit record to the WAL buffer and 
flushing the buffer out to disk, in microseconds. A nonzero delay can 
allow multiple transactions to be committed with only one fsync() system 
call, if system load is high enough that additional transactions become 
ready to commit within the given interval. But the delay is just wasted 
if no other transactions become ready to commit. Therefore, the delay is 
only performed if at least commit_siblings other transactions are active 
at the instant that a server process has written its commit record. The 
default is zero (no delay).


commit_siblings (integer)

Minimum number of concurrent open transactions to require before 
performing the commit_delay delay. A larger value makes it more probable 
that at least one other transaction will become ready to commit during 
the delay interval. The default is five.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] x206-x225

2006-03-11 Thread Joost Kraaijeveld
On Fri, 2006-03-10 at 23:57 -0800, David Lang wrote:
 On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:
 
  On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:
  Your ATA disk is lying about disk caching being turned off. Assuming
  each insert is in a separate transaction, then it's not going to do
  10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational
  speed.
  Could you explain the calculation? Why should the number of transactions
  be related to the rotational speed of the disk, without saying anything
  about the number of bytes per rotation?
 
 each transaction requires a sync to the disk, a sync requires a real 
 write (which you then wait for), so you can only do one transaction per 
 rotation.
Not according to a conversation I had with Western Digital about the
write performance of my own SATA disks. What I understand from their
explanation their disk are limited by the MB/sec and not by the number
of writes/second, e.g. I could write 50 MB/sec *in 1 bit/write* on my
disk. This would suggest that the maximum transactions of my disk
(overhead of OS and PostgreSQL ignored) would be 50MB / (transaction
size in MB) per second. Or am I missing something (what would not
surprise me, as I do not understand the perforance of my system at
all ;-))?

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] x206-x225

2006-03-11 Thread PFC



each transaction requires a sync to the disk, a sync requires a real
write (which you then wait for), so you can only do one transaction per
rotation.

Not according to a conversation I had with Western Digital about the



It depends if you consider that written to the disk means data is  
somewhere between the OS cache and the platter or data is writter on the  
platter and will survive a power loss.


Postgres wants the second option, of course.

For that, the data has to be on the disk. Thus, the disk has to seek, wait  
till the desired sector arrives in front of the head, write, and tell the  
OS it's done. Your disk just stores data in its embedded RAM buffer and  
tells the OS it's written, but if you lose power, you lose anything that's  
in the disk embedded RAM cache...


Advanced RAID cards have battery backed up RAM cache precisely for that  
purpose. Your harddisk doesn't.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] x206-x225

2006-03-11 Thread Joost Kraaijeveld
On Sat, 2006-03-11 at 12:33 +0100, PFC wrote:
  each transaction requires a sync to the disk, a sync requires a real
  write (which you then wait for), so you can only do one transaction per
  rotation.
  Not according to a conversation I had with Western Digital about the
 
 
 It depends if you consider that written to the disk means data is  
 somewhere between the OS cache and the platter or data is writter on the  
 platter and will survive a power loss.
 
 Postgres wants the second option, of course.

I assume that for PostgreSQL written to disk is after fsync returned
successfully. In practice that could very well mean that the data is
still in a cache somewhere (controller or harddisk, not in the OS
anymore, see also man page of fsync)


-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] x206-x225

2006-03-11 Thread Tom Lane
Joost Kraaijeveld [EMAIL PROTECTED] writes:
 I assume that for PostgreSQL written to disk is after fsync returned
 successfully. In practice that could very well mean that the data is
 still in a cache somewhere (controller or harddisk, not in the OS
 anymore, see also man page of fsync)

What it had better mean, if you want your database to be reliable,
is that the data is stored someplace that will survive a system crash
(power outage, kernel panic, etc).  A battery-backed RAM cache is OK,
assuming that total failure of the RAID controller is not one of the
events you consider likely enough to need protection against.

The description of your SATA drive makes it sound like the drive
does not put data on the platter before reporting write complete,
but only stores it in on-board RAM cache.  It is highly unlikely
that there is any battery backing for that cache, and therefore that
drive is not to be trusted.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] x206-x225

2006-03-11 Thread David Lang

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


Date: Sat, 11 Mar 2006 09:17:09 +0100
From: Joost Kraaijeveld [EMAIL PROTECTED]
To: David Lang [EMAIL PROTECTED]
Cc: Richard Huxton dev@archonet.com, pgsql-performance@postgresql.org
Subject: Re: [PERFORM] x206-x225

On Fri, 2006-03-10 at 23:57 -0800, David Lang wrote:

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:

Your ATA disk is lying about disk caching being turned off. Assuming
each insert is in a separate transaction, then it's not going to do
10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational
speed.

Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?


each transaction requires a sync to the disk, a sync requires a real
write (which you then wait for), so you can only do one transaction per
rotation.

Not according to a conversation I had with Western Digital about the
write performance of my own SATA disks. What I understand from their
explanation their disk are limited by the MB/sec and not by the number
of writes/second, e.g. I could write 50 MB/sec *in 1 bit/write* on my
disk. This would suggest that the maximum transactions of my disk
(overhead of OS and PostgreSQL ignored) would be 50MB / (transaction
size in MB) per second. Or am I missing something (what would not
surprise me, as I do not understand the perforance of my system at
all ;-))?


but if you do a 1 bit write, and wait for it to complete, and then do 
another 1 bit write that belongs on disk immediatly after the first one 
(and wait for it to complete) you have to wait until the disk rotates to 
the point that it can make the write before it's really safe on disk.


so you can do one transaction in less then one rotation, but if you do 50 
transactions you must wait at least 49 (and a fraction) roatations.


if the disk cache is turned on then you don't have to wait for this, but 
you also will loose the data if you loose power so it's really not safe.


David Lang

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] x206-x225

2006-03-10 Thread Richard Huxton

H.J. Sanders wrote:

X206   IBM X226
-----
processorPentium 4 3.2 
GhzXeon 3.0 Ghz   
main memory1.25 
GB4 GB   
discs  2 x SCSI RAID1  1RPM  
1 x ATA 7200 RPM


Noting that the SCSI discs are on the *slower* machine.


Time at X206Time at X226
--
insert record (1 to 1)  6 sec.41 sec.
select record (1 to 1)  4  4
delete record (1 to 1)  6 41
 
 
This is ofcourse a totally unexpected results (you should think off the 
opposite).


Your ATA disk is lying about disk caching being turned off. Assuming 
each insert is in a separate transaction, then it's not going to do 
10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational 
speed.



Funny is that the select time is the same for both machines.


Because you're limited by the speed to read from RAM.

By the way - these sort of tests are pretty much meaningless in any 
practical terms.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] x206-x225

2006-03-10 Thread Daniel Blaisdell
The primary slow down is probably between your system bus from main
memory to your disk storage. If you notice from your statistics that
the select statements are very close. This is because all the data you
need is already in system memory. The primary bottle neck is probably
disk I/O. Scsi will always be faster than ATA. Scsi devices have
dedicated hardware for getting data to and from the disc to the main
system bus without requiring a trip through the CPU. 

You may be able to speed up the ata disc by enabling DMA by using hdparm.

hdparm -d1 /dev/hda (or whatever your device is)

-Daniel
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.2 (GNU/Linux)

iD8DBQBEEYzX9SJ2nhowvKERAoiFAKCLR+7a7ReZ2mjjPjpONHLGIQD1SgCeNNON
V1kbyATIFVPWuf1W6Ji0IFg=
=5Msr
-END PGP SIGNATURE-
On 3/10/06, Richard Huxton dev@archonet.com wrote:
H.J. Sanders wrote:
X206
IBM X226
-
processorPentium
4 3.2
GhzXeon
3.0 Ghz main
memory1.25
GB4
GB
discs2
x SCSI RAID11RPM 1 x ATA 7200 RPMNoting that the SCSI discs are on the *slower* machine. Time at X206Time at X226 -- insert record (1 to 1)6 sec.41 sec.

select record (1 to
1)44
delete record (1 to
1)6
41 This is ofcourse a totally unexpected results (you should think off the opposite).Your ATA disk is lying about disk caching being turned off. Assumingeach insert is in a separate transaction, then it's not going to do
10,000 / 6 = 1667 transactions/sec - that's faster than it's rotationalspeed. Funny is that the select time is the same for both machines.Because you're limited by the speed to read from RAM.
By the way - these sort of tests are pretty much meaningless in anypractical terms.-- Richard Huxton Archonet Ltd---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] x206-x225

2006-03-10 Thread Joost Kraaijeveld
On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:
 Your ATA disk is lying about disk caching being turned off. Assuming 
 each insert is in a separate transaction, then it's not going to do 
 10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational 
 speed.
Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?

-- 
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
e-mail: [EMAIL PROTECTED]
web: www.askesis.nl


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] x206-x225

2006-03-10 Thread David Lang

On Sat, 11 Mar 2006, Joost Kraaijeveld wrote:


On Fri, 2006-03-10 at 13:40 +, Richard Huxton wrote:

Your ATA disk is lying about disk caching being turned off. Assuming
each insert is in a separate transaction, then it's not going to do
10,000 / 6 = 1667 transactions/sec - that's faster than it's rotational
speed.

Could you explain the calculation? Why should the number of transactions
be related to the rotational speed of the disk, without saying anything
about the number of bytes per rotation?


each transaction requires a sync to the disk, a sync requires a real 
write (which you then wait for), so you can only do one transaction per 
rotation.


David Lang

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq