On Jun 4, 2007, at 1:56 PM, Markus Schiltknecht wrote:
> Simplistic throughput testing with dd:
>
> dd of=test if=/dev/zero bs=10K count=800000
> 800000+0 records in
> 800000+0 records out
> 8192000000 bytes (8.2 GB) copied, 37.3552 seconds, 219 MB/s
> pamonth:/opt/dbt2/bb# dd if=test of=/dev/zero bs=10K count=800000
> 800000+0 records in
> 800000+0 records out
> 8192000000 bytes (8.2 GB) copied, 27.6856 seconds, 296 MB/s

I don't think that kind of testing is useful for good RAID controllers on RAID 5/6, because the controller will just be streaming the data out: it can compute the parity blocks on the fly and stream full stripes to the drives as fast as possible, without ever reading anything back.
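To make that concrete: for a full-stripe write, parity is just the XOR of the data chunks the controller already has in cache, so no reads are needed. A toy illustration in Python (the chunk size and the 4+1 layout are made up):

import os
from functools import reduce

CHUNK = 64 * 1024                  # hypothetical per-disk chunk size
DATA_DISKS = 4                     # hypothetical 4+1 RAID 5 set

def xor(a, b):
    # byte-wise XOR of two equal-length blocks, i.e. what the
    # controller's parity engine does in hardware
    return bytes(x ^ y for x, y in zip(a, b))

# a full stripe arrives from the host as one sequential burst...
stripe = [os.urandom(CHUNK) for _ in range(DATA_DISKS)]

# ...so parity comes purely from data already in cache, and the
# data chunks + parity chunk can be streamed out with zero reads
parity = reduce(xor, stripe)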

But that's not how writes in the database work (except for WAL); you're writing blocks all over the place, none of which is streamed. So in the best case (the entire stripe being updated is in the controller's cache), at a minimum it's going to have to write data + parity (two parity blocks for RAID 6, IIRC) for every write. But any real-sized database is going to be far larger than your RAID cache, which means there's a good chance a block being written will no longer have its stripe in cache. In that case, the controller is going to have to read the old data and old parity back off the drives before it can write anything, which is going to clobber performance.
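The arithmetic behind that read-back is the standard read-modify-write shortcut: new parity = old parity XOR old data XOR new data, which means two reads plus two writes for every small write on RAID 5 (plus an extra read and write for the second parity on RAID 6). A quick Python sanity check of the identity (block contents are random filler):

import os
from functools import reduce

xor = lambda a, b: bytes(x ^ y for x, y in zip(a, b))

CHUNK = 8 * 1024
old = [os.urandom(CHUNK) for _ in range(4)]   # old stripe contents
old_parity = reduce(xor, old)

new0 = os.urandom(CHUNK)                      # one block is overwritten

# the shortcut: read old data + old parity,
# then write new data + new parity
new_parity = xor(xor(old_parity, old[0]), new0)

# matches a full recompute over the updated stripe
assert new_parity == reduce(xor, [new0] + old[1:])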

Now, add that performance bottleneck on top of your WAL writes and you're in real trouble.

BTW, I was thinking in terms of stripe size when I wrote this, but I don't know whether good controllers actually have to operate on whole stripes, or whether they can update smaller chunks of a stripe. In either case, the issue is still the number of extra reads going on.
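If you want to see the effect on your own hardware, time small writes at random offsets instead of a streaming run. A rough Python sketch (block size and count are arbitrary; it reuses the 8.2 GB 'test' file from the dd run above, and the file needs to be much bigger than the controller cache):

import os, random, time

BLOCK = 8 * 1024            # Postgres-sized blocks
FILE_SIZE = 8 * 10**9       # stay within the 8.2 GB 'test' file
COUNT = 10000

fd = os.open("test", os.O_WRONLY)
buf = os.urandom(BLOCK)

start = time.time()
for _ in range(COUNT):
    os.lseek(fd, random.randrange(FILE_SIZE // BLOCK) * BLOCK, os.SEEK_SET)
    os.write(fd, buf)
os.fsync(fd)                # don't let the OS cache hide the cost
elapsed = time.time() - start
print("%.2f MB/s" % (COUNT * BLOCK / elapsed / 1e6))

On a cache-bound RAID 5/6 set I'd expect that number to land far below the 219 MB/s streaming figure above.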
--
Jim Nasby                                            [EMAIL PROTECTED]
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


