Tom Lane wrote:
> ...
> Curt Sampson <[EMAIL PROTECTED]> writes:
> > 3. Proof by testing. I wrote a little ruby program to seek to a
> > random point in the first 2 GB of my raw disk partition and read
> > 1-8 8K blocks of data. (This was done as one I/O request.) (Using
> > the raw disk partition I avoid any filesystem buffering.)
>
> And also ensure that you aren't testing the point at issue.
> The point at issue is that *in the presence of kernel read-ahead*
> it's quite unclear that there's any benefit to a larger request size.
> Ideally the kernel will have the next block ready for you when you
> ask, no matter what the request is.
> ...
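(For reference, a rough sh-and-dd equivalent of the random-seek test Curt describes might look like the sketch below. Everything in it -- the scratch-file target, the 100-iteration count, pulling random offsets out of od -- is my assumption, not his actual ruby code. Point TARGET at a raw partition to bypass filesystem buffering, as he did.)

```shell
#!/bin/sh
# Sketch of a random-seek read test: seek to a random 8K-aligned
# offset and read 1-8 8K blocks as a single dd request.
# TARGET defaults to a small stand-in file so the script runs as-is.
TARGET=${TARGET:-scratch_file}
max_blocks=256                      # 8K blocks in the target region
[ -e "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs=8k count=$max_blocks 2>/dev/null
i=0
while [ "$i" -lt 100 ]; do
    # pick a random 8K-aligned block offset and a request of 1-8 blocks
    off=$(( $(od -An -N2 -tu2 /dev/urandom) % max_blocks ))
    n=$(( off % 8 + 1 ))
    dd if="$TARGET" bs=8k skip="$off" count="$n" of=/dev/null 2>/dev/null
    i=$(( i + 1 ))
done
echo "issued $i random reads"
```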
I have to agree with Tom. I think the numbers below show that with
kernel read-ahead, block size isn't an issue.

The big_file1 file used below is 2.0 GB of random data, and the machine
has 512 MB of main memory, so we're not just reading cached data.

    foreach i (4k 8k 16k 32k 64k 128k)
        echo $i
        time dd bs=$i if=big_file1 of=/dev/null
    end

and the results:

    bs       user    kernel   elapsed
    4k:      0.260    7.740   1:27.25
    8k:      0.210    8.060   1:30.48
    16k:     0.090    7.790   1:30.88
    32k:     0.060    8.090   1:32.75
    64k:     0.030    8.190   1:29.11
    128k:    0.070    9.830   1:28.74

So with kernel read-ahead, elapsed (wall-clock) time is essentially the
same regardless of block size. User time does bottom out at the 64k
block size, but kernel time rises to offset it.

You could argue that this is a contrived example, since no other I/O is
being done. So I created a second 2.0 GB file (big_file2) and ran two
simultaneous reads against the same disk. Performance went to hell, but
it shows that block size is still irrelevant in a multi-I/O environment
with sequential read-ahead.

    foreach i (4k 8k 16k 32k 64k 128k)
        echo $i
        time dd bs=$i if=big_file1 of=/dev/null &
        time dd bs=$i if=big_file2 of=/dev/null &
        wait
    end

    bs       user    kernel   elapsed
    4k:      0.480    8.290   6:34.13   big_file1
             0.320    8.730   6:34.33   big_file2
    8k:      0.250    7.580   6:31.75
             0.180    8.450   6:31.88
    16k:     0.150    8.390   6:32.47
             0.100    7.900   6:32.55
    32k:     0.190    8.460   6:24.72
             0.060    8.410   6:24.73
    64k:     0.060    9.350   6:25.05
             0.150    9.240   6:25.13
    128k:    0.090   10.610   6:33.14
             0.110   11.320   6:33.31

The differences in read times are basically in the mud. Block size just
doesn't matter much with the kernel doing read-ahead.

-Kyle
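P.S. For anyone without csh who wants to rerun this: the two-stream test
redone as plain POSIX sh. The original used 2.0 GB files of random data;
the small stand-in files here are placeholders so the script runs as-is,
and they should be replaced with files well over main-memory size for a
real measurement.

```shell
#!/bin/sh
# POSIX sh version of the concurrent two-stream read test.
# The ( time ... ) subshells keep each stream's timing output
# attached to its own background job.
dd if=/dev/zero of=big_file1 bs=8k count=1024 2>/dev/null   # stand-in data
dd if=/dev/zero of=big_file2 bs=8k count=1024 2>/dev/null
for bs in 4k 8k 16k 32k 64k 128k; do
    echo "$bs"
    ( time dd bs="$bs" if=big_file1 of=/dev/null ) 2>&1 &
    ( time dd bs="$bs" if=big_file2 of=/dev/null ) 2>&1 &
    wait
done
rm -f big_file1 big_file2
```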