> Sorry, 16x8K page ring is too small indeed. The reason we
> selected 16 is because greenplum db runs on 32K page size, so
> we are indeed reading 128K at a time. The #pages in the ring
> should be made relative to the page size, so you achieve 128K
> per read.
Ah, OK. New disks here also peak at 128k with no other concurrent IO.
Writes benefit from larger blocksizes though, 512k and more.
Reads with other concurrent IO might also benefit from larger blocksizes.
Comment to all: to test optimal blocksizes make sure you have other
concurrent IO on the disk.
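To make the ring sizing concrete: a minimal sketch of deriving the number of
ring pages from the page size, so that one pass over the ring always covers
128K. The names RING_TARGET_BYTES and ring_pages_for() are made up for
illustration and are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical target: bytes covered by one pass over the ring (128K). */
#define RING_TARGET_BYTES ((size_t) 128 * 1024)

/*
 * Number of pages in the ring for a given page size.
 * Falls back to a single page when the page size is zero or
 * larger than the target.
 */
static size_t
ring_pages_for(size_t page_size)
{
    size_t npages;

    if (page_size == 0)
        return 1;

    npages = RING_TARGET_BYTES / page_size;
    return npages > 0 ? npages : 1;
}
```

With 8K pages this gives 16 ring slots, with 32K pages it gives 4, so the
read size stays at 128K in both builds.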
> Also agree that KillAndReadBuffer could be split into a
> KillPinDontRead(), and ReadThesePinnedPages() functions.
> However, we are thinking of AIO and would rather see a
> ReadNPagesAsync() function.
Yes, you could start the AIO and return an already-read buffer to allow
concurrent CPU work.
However, you would still want to issue blocked aio_readv calls to make sure
the physical read uses the large blocksize.
So I'd say AIO would benefit from the same split.
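A rough sketch of what the "read these pinned pages" half could look like,
using a plain synchronous preadv() as a stand-in for the async call (I am
not aware of a standard aio_readv, so this only illustrates the
one-large-physical-read idea). read_n_pages() and MAX_RING_PAGES are
hypothetical names, and the buffers are assumed to be already pinned and
to map to consecutive pages of the file.

```c
#define _DEFAULT_SOURCE         /* exposes preadv() on glibc */
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_RING_PAGES 64       /* hypothetical upper bound on ring size */

/*
 * Read npages consecutive pages, starting at page number first_page,
 * into npages separate buffers with a single vectored read.
 * Returns the number of bytes read, or -1 on error or bad arguments.
 */
static ssize_t
read_n_pages(int fd, void *const bufs[], int npages,
             size_t page_size, off_t first_page)
{
    struct iovec iov[MAX_RING_PAGES];
    int          i;

    if (npages <= 0 || npages > MAX_RING_PAGES)
        return -1;

    /* One iovec entry per ring slot; the slots need not be contiguous. */
    for (i = 0; i < npages; i++)
    {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = page_size;
    }

    return preadv(fd, iov, npages, first_page * (off_t) page_size);
}
```

The point of the split is that the kernel sees one request of
npages * page_size bytes, whether it is issued synchronously as here or
handed to an AIO interface.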
In another posting you wrote:
> The patch has no effect on scans that do updates.
> The KillAndReadBuffer routine does not force out a buffer if
> the dirty bit is set. So updated pages revert to the current
> performance characteristics.
Yes, I see: the ring slot is replaced by a standard ReadBuffer in that
case. Looks good.
I still think it would be better to write out the buffers and keep them
in the ring when possible, but that seems to need locks and some sort of
synchronization with the new walwriter, so looks like a nice project for