Sorry, a 16 x 8K page ring is indeed too small. We selected 16 because Greenplum DB runs with a 32K page size, so we are in fact reading 128K at a time (4 pages x 32K). The number of pages in the ring should be made relative to the page size, so that each read still covers 128K.
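As a rough illustration of sizing the ring relative to the page size, here is a minimal sketch. The names TARGET_READ_BYTES, READAHEAD_FRACTION, readahead_pages() and ring_pages() are hypothetical and not part of the patch; the sketch only assumes that the read-ahead step stays at 1/4 of the ring and that each step should cover 128K.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical target: bytes fetched per read-ahead step (128K, as discussed). */
#define TARGET_READ_BYTES ((size_t) 128 * 1024)

/* Assumption: read-ahead covers 1/4 of the ring (4 of 16 pages in the patch). */
#define READAHEAD_FRACTION 4

/* Pages per read-ahead step so that one step covers TARGET_READ_BYTES. */
static size_t
readahead_pages(size_t page_size)
{
    assert(page_size > 0);
    size_t n = TARGET_READ_BYTES / page_size;
    return n > 0 ? n : 1;       /* always read at least one page */
}

/* Total ring size in pages, derived from the page size. */
static size_t
ring_pages(size_t page_size)
{
    return readahead_pages(page_size) * READAHEAD_FRACTION;
}
```

With a 32K page this gives a 4-page step and a 16-page ring; with an 8K page it gives a 16-page step and a 64-page ring.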

I also agree that KillAndReadBuffer could be split into KillPinDontRead() and ReadThesePinnedPages() functions. However, we are thinking of AIO and would rather see a ReadNPagesAsync() function.

Greenplum, Inc.

On May 10, 2007, at 3:14 AM, Zeugswetter Andreas ADI SD wrote:

In reference to the seq scans roadmap, I have just submitted
a patch that addresses some of the concerns.

The patch does this:

1. for small relation (smaller than 60% of bufferpool), use
   the current logic
2. for big relation:
        - use a ring buffer in heap scan
        - pin first 12 pages when scan starts
        - on consumption of every 4 pages, read and pin the next 4 pages
        - invalidate used pages in the scan so they do not
          force out other useful pages
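
The steps listed above can be sketched as a small simulation. This is not the patch itself: RingScan, use_ring(), read_and_pin() and ring_scan() are hypothetical names, the buffer manager is replaced by an array of block numbers, and unpinning and invalidating a consumed page are collapsed into a single step for brevity.

```c
#include <assert.h>

#define RING_SIZE   16   /* slots in the ring */
#define INITIAL_PIN 12   /* pages pinned when the scan starts */
#define BATCH       4    /* pages read per read-ahead step */

typedef struct RingScan {
    int slots[RING_SIZE]; /* block held in each slot, -1 if invalidated */
    int next_read;        /* next block to read and pin */
    int next_consume;     /* next block the scan will process */
    int nblocks;          /* relation size in blocks */
    int max_pinned;       /* high-water mark of simultaneously pinned pages */
} RingScan;

/* Big relations (at least 60% of the buffer pool) take the ring path. */
static int
use_ring(long rel_blocks, long nbuffers)
{
    return rel_blocks * 10 >= nbuffers * 6;
}

/* Read and pin up to `count` further blocks into their ring slots. */
static void
read_and_pin(RingScan *s, int count)
{
    for (int i = 0; i < count && s->next_read < s->nblocks; i++) {
        s->slots[s->next_read % RING_SIZE] = s->next_read;
        s->next_read++;
    }
    int pinned = s->next_read - s->next_consume;
    if (pinned > s->max_pinned)
        s->max_pinned = pinned;
}

/* Scan the whole relation through the ring; returns pages consumed. */
static int
ring_scan(RingScan *s, int nblocks)
{
    int consumed = 0;

    for (int i = 0; i < RING_SIZE; i++)
        s->slots[i] = -1;
    s->next_read = 0;
    s->next_consume = 0;
    s->nblocks = nblocks;
    s->max_pinned = 0;

    read_and_pin(s, INITIAL_PIN);
    while (s->next_consume < s->nblocks) {
        int slot = s->next_consume % RING_SIZE;

        assert(s->slots[slot] == s->next_consume); /* page must be resident */
        s->slots[slot] = -1;                       /* invalidate the used page */
        s->next_consume++;
        consumed++;
        if (s->next_consume % BATCH == 0)          /* every 4 pages consumed */
            read_and_pin(s, BATCH);
    }
    return consumed;
}
```

In this sketch at most 12 pages are pinned at once, so the remaining 4 slots of the 16-slot ring always hold pages the scan has already passed.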

A few comments regarding the effects:

I do not see how this speedup could be caused by readahead, so what are
the effects?
(It should make no difference to do the CPU work for count(*) in between
reading each block when the pages are not dirtied.)
Is the improvement solely reduced CPU because no search for a free
buffer is needed, and/or L2 cache locality?

What effect does the advance pinning have? Does it avoid vacuum?

A 16 x 8k page ring is too small to allow the needed IO blocksize of
The readahead is done 4 x one page at a time (=32k).
What is the reasoning behind 1/4 ring for readahead (why not 1/2), is
3/4 the trail for followers and bgwriter ?

I think in anticipation of doing a single IO call for more than one
page, the KillAndReadBuffer function should be split into two parts: one
that does the killing for n pages, and one that does the reading for n pages.
Killing n before reading n would also have the positive effect of
grouping perhaps needed writes (not interleaving them with the reads).
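
A minimal sketch of that split, under stated assumptions: Page, IoLog, kill_n_pages() and read_n_pages() are hypothetical names, not PostgreSQL functions, and the actual I/O is replaced by a log of 'W' (write) and 'R' (read) operations so the ordering is visible.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Page {
    int block;   /* block number held, -1 if the buffer is free */
    int dirty;   /* nonzero if the page must be written before reuse */
} Page;

typedef struct IoLog {
    char   ops[64];  /* sequence of 'W' and 'R' operations issued */
    size_t len;
} IoLog;

static void
log_op(IoLog *log, char op)
{
    if (log->len < sizeof(log->ops) - 1) {
        log->ops[log->len++] = op;
        log->ops[log->len] = '\0';
    }
}

/* Part 1: kill n pages, writing out any dirty ones first. */
static void
kill_n_pages(Page *pages, size_t n, IoLog *log)
{
    for (size_t i = 0; i < n; i++) {
        if (pages[i].dirty) {
            log_op(log, 'W');
            pages[i].dirty = 0;
        }
        pages[i].block = -1;
    }
}

/* Part 2: fill the n freed pages with one read covering consecutive blocks. */
static void
read_n_pages(Page *pages, size_t n, int first_block, IoLog *log)
{
    log_op(log, 'R');   /* a single IO call for the whole run */
    for (size_t i = 0; i < n; i++)
        pages[i].block = first_block + (int) i;
}
```

Calling kill_n_pages() and then read_n_pages() on four pages, two of them dirty, logs "WWR": the writes are grouped ahead of the one read instead of being interleaved with per-page reads.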

I think the 60% Nbuffers is a very good starting point. I would only
introduce a GUC when we see evidence that it is needed (I agree with
Simon's partitioning comments, but I'd still wait and see).

