Re: [HACKERS] Seq scans roadmap

CK Tan Thu, 10 May 2007 11:39:25 -0700

Sorry, 16x8K page ring is too small indeed. The reason we selected 16 is because greenplum db runs on 32K page size, so we are indeed reading 128K at a time. The #pages in the ring should be made relative to the page size, so you achieve 128K per read.
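
As a rough illustration of sizing the ring by page size, the arithmetic could be sketched as below. The 128K target comes from the paragraph above; the 4:1 ring-to-read ratio is an assumption taken from the quoted patch description (4-page reads, 16-page ring), and the function and constant names are hypothetical, not from the patch.

```python
# Sketch only: names are hypothetical, not taken from the patch.
TARGET_READ_BYTES = 128 * 1024   # 128K per read, as stated above
READS_PER_RING = 4               # assumption: the ring holds 4 read batches


def ring_geometry(page_size):
    """Return (pages_per_read, ring_pages) for a page size given in bytes."""
    if page_size <= 0 or TARGET_READ_BYTES % page_size != 0:
        raise ValueError("page size must evenly divide the target read size")
    pages_per_read = TARGET_READ_BYTES // page_size
    return pages_per_read, pages_per_read * READS_PER_RING


for size in (8 * 1024, 32 * 1024):
    per_read, ring = ring_geometry(size)
    print(f"{size // 1024}K pages: {per_read} pages per read, {ring}-page ring")
```

With 32K pages this gives 4-page reads in a 16-page ring, matching the numbers above; with 8K pages the same target needs 16-page reads and, under the assumed ratio, a 64-page ring.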

Also agree that KillAndReadBuffer could be split into a KillPinDontRead() and a ReadThesePinnedPages() function. However, we are thinking of AIO and would rather see a ReadNPagesAsync() function.


-cktan
Greenplum, Inc.

On May 10, 2007, at 3:14 AM, Zeugswetter Andreas ADI SD wrote:

In reference to the seq scans roadmap, I have just submitted
a patch that addresses some of the concerns.

The patch does this:

1. for small relation (smaller than 60% of bufferpool), use
the current logic
2. for big relation:
        - use a ring buffer in heap scan
        - pin first 12 pages when scan starts
        - on consumption of every 4 pages, read and pin the next 4 pages
        - invalidate used pages in the scan so they do not
force out other useful pages
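
The steps listed above can be pictured with a toy model like the following. This is a sketch, not PostgreSQL code: the function names and the event tuples are hypothetical, and the order of invalidating before pinning the next batch is an assumption, since the list does not specify it.

```python
# Toy model of the scan pattern listed above; not PostgreSQL code.
INITIAL_PIN = 12     # "pin first 12 pages when scan starts"
BATCH = 4            # "read and pin the next 4 pages"
BIG_FRACTION = 0.6   # relations smaller than 60% of the buffer pool use the current logic


def uses_ring(rel_pages, nbuffers):
    """Big relations (at least 60% of the buffer pool) take the ring path."""
    return rel_pages >= BIG_FRACTION * nbuffers


def scan_events(rel_pages):
    """Return ordered (action, first_page, last_page) events for a ring scan.

    Assumes rel_pages >= 1.
    """
    pinned_upto = min(INITIAL_PIN, rel_pages)   # pages [0, pinned_upto) are pinned
    events = [("pin", 0, pinned_upto - 1)]
    for consumed in range(BATCH, rel_pages + 1, BATCH):
        # A 4-page group has just been consumed: invalidate it so it
        # does not force other useful pages out of the buffer pool.
        events.append(("invalidate", consumed - BATCH, consumed - 1))
        if pinned_upto < rel_pages:
            nxt = min(pinned_upto + BATCH, rel_pages)
            events.append(("pin", pinned_upto, nxt - 1))
            pinned_upto = nxt
    tail = rel_pages % BATCH
    if tail:
        # Invalidate the final, partial group of pages.
        events.append(("invalidate", rel_pages - tail, rel_pages - 1))
    return events


if __name__ == "__main__":
    for event in scan_events(20):
        print(event)
```

In this model at most 12 pages are pinned at any moment, and each consumed group is released before the next batch is pinned.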


A few comments regarding the effects:

I do not see how this speedup could be caused by readahead, so what are
the effects ?
(It should make no difference to do the CPU work for count(*) in between
reading each block when the pages are not dirtied)
Is the improvement solely reduced CPU because no search for a free
buffer is needed and/or L2 cache locality ?

What effect does the advance pinning have, avoid vacuum ?

A 16 x 8k page ring is too small to allow the needed IO blocksize of
256k.
The readahead is done 4 x one page at a time (=32k).
What is the reasoning behind 1/4 ring for readahead (why not 1/2), is
3/4 the trail for followers and bgwriter ?

I think in anticipation of doing a single IO call for more than one
page, the KillAndReadBuffer function should be split into two parts. One
that does the killing for n pages, and one that does the reading for
n pages.
Killing n before reading n would also have the positive effect of
grouping perhaps needed writes (not interleaving them with the reads).
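
To make the ordering argument concrete, here is a toy comparison of the two schemes. It is only a sketch: the function names, the slot dictionaries, and the "read_range" operation are hypothetical and do not correspond to the patch's KillAndReadBuffer; the incoming pages are assumed to be contiguous and non-empty.

```python
# Toy model: hypothetical names, not the patch's KillAndReadBuffer.
def interleaved(ring_slots, next_pages):
    """Kill and read one slot at a time: dirty-page writes alternate with reads."""
    ops = []
    for slot, page in zip(ring_slots, next_pages):
        if slot["dirty"]:
            ops.append(("write", slot["page"]))   # flush the old page first
        ops.append(("read", page))
    return ops


def kill_then_read(ring_slots, next_pages):
    """Kill n slots first, then read n pages: writes are grouped before the read."""
    ops = [("write", s["page"]) for s in ring_slots if s["dirty"]]   # kill phase
    ops.append(("read_range", next_pages[0], next_pages[-1]))        # one IO call for n pages
    return ops


if __name__ == "__main__":
    slots = [
        {"page": 0, "dirty": True},
        {"page": 1, "dirty": False},
        {"page": 2, "dirty": True},
        {"page": 3, "dirty": False},
    ]
    incoming = [16, 17, 18, 19]
    print(interleaved(slots, incoming))
    print(kill_then_read(slots, incoming))
```

In the split version the two writes are issued back to back and the four pages arrive through a single read operation, which is the grouping effect described above.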

I think the 60% Nbuffers is a very good starting point. I would only
introduce a GUC when we see evidence that it is needed (I agree with
Simon's partitioning comments, but I'd still wait and see).

Andreas



