Re: [HACKERS] our buffer replacement strategy is kind of lame

Jim Nasby Mon, 15 Aug 2011 16:27:12 -0700

On Aug 13, 2011, at 3:40 PM, Greg Stark wrote:
> It does kind of seem like your numbers indicate we're missing part of
> the picture though. The idea with the clock sweep algorithm is that
> you keep approximately 1/nth of the buffers with each of the n values.
> If we're allowing nearly all the buffers to reach a reference count of
> 5 then you're right that we've lost any information about which
> buffers have been referenced most recently.


One possible missing piece here is that OS clock-sweeps depend on the clock 
hand to both increment and decrement the usage count. The hardware sets a bit 
any time a page is accessed; as the clock sweeps, it increases the usage count if 
the bit is set and decreases it if it's clear. I believe someone else in the 
thread suggested this, and I definitely think it's worth an experiment. 
Presumably this would also ease some lock contention issues.
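
To make the idea concrete, here is a minimal sketch in Python (chosen only for 
brevity; it is not PostgreSQL code). The names Page, touch, sweep and MAX_USAGE 
are made up for illustration, and the "referenced" flag stands in for the 
hardware-set bit. The point is that the access path only sets a flag, and all 
changes to the usage count happen in the sweep:

```python
from dataclasses import dataclass

MAX_USAGE = 5  # assumed cap, matching the usage count of 5 discussed above


@dataclass
class Page:
    referenced: bool = False  # stands in for the hardware-set reference bit
    usage: int = 0


def touch(page):
    """Access path: only sets the bit, never modifies the usage count."""
    page.referenced = True


def sweep(pages, hand):
    """Advance the clock hand until a victim is found.

    Returns (victim_index, new_hand_position).
    """
    n = len(pages)
    while True:
        idx = hand
        page = pages[idx]
        hand = (hand + 1) % n
        if page.referenced:
            # Accessed since the last pass: clear the bit and bump the count.
            page.referenced = False
            page.usage = min(page.usage + 1, MAX_USAGE)
        elif page.usage > 0:
            # Not accessed since the last pass: decay the count.
            page.usage -= 1
        else:
            # Not accessed and already at the minimum: evict this one.
            return idx, hand
```

Note that touching a page many times between two passes of the hand raises 
its count by at most one, so a burst of accesses cannot push every buffer to 
the maximum on its own. Whether this would really reduce lock contention in 
our case is something the experiment would have to show.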

There is another piece that might be relevant... many (most?) OSes keep 
multiple lists of pages. FreeBSD for example contains these page lists 
(http://www.freebsd.org/doc/en/articles/vm-design/article.html). Full 
description follows, but I think the biggest take-away is that there is a 
difference in how pages are handled once they are no longer active based on 
whether the page is dirty or not.

Active: These pages are actively in use and are not currently under 
consideration for eviction. This is roughly equivalent to all of our buffers 
with a usage count of 5.

When an active page's usage count drops to its minimum value, it will get 
unmapped from process space and moved to one of two queues:

Inactive: DIRTY pages that are eligible for eviction once they've been written 
out.

Cache: CLEAN pages that may be immediately reclaimed.

Free: A small set of pages that are basically the tail of the Cache list. The 
OS *must* maintain some pages on this list to support memory needed during 
interrupt handling. The size of this list is typically kept very small, and I'm 
not sure if non-interrupt processing will pull from this list.

It's important to note that the OS can pull a page back out of the Inactive and 
Cache lists back into Active very cheaply.
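
A rough sketch of how pages move between these lists, again in Python and 
again purely illustrative: the class and method names are hypothetical, the 
Free list is left out, and moving a page from Inactive to Cache once it has 
been written out is my assumption about what happens after the flush.

```python
from collections import OrderedDict


class PageQueues:
    def __init__(self):
        self.active = set()
        self.inactive = OrderedDict()  # dirty pages, oldest first
        self.cache = OrderedDict()     # clean pages, oldest first

    def deactivate(self, page_id, dirty):
        """Usage count hit its minimum: pick a queue based on dirtiness."""
        self.active.discard(page_id)
        target = self.inactive if dirty else self.cache
        target[page_id] = True

    def write_out(self, page_id):
        """Assumed step: once flushed, a dirty page counts as clean."""
        del self.inactive[page_id]
        self.cache[page_id] = True

    def reactivate(self, page_id):
        """Page fault: pull the page back into Active without any I/O."""
        found = self.inactive.pop(page_id, None)
        if found is None:
            found = self.cache.pop(page_id, None)
        if found is None:
            raise KeyError(page_id)
        self.active.add(page_id)

    def reclaim(self):
        """Take the oldest clean page, or None if no clean page is queued."""
        if not self.cache:
            return None
        page_id, _ = self.cache.popitem(last=False)
        return page_id
```

In this sketch reclaim() never has to wait on a write, because dirty pages sit 
in their own queue until something has flushed them.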

I think there are two interesting points here. First: after a page has been 
determined to no longer be in active use it goes into inactive or cache based 
on whether it's dirty. ISTM that allows for much better scheduling of the 
flushing of dirty pages. That said, I'm not sure how much that would help us 
due to checkpoint requirements.

Second: AFAIK only the Active list has a clock sweep. I believe the others are 
LRU (the mentioned URL refers to them as queues). I believe this works well 
because if a page faults it just needs to be removed from whichever queue it is 
in, added to the Active queue, and mapped back into process space.
--
Jim C. Nasby, Database Architect                   j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
