Robert Haas <robertmh...@gmail.com> writes:
> On Wed, Dec 21, 2011 at 11:48 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> I'm inclined to think that that specific arrangement wouldn't be good.
>> The normal access pattern for CLOG is, I believe, an exponentially
>> decaying probability-of-access for each page as you go further back from
>> current. ... for instance the next-to-latest
>> page could end up getting removed while say the third-latest page is
>> still there because it's in a different associative bucket that's under
>> less pressure.

> Well, sure.  But who is to say that's bad?  I think you can find a way
> to throw stones at any given algorithm we might choose to implement.

The point I'm trying to make is that buffer management schemes like
that one are built on the assumption that the probability of access is
roughly uniform for all pages.  We know (or at least have strong reason
to presume) that CLOG pages have very non-uniform probability of access.
The straight LRU scheme is good because it deals well with non-uniform
access patterns.  Dividing the buffers into independent buckets in a way
that doesn't account for the expected access probabilities is going to
degrade things.  (The approach Simon suggests nearby seems isomorphic to
yours and so suffers from this same objection, btw.)
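
To make the hazard concrete, here is a toy C sketch (invented data
structures, not the real slru.c code) that runs one hand-picked trace
through a plain global LRU and through a two-bucket, two-way
associative LRU.  Page 10 is the latest page, 9 the next-to-latest,
8 the third-latest:

    /*
     * Toy comparison of global LRU vs. bucketed ("set-associative") LRU
     * for a hand-picked CLOG-like access trace.  Page numbers, bucket
     * count, and associativity are illustrative only.
     */
    #include <stdio.h>

    #define NSLOTS   4          /* total buffers in both schemes */
    #define NBUCKETS 2          /* buckets in the associative scheme */
    #define ASSOC    2          /* slots per bucket */

    /* One slot: which page it holds, and when it was last touched. */
    typedef struct { int page; int last_used; } Slot;

    /* Touch `page` in an LRU array of n slots; report any eviction. */
    static void
    access_lru(Slot *slots, int n, int page, int tick, const char *tag)
    {
        int     i, victim = 0;

        for (i = 0; i < n; i++)
            if (slots[i].page == page)
            {
                slots[i].last_used = tick;
                return;                     /* hit */
            }
        for (i = 1; i < n; i++)             /* miss: find LRU victim */
            if (slots[i].last_used < slots[victim].last_used)
                victim = i;
        if (slots[victim].page >= 0)
            printf("%s: evict page %d to load page %d\n",
                   tag, slots[victim].page, page);
        slots[victim].page = page;
        slots[victim].last_used = tick;
    }

    int
    main(void)
    {
        /* Trace: 10 is latest, 9 next-to-latest, 8 third-latest. */
        int     trace[] = {8, 10, 9, 5, 7};
        int     ntrace = sizeof(trace) / sizeof(trace[0]);
        Slot    global[NSLOTS];
        Slot    buckets[NBUCKETS][ASSOC];
        int     i, b;

        for (i = 0; i < NSLOTS; i++)
            global[i] = (Slot) {-1, -1};
        for (b = 0; b < NBUCKETS; b++)
            for (i = 0; i < ASSOC; i++)
                buckets[b][i] = (Slot) {-1, -1};

        for (i = 0; i < ntrace; i++)
        {
            int     page = trace[i];

            access_lru(global, NSLOTS, page, i, "global LRU");
            access_lru(buckets[page % NBUCKETS], ASSOC, page, i,
                       "bucketed  ");
        }
        return 0;
    }

On the final access, global LRU evicts page 8, the page touched least
recently overall; the bucketed scheme instead evicts page 9, because
the odd-numbered bucket happens to be under more pressure than the
even-numbered one.  That is exactly the next-to-latest-versus-
third-latest inversion described above.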

> For example, if you contrive things so that you repeatedly access the
> same old CLOG pages cyclically: 1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,...

Sure, and the reason that that's contrived is that it flies in the face
of reasonable assumptions about CLOG access probabilities.  Any scheme
will lose some of the time, but you don't want to pick a scheme that is
more likely to lose for more probable access patterns.
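
(To put numbers on that worst case: with 7 LRU buffers, that 8-page
cycle misses on every single reference, since the page about to be
needed is always the one just evicted, whereas statically pinning any
7 of the 8 pages would hit 7 times out of 8.)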

It strikes me that one simple thing we could do is extend the current
heuristic that says "pin the latest page".  That is, pin the last K
pages into SLRU, and apply LRU or some other method across the rest.
If K is large enough, that should get us down to where the differential
in access probability among the older pages is small enough to neglect,
and then we could apply associative bucketing or other methods to the
rest without fear of getting burnt by the common usage pattern.  I don't
know what K would need to be, though.  Maybe it's worth instrumenting
a benchmark run or two so we can get some facts rather than guesses
about the access frequencies?
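
For illustration, the victim-selection half of that heuristic might
look something like the sketch below (hypothetical names and layout,
not the actual SlruSelectLRUPage code): the K newest pages are simply
never candidates, and any policy at all can be applied to the rest.

    /*
     * Hypothetical sketch of "pin the last K pages".  Names are
     * invented, and the page-number comparison ignores wraparound,
     * which real code would have to handle.
     */
    #define CLOG_PINNED_TAIL 2      /* "K": newest pages kept resident */

    typedef struct { int page; int last_used; } Slot;

    static int
    select_victim_slot(Slot *slots, int nslots, int latest_page)
    {
        int     i, victim = -1;

        for (i = 0; i < nslots; i++)
        {
            /* Skip the K newest pages: assumed hot, always resident. */
            if (slots[i].page > latest_page - CLOG_PINNED_TAIL)
                continue;
            if (victim < 0 || slots[i].last_used < slots[victim].last_used)
                victim = i;
        }
        return victim;      /* -1 if every slot holds a pinned page */
    }

Whatever replaces the plain LRU over the non-pinned slots (associative
buckets included) would then only ever see pages whose access
probabilities are comparatively flat, which is the regime those
schemes assume.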

                        regards, tom lane
