On Sun, Apr 29, 2018 at 2:39 AM, Stephen Frost <sfr...@snowman.net> wrote:
> * Peter Geoghegan (p...@bowt.ie) wrote:
>> On Thu, Apr 26, 2018 at 10:39 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> > I'm personally not very excited about making rules like "index pages
>> > are more valuable than heap pages".  Such rules will in many cases be
>> > true, but it's easy to come up with cases where they don't hold: for
>> > example, we might run pgbench for a while and then stop running
>> > pgbench and start running big sequential scans for reporting purposes.
>>
>> I am not in favor of this general approach. Actually, I'm willing to
>> come out against it strongly right now: let's not teach the buffer
>> manager to distinguish blocks by which AM they belong to.
>
> Perhaps I'm misunderstanding the case here, but with the ring buffer we
> use for sequential access, aren't we already discouraging keeping heap
> pages around, particularly when they're coming from a sequential scan?
>
> I'm not suggesting that's a bad thing either; in fact it seems like a
> generally good thing, and I don't think we get many complaints about
> it.  I'm just trying to point out that we already distinguish between
> heap pages from seq scans and index pages (and heap pages reached
> through those indexes), so I'm not convinced by an argument that we
> should never treat heap pages differently from index pages.

I think the idea is that CAR might be even better than our hard-coded
BAS_BULKREAD, BAS_BULKWRITE and BAS_VACUUM rings because it has a kind
of dynamic, self-tuning scan resistance.  It works by having one clock
T1 (= a circular list, not just a rotating hand in an array like our
current scheme) that holds buffers that were touched once or twice, to
absorb scans.  It'll reclaim T1 buffers that were only accessed once
(but keep the descriptor around for longer -- see below).  It moves
things to T2, a separate clock for valuable pages, once they've been
accessed twice in recent memory.  It dynamically adjusts the target
sizes of T1 and T2 based on hits in its history lists (see below),
which is how it's supposed to adapt to the workload.
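
Roughly, and purely for illustration (this isn't from any patch, and
all of the names below are invented), the two clocks and the
replacement rule might look something like this; the B1/B2 bookkeeping
is left out until further down:

#include <stdbool.h>
#include <stddef.h>

/* One entry per page the algorithm knows about (illustrative only). */
typedef struct CarEntry
{
    struct CarEntry *prev;      /* circular doubly linked list links */
    struct CarEntry *next;
    bool        referenced;     /* reference bit, set on every access */
    /* ... block tag, buffer number, etc. would go here ... */
} CarEntry;

typedef struct CarList
{
    CarEntry   *hand;           /* clock hand into the circular list */
    int         size;
} CarList;

static CarList T1;              /* seen once recently: absorbs scans */
static CarList T2;              /* seen at least twice recently */
static int     target_T1;       /* "p": adaptive target size of T1 */

/* Remove an entry, moving the hand past it if necessary. */
static void
car_unlink(CarList *list, CarEntry *e)
{
    if (list->hand == e)
        list->hand = (list->size > 1) ? e->next : NULL;
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list->size--;
}

/* Insert an entry just behind the hand (the clock's "tail"). */
static void
car_insert(CarList *list, CarEntry *e)
{
    if (list->hand == NULL)
        list->hand = e->prev = e->next = e;
    else
    {
        e->next = list->hand;
        e->prev = list->hand->prev;
        e->prev->next = e;
        e->next->prev = e;
    }
    list->size++;
}

/*
 * Choose a victim buffer (assumes the cache is full).  Unreferenced T1
 * entries are reclaimed right away, though their descriptors live on
 * in B1 (below); referenced ones are promoted to T2 instead of being
 * evicted, which is where the scan resistance comes from.
 */
static CarEntry *
car_replace(void)
{
    for (;;)
    {
        if (T1.size >= (target_T1 > 1 ? target_T1 : 1))
        {
            CarEntry   *e = T1.hand;

            if (!e->referenced)
            {
                car_unlink(&T1, e);
                return e;       /* evicted; descriptor goes to B1 */
            }
            /* Accessed again while in T1: promote to T2. */
            e->referenced = false;
            car_unlink(&T1, e);
            car_insert(&T2, e);
        }
        else
        {
            CarEntry   *e = T2.hand;

            if (!e->referenced)
            {
                car_unlink(&T2, e);
                return e;       /* evicted; descriptor goes to B2 */
            }
            e->referenced = false;  /* give it another lap of T2 */
            T2.hand = e->next;
        }
    }
}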

[...]

>> > I believe the root of the problem here is that the usage count we have
>> > today does a very poor job distinguishing what's hot from what's not.
>>
>> I think that the root of the problem is that we don't remember
>> anything about evicted buffers. The same block can bounce in and out
>> of shared_buffers repeatedly, and we don't recognize that. There are
>> probably second-order effects from sizing shared_buffers. We
>> needlessly tie the number of buffers to our capacity to remember
>> things about block popularity. That would be bad if we had no FS
>> cache, but with the FS cache it seems awful.
>
> This is definitely an interesting and valuable point.  Having pages
> bouncing in and out of shared buffers means that we aren't properly
> tracking their value.  Just brainstorming, but I wonder if there's a way
> we could track their value even when we've evicted them, without slowing
> down the entire system.  I don't have any great ideas off-hand for
> that, but some kind of "we last accessed this block an hour ago"
> versus "we last accessed this block half a millisecond ago" signal
> might be interesting for trying to avoid such ping-ponging.  I wonder
> if there are known strategies for
> addressing this, perhaps by having a data structure which effectively
> tracks hit rates for all potential blocks...

That's what the CAR algorithm does.  It has twice as many
BufferDescriptors as buffers.  As well as the T1 and T2 clocks that
hold buffers, it also has doubly linked lists B1 and B2, which hold
BufferDescriptors with no buffers for things that recently fell out of
T1 and T2, to extend the algorithm's memory.  For example, if a cache
miss hits in B1 (meaning we found the page we're looking for in there,
but it currently has no buffer because it was recently evicted from
T1), then we can immediately give it a buffer in T2: it qualifies as
having been accessed twice in recent memory, and doesn't have to go
through T1 again, where it would risk early eviction.  Though of
course that memory too is finite.
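
Continuing the same made-up sketch, the miss path is where B1/B2 and
the adjustment of T1's target come in.  Again, every name is invented,
B1 and B2 are shown reusing the same list struct as plain lists, and
the work of bounding their sizes is hand-waved away:

static CarList B1;              /* descriptors recently evicted from T1 */
static CarList B2;              /* descriptors recently evicted from T2 */
static int     nbuffers;        /* total number of buffers ("c") */

/*
 * On a cache miss, decide where the page's entry goes and adjust the
 * adaptive target.  "e" is either a descriptor found in B1 or B2 (a
 * history hit: we remember the page but it has no buffer) or a brand
 * new one; the caller has already freed a buffer with car_replace().
 */
static void
car_note_miss(CarEntry *e, bool was_in_B1, bool was_in_B2)
{
    if (was_in_B1)
    {
        /* T1 evicted something we wanted again: grow T1's target. */
        int     delta = (B1.size > 0 && B2.size / B1.size > 1) ?
                        B2.size / B1.size : 1;  /* max(1, |B2|/|B1|) */

        target_T1 += delta;
        if (target_T1 > nbuffers)
            target_T1 = nbuffers;
        car_unlink(&B1, e);
        car_insert(&T2, e);     /* accessed twice recently: skip T1 */
    }
    else if (was_in_B2)
    {
        /* T2 evicted something we wanted again: shrink T1's target. */
        int     delta = (B2.size > 0 && B1.size / B2.size > 1) ?
                        B1.size / B2.size : 1;  /* max(1, |B1|/|B2|) */

        target_T1 -= delta;
        if (target_T1 < 0)
            target_T1 = 0;
        car_unlink(&B2, e);
        car_insert(&T2, e);     /* likewise goes straight to T2 */
    }
    else
    {
        /* No recent history at all: start in T1 and earn promotion. */
        car_insert(&T1, e);
    }
    e->referenced = false;      /* new arrivals start unreferenced */
}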

-- 
Thomas Munro
http://www.enterprisedb.com
