On Sat, Apr 28, 2018 at 7:16 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> There could very easily be an enormous range of usefulness among
> buffers in shared_buffers. And yet, we only recognize the increments
> that usage_count can represent (this is before we even think about the
> actual problems with how usage_count is maintained). We're never going
> to be able to give *radically* more useful leaf pages a concomitant
> advantage during buffer eviction if we have to use some rule of thumb.

Right.  I think the real problem here is that the usage_count is a
terrible predictor of how useful a page really is...

> In a way, it's not at all surprising that we can end up with
> usage_counts of 5 all around when shared_buffers is large. Blocks end
> up "resting on their laurels"; they're impossible to evict based on a
> large amount of blind luck, such as being involved in a burst of
> activity some time ago (so much for clocksweep approximating an LRU;
> this is actually more like an LFU). ARC/CAR moves a "recent" block to
> the "frequent" list if the block gets referenced a second time after
> an interval that is, in effect, self-correcting (it may or may not be
> from the ghost recent list or the real buffer cache recent list --
> either way it goes to the head of the frequent list). Moving a block
> from the recent to the frequent list also enlarges the recent list
> target size at the expense of the frequent list target size. "This
> block in particular deserves to be a frequent block, but as a
> consequence all blocks should collectively be considered slightly less
> deserving of being frequent blocks."

...and this is exactly the kind of system that can fix that problem.

In the current system, bumping the usage count makes the buffer whose
count just got bumped look more useful without making any other buffer
look less useful.  Obviously, that allows for situations where the
usefulness of every buffer converges to positive infinity or, as we
like to call it, 5.  Real life isn't like that, though.  Not all
children are above average, and not all buffers can be more useful
than average.
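
Just to make that concrete, here's a toy sketch in C of the current
behavior (the names are invented, this is not the actual bufmgr code):
a hit credits the buffer that got hit, capped at 5, and nothing ever
gets debited as a side effect.

#define BM_MAX_USAGE_COUNT 5

typedef struct
{
    int usage_count;
} BufSketch;

/* A hit makes this buffer look more useful, up to the cap; no other
 * buffer is made to look any less useful as a result. */
static void
buffer_hit(BufSketch *buf)
{
    if (buf->usage_count < BM_MAX_USAGE_COUNT)
        buf->usage_count++;
}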

That's not to say that the total of all usage counts must remain equal
to a constant or something dumb like that -- there are decisions to be
made about how you implement things.  For example, you can
imagine a system where every time we would bump the usage count of a
buffer that already has a usage count of 5, we would instead advance
the clock hand by 1 buffer.  In such a system the total of the usage
counts can fluctuate, but you can't peg everything at 5 because
there's back-pressure built into the algorithm.  CAR accomplishes a
similar goal in a different way: "frequently" used pages are never
evicted, but as more and more pages get moved into the frequently-used
set, the system becomes more and more willing to shrink the set size.
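
In pseudocode-ish C, the back-pressure variant I'm imagining would
look something like this (all names are invented for illustration, not
a proposal for actual bufmgr changes):

#define BM_MAX_USAGE_COUNT 5
#define NBUFFERS 16             /* tiny pool, just for illustration */

static int usage_count[NBUFFERS];
static int clock_hand;

static void
buffer_hit_with_backpressure(int buf)
{
    if (usage_count[buf] < BM_MAX_USAGE_COUNT)
        usage_count[buf]++;
    else
    {
        /* Already at the cap: instead of crediting this buffer any
         * further, advance the clock hand one buffer, debiting
         * whatever the hand passes.  Rewarding a hot buffer now
         * taxes the rest of the pool, so everything can't just sit
         * at 5 forever. */
        if (usage_count[clock_hand] > 0)
            usage_count[clock_hand]--;
        clock_hand = (clock_hand + 1) % NBUFFERS;
    }
}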

That's a really appealing property.  If there's a "hot" set of buffers
that accounts for nearly all accesses, the algorithm will be willing
to regard all buffers in that set as frequently-used regardless of
whether that set is a large or a small percentage of shared_buffers, and
they will never get evicted.  But the buffers must really be hot
enough to justify the space they're taking up.  When there are only a
few hot buffers, it's sufficient for them to be a little hotter than
the typical buffer.  But when there are a lot of hot buffers, they
have to be *really* hot.  Intuitively, that makes sense, because
making the set of "hot" -- and thus unevictable -- buffers large means
that other buffers are going to get evicted very quickly.  That's only
justifiable if the hot buffers are very significantly more useful than
the non-hot buffers.
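
For anyone who hasn't read the papers, the adaptation rule in ARC (and
CAR, which borrows it) is roughly the following -- this is a paraphrase
from memory, not a literal transcription.  A hit in the ghost list of
the recency side grows the recency target; a hit in the ghost list of
the frequency side shrinks it; and the step size is scaled by the
relative ghost-list lengths, so the bigger one side has grown, the
harder it is to grow it further.

#include <stdbool.h>

/*
 * p is the target number of buffers for the "recent" side, c is the
 * total cache size, and b1_len/b2_len are the ghost-list lengths for
 * the recent and frequent sides respectively.
 */
static int
adapt_target(int p, int c, int b1_len, int b2_len, bool hit_in_b1)
{
    if (hit_in_b1)
    {
        /* We evicted a recent page too early: grow the recent target. */
        int delta = (b1_len >= b2_len) ? 1 : b2_len / b1_len;

        p = (p + delta < c) ? p + delta : c;
    }
    else
    {
        /* We evicted a frequent page too early: shrink the recent
         * target, which effectively grows the frequent target. */
        int delta = (b2_len >= b1_len) ? 1 : b1_len / b2_len;

        p = (p - delta > 0) ? p - delta : 0;
    }
    return p;
}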

> If you have extreme cache pressure, I don't think that there is
> anything left to worry about other than the scalability of evicting
> buffers. That should be a rare enough occurrence for well maintained
> systems these days (again, this may only actually be true as an
> aspiration). Just look at the ARC paper's statistics; they report that
> they're not much better than the competition (including even a classic
> LRU) when there is either mild cache pressure or extreme cache
> pressure. The middle ground is the interesting part.

+1.

> All of that having been said, maybe we have an independent low-level
> problem: we increase usage_count when we pin a buffer, even if we last
> pinned the buffer 5ms ago. IIRC a change to this went in when ARC went
> in (and came out with ARC, too). This is more of a problem with
> miscounting the number of hits based on accidents around how things
> are structured in the executor -- we need to be more careful about
> recognizing whether or not block hits are logically distinct.
> Particularly with stuff like nested loop joins.

I agree that double-counting correlated accesses is a problem, and I
agree that we probably want to do something about it.  I am skeptical
that using wall-clock time is the right way to solve that problem
because it leads to edge effects -- if, for example, there is a 5ms
timeout for bumping the usage count again, then you can probably
manufacture a workload where the buffer eviction pattern is
dramatically different for a nested loop that hits a given page every
4.5ms than it is for a nested loop that hits a given page every 5.5ms.
Note that CART aims to solve this problem in a way that doesn't
involve any fixed amount of wall-clock time.
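
To spell out the edge effect I'm worried about, suppose the rule were
something like the sketch below (the 5ms constant and all the names are
invented).  A loop touching the same page every 4.5ms never earns a
bump after the first pin, while a loop touching it every 5.5ms earns a
bump on every single visit, even though the two access patterns are
nearly identical.

#include <stdint.h>

#define USAGE_DEBOUNCE_MS 5    /* illustrative only */
#define BM_MAX_USAGE_COUNT 5

typedef struct
{
    int     usage_count;
    int64_t last_pin_ms;       /* from some monotonic clock */
} BufSketch;

static void
pin_and_maybe_bump(BufSketch *buf, int64_t now_ms)
{
    /* Count this pin as a new logical access only if the previous pin
     * was more than USAGE_DEBOUNCE_MS ago.  Access intervals just on
     * either side of the threshold get wildly different treatment. */
    if (now_ms - buf->last_pin_ms > USAGE_DEBOUNCE_MS)
    {
        if (buf->usage_count < BM_MAX_USAGE_COUNT)
            buf->usage_count++;
    }
    buf->last_pin_ms = now_ms;
}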

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
