On 2013-04-01 08:28:13 -0500, Merlin Moncure wrote:
> On Sun, Mar 31, 2013 at 1:27 PM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> > On Friday, March 22, 2013, Ants Aasma wrote:
> >>
> >> On Fri, Mar 22, 2013 at 10:22 PM, Merlin Moncure <mmonc...@gmail.com>
> >> wrote:
> >> > well if you do a non-locking test first you could at least avoid some
> >> > cases (and, if you get the answer wrong, so what?) by jumping to the
> >> > next buffer immediately.  if the non locking test comes good, only
> >> > then do you do a hardware TAS.
> >> >
> >> > you could in fact go further and dispense with all locking in front of
> >> > usage_count, on the premise that it's only advisory and not a real
> >> > refcount.  so you only then lock if/when it's time to select a
> >> > candidate buffer, and only then when you did a non locking test first.
> >> >  this would of course require some amusing adjustments to various
> >> > logical checks (usage_count <= 0, heh).
> >>
> >> Moreover, if the buffer happens to miss a decrement due to a data
> >> race, there's a good chance that the buffer is heavily used and
> >> wouldn't need to be evicted soon anyway. (if you arrange it to be a
> >> read-test-inc/dec-store operation then you will never go out of
> >> bounds) However, clocksweep and usage_count maintenance is not what is
> >> causing contention because that workload is distributed. The issue is
> >> pinning and unpinning.
> >
> >
> > That is one of multiple issues.  Contention on the BufFreelistLock is
> > another one.  I agree that usage_count maintenance is unlikely to become a
> > bottleneck unless one or both of those is fixed first (and maybe not even
> > then)
> 
> usage_count manipulation is not a bottleneck but that is irrelevant.
> It can be affected by other page contention which can lead to priority
> inversion.  I don't believe there is any reasonable argument that
> sitting and spinning while holding the BufFreelistLock is a good idea.
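
To make the idea being discussed concrete: an unlocked pre-check plus
clamped, lock-free usage_count updates could look roughly like the sketch
below. This is illustrative C only, not actual bufmgr.c code; all the
names (ToyBufferDesc, toy_*) are made up.

/*
 * Illustrative sketch only: read usage_count without the buffer header
 * lock, clamp the new value, and only take the spinlock (hardware TAS)
 * once an unlocked check says the buffer might be a victim candidate.
 * A lost update just leaves the count slightly too high, which is
 * harmless for an advisory counter.
 */
#include <stdbool.h>

typedef struct ToyBufferDesc        /* made-up stand-in for a buffer header */
{
    volatile int usage_count;       /* advisory only, may be slightly stale */
} ToyBufferDesc;

#define TOY_MAX_USAGE_COUNT 5

/* Unlocked, clamped decrement: read-test-store, never goes below 0. */
static void
toy_sweep_decrement(ToyBufferDesc *buf)
{
    int cur = buf->usage_count;
    if (cur > 0)
        buf->usage_count = cur - 1;
}

/* Unlocked, clamped increment on access: never goes above the maximum. */
static void
toy_usage_bump(ToyBufferDesc *buf)
{
    int cur = buf->usage_count;
    if (cur < TOY_MAX_USAGE_COUNT)
        buf->usage_count = cur + 1;
}

/*
 * Non-locking pre-check; only if this returns true would the sweep go on
 * to take the header spinlock and re-check under it.  A stale answer just
 * means skipping one candidate or doing one extra TAS.
 */
static bool
toy_maybe_victim(const ToyBufferDesc *buf)
{
    return buf->usage_count == 0;
}

int
main(void)
{
    ToyBufferDesc buf = { .usage_count = 2 };

    toy_usage_bump(&buf);                       /* 2 -> 3 */
    toy_sweep_decrement(&buf);                  /* 3 -> 2 */
    return toy_maybe_victim(&buf) ? 1 : 0;      /* not a victim yet */
}

The point being: the only place a TAS would still be needed is after
toy_maybe_victim() has said "possibly free", and a wrong answer there only
costs a retry.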

In my experience the mere fact of accessing all the buffer headers (even
without taking any locks) can cause noticeable slowdowns in write-only or
write-mostly workloads with large shared_buffers settings.
Because of the write-only nature, large numbers of buffers end up with
similar usage counts (they are rarely touched again after the initial
insertion) and there are no free buffers around, so the search for a
victim frequently runs through *all* the buffer headers multiple times
until it has decremented every usage count to 0. Then comes a period where
victim buffers are found easily (since all usage counts from the current
sweep point onwards are zero), and after that the whole cycle starts over.
I have seen that scenario multiple times now :(
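
For illustration only, a toy model of that behaviour (plain C, nothing
taken from bufmgr.c; NBUFFERS and INIT_USAGE are made-up numbers) shows
why the first victim search after such a bulk load has to walk every
header several times:

#include <stdio.h>
#include <stdlib.h>

#define NBUFFERS    1000000     /* made-up, stands in for shared_buffers */
#define INIT_USAGE  3           /* buffers all sit at a similar usage count */

int
main(void)
{
    int    *usage = malloc(sizeof(int) * NBUFFERS);
    long    visited = 0;
    int     hand = 0;

    for (int i = 0; i < NBUFFERS; i++)
        usage[i] = INIT_USAGE;

    /* simplified clock sweep: look for the first zero usage count */
    for (;;)
    {
        visited++;
        if (usage[hand] == 0)
            break;              /* victim found */
        usage[hand]--;          /* otherwise decrement and keep sweeping */
        hand = (hand + 1) % NBUFFERS;
    }

    /* prints NBUFFERS * INIT_USAGE + 1 header visits for a single victim */
    printf("headers visited for the first victim: %ld\n", visited);

    free(usage);
    return 0;
}

After that first, expensive victim all the usage counts from the sweep
point onwards are zero, so victims come cheaply for a while, until the
counts build back up and the expensive phase repeats.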

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

