Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

Bruce Momjian Thu, 17 Apr 2014 07:49:25 -0700

On Thu, Apr 17, 2014 at 10:40:40AM -0400, Robert Haas wrote:
> On Thu, Apr 17, 2014 at 10:32 AM, Bruce Momjian <br...@momjian.us> wrote:
> > On Thu, Apr 17, 2014 at 10:18:43AM -0400, Robert Haas wrote:
> >> I also believe this to be the case on first principles and my own
> >> experiments.  Suppose you have a workload that fits inside
> >> shared_buffers.  All of the usage counts will converge to 5.  Then,
> >> somebody accesses a table that is not cached, so something's got to be
> >> evicted.  Because all the usage counts are the same, the eviction at
> >> this point is completely indiscriminate.  We're just as likely to kick
> >> out a btree root page or a visibility map page as we are to kick out a
> >> random heap page, even though the former have probably been accessed
> >> several orders of magnitude more often.  That's clearly bad.  On
> >> systems that are not too heavily loaded it doesn't matter too much
> >> because we just fault the page right back in from the OS pagecache.
> >> But I've done pgbench runs where such decisions lead to long stalls,
> >> because the page has to be brought back in from disk, and there's a
> >> long I/O queue; or maybe just because the kernel thinks PostgreSQL is
> >> issuing too many I/O requests and makes some of them wait to cool
> >> things down.
> >
> > I understand now.  If there is no memory pressure, every buffer gets the
> > max usage count, and when a new buffer comes in, it isn't the max so it
> > is swiftly removed until the clock sweep has time to decrement the old
> > buffers.  Decaying buffers when there is no memory pressure creates
> > additional overhead and gets into timing issues of when to decay.
> 
> That can happen, but the real problem I was trying to get at is that
> when all the buffers get up to max usage count, they all appear
> equally important.  But in reality they're not.  So when we do start
> evicting those long-resident buffers, it's essentially random which
> one we kick out.


True.  Ideally we would have some way to know that _all_ the buffers had
reached the maximum and kick off a sweep to decrement them all.  I am
unsure how we would do that.  One odd idea would be to have a global
counter that is incremented every time a buffer goes from 4 to 5 (max);
when the counter reaches 50% of all buffers, do a clock sweep.  Of
course, the counter then becomes a bottleneck.
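
As a rough illustration of the counter idea, here is a toy Python model
(not PostgreSQL code).  The class name, the `threshold` parameter, and the
choice to decrement every buffer by one when the threshold is hit are all
assumptions made for this sketch.  A real implementation would need a
shared, atomically updated counter, which is where the bottleneck would
show up.

```python
MAX_USAGE = 5  # usage-count cap discussed in this thread


class ToyBufferPool:
    """Toy model of the global-counter idea; hypothetical, not PostgreSQL's buffer manager."""

    def __init__(self, nbuffers, threshold=0.5):
        self.usage = [0] * nbuffers   # per-buffer usage counts
        self.at_max = 0               # global counter: bumped on each 4 -> 5 transition
        self.threshold = threshold    # fraction of buffers at max that triggers a decay
        self.decay_sweeps = 0         # how many decay sweeps have run so far

    def access(self, buf):
        """Record one access to buffer `buf`, capping its usage count at MAX_USAGE."""
        if self.usage[buf] < MAX_USAGE:
            self.usage[buf] += 1
            if self.usage[buf] == MAX_USAGE:
                # In a real system this increment would be a contended shared write.
                self.at_max += 1
                if self.at_max >= self.threshold * len(self.usage):
                    self.decay_all()

    def decay_all(self):
        """Assumed sweep behaviour: decrement every buffer once, then reset the counter."""
        self.usage = [max(u - 1, 0) for u in self.usage]
        self.at_max = 0  # no buffer can still be at MAX_USAGE after the decrement
        self.decay_sweeps += 1
```

With four buffers and the default 50% threshold, driving two buffers to
the maximum triggers exactly one decay sweep, after which those two sit
at 4 and the counter is back to zero.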

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
