Tom Lane wrote:
> I just had an epiphany, I think.
>
> As I wrote in the LDC discussion,
> http://archives.postgresql.org/pgsql-patches/2007-06/msg00294.php
> if the bgwriter's LRU-cleaning scan has advanced ahead of freelist.c's
> clock sweep pointer, then any buffers between them are either clean,
> or are pinned and/or have usage_count > 0 (in which case the bgwriter
> wouldn't bother to clean them, and freelist.c wouldn't consider them
> candidates for re-use).  And *this invariant is not destroyed by the
> activities of other backends*.  A backend cannot dirty a page without
> raising its usage_count from zero, and there are no race cases because
> the transition states will be pinned.

> This means that there is absolutely no point in having the bgwriter
> re-start its LRU scan from the clock sweep position each time, as
> it currently does.  Any pages it revisits are not going to need
> cleaning.  We might as well have it progress forward from where it
> stopped before.
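To make the cost of restarting concrete, here is a hypothetical sketch comparing the two policies over two rounds with the clock-sweep hand held still. Holding the hand still is an assumption made only to isolate the effect; a real cleaner would also have to notice when the hand overtakes it. `lru_round` and `run` are illustrative names, not PostgreSQL functions.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    usage_count: int
    dirty: bool


def lru_round(bufs, start, scan_limit):
    """Examine scan_limit buffers from `start`, writing dirty ones with usage_count 0.

    Returns (buffers_written, position_after_scan).
    """
    n = len(bufs)
    written = 0
    pos = start
    for _ in range(scan_limit):
        b = bufs[pos]
        if b.dirty and b.usage_count == 0:
            b.dirty = False
            written += 1
        pos = (pos + 1) % n
    return written, pos


def run(strategy, rounds=2, scan_limit=4, hand=0):
    # Eight dirty, unused buffers; the clock-sweep hand does not move between rounds.
    bufs = [Buf(usage_count=0, dirty=True) for _ in range(8)]
    total, resume_at = 0, hand
    for _ in range(rounds):
        start = hand if strategy == "restart" else resume_at
        written, resume_at = lru_round(bufs, start, scan_limit)
        total += written
    return total
```

With the hand fixed, `run("restart")` writes 4 buffers, because its second round re-examines the four it has just cleaned, while `run("resume")` writes all 8.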

All true so far.

Note that Itagaki-san's patch changes that, though. With the patch, the LRU scan doesn't look for bgwriter_lru_maxpages dirty buffers to write. Instead, it checks that there are N (where N varies based on history) clean buffers with usage_count=0 in front of the clock sweep. If there aren't, it writes dirty buffers until there are again.
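The patch's LRU scan, as described here, can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the target N is simply passed in (in the patch it is derived from recent history), the scan is capped at one lap, and `ensure_clean_ahead` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    usage_count: int
    dirty: bool


def ensure_clean_ahead(bufs, hand, target):
    """Scan forward from the clock-sweep hand until `target` clean buffers with
    usage_count == 0 lie in front of it, writing dirty ones along the way.

    Scans at most one full lap. Returns (reusable_found, buffers_written).
    """
    n = len(bufs)
    reusable = written = 0
    for i in range(n):
        if reusable >= target:
            break
        b = bufs[(hand + i) % n]
        if b.usage_count != 0:
            continue            # recently used: not a reuse candidate yet
        if b.dirty:
            b.dirty = False     # write it so a backend will not have to
            written += 1
        reusable += 1
    return reusable, written
```

For example, with buffers `[Buf(0, False), Buf(1, True), Buf(0, True), Buf(0, True), Buf(0, False)]` and a target of 3, the first call writes two buffers and leaves the recently used one dirty; a second call writes nothing, since three reusable buffers already lie in front of the hand.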

> In fact, the notion of the bgwriter's cleaning scan being "in front of"
> the clock sweep is entirely backward.  It should try to be behind the
> sweep, ie, so far ahead that it's lapped the clock sweep and is trailing
> along right behind it, cleaning buffers immediately after their
> usage_count falls to zero.  All the rest of the buffer arena is either
> clean or has positive usage_count.
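Whether the cleaner is behind the hand, ahead of it, or has lapped it cannot be told from buffer ids alone, since both wrap around. One way to disambiguate, assumed here purely for illustration, is to pair each position with a count of completed passes and compare flattened counters. `cleaner_plan` is a hypothetical helper, not a PostgreSQL function.

```python
def cleaner_plan(n_buffers, sweep_passes, sweep_id, clean_passes, clean_id):
    """Decide where the LRU cleaner should work next.

    Positions are (completed passes, buffer id) pairs, flattened to a single
    counter so wraparound is unambiguous. Returns (start_id, buffers_to_scan).
    """
    sweep_pos = sweep_passes * n_buffers + sweep_id
    clean_pos = clean_passes * n_buffers + clean_id
    lead = clean_pos - sweep_pos
    if lead < 0:
        # The hand overtook the cleaner: jump to the hand and scan one full lap.
        return sweep_id, n_buffers
    if lead >= n_buffers:
        # Lapped: by Tom's invariant every buffer is clean or has
        # usage_count > 0, so wait for the hand to move.
        return clean_id, 0
    # Otherwise continue from where we stopped, up to one lap ahead of the hand.
    return clean_id, n_buffers - lead
```

For example, with 8 buffers and the hand at pass 3, buffer 5, a cleaner at pass 4, buffer 1 is 4 buffers ahead and has 4 more to scan before it sits exactly one lap ahead, i.e. directly behind the hand.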

Really? How much of the buffer cache do you think we should try to keep clean? And how large a percentage of the buffer cache do you think has usage_count=0 at any given point in time? I'm not sure myself, but as a data point, the usage counts from a quick DBT-2 test on my laptop look like this:

 usagecount | count
------------+-------
          0 |  1107
          1 |  1459
          2 |   459
          3 |   235
          4 |   352
          5 |   481
            |     3

NBuffers = 4096.

That will vary widely depending on your workload, of course, but keeping 1/4 of the buffer cache clean seems like overkill to me. If any of those buffers are re-dirtied after we write them, the write was a waste of time.
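For reference, the arithmetic behind the "1/4" figure, using only the numbers in the table (the 3 buffers in the row with no usage count included):

```latex
\begin{aligned}
1107 + 1459 + 459 + 235 + 352 + 481 + 3 &= 4096 = \mathit{NBuffers} \\
\frac{1107}{4096} &\approx 0.270
\end{aligned}
```

So about 27% of the buffers had usage_count=0 in this run, a little over 1/4.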

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
