Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Sun, 2003-10-05 at 19:50, Neil Conway wrote:
> > I was hoping you'd reply to this, Tom -- you were referring to O_DIRECT,
> > right?
> Not necessarily --- as you point out, it's not clear that O_DIRECT would
> help us.  What would be way cool is something similar to what James
> Rogers was talking about: a way to tell the kernel not to promote this
> page all the way to the top of its LRU list.  I'm not sure that *any*
> Unixen have such an API, let alone one that's common across more than
> one platform :-(

Solaris has "free-behind", which prevents a large sequential scan from
blowing out the kernel cache.

I only read about it in the Mauro Solaris Internals book, and it seems
to be done automatically.  I guess most OSes don't do this optimization
because they usually don't read files larger than their cache.

I see BSD/OS madvise() has:

     #define MADV_NORMAL     0       /* no further special treatment */
     #define MADV_RANDOM     1       /* expect random page references */
     #define MADV_SEQUENTIAL 2       /* expect sequential references */
     #define MADV_WILLNEED   3       /* will need these pages */
-->  #define MADV_DONTNEED   4       /* don't need these pages */
     #define MADV_SPACEAVAIL 5       /* insure that resources are reserved */

The marked one seems to have the control we need.  Of course, the kernel
madvise() code has:

        /* Not yet implemented */
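
For illustration, here is a minimal C sketch of how an application might
use the marked flag, assuming a platform where madvise() actually
implements MADV_DONTNEED.  scan_and_release() is a hypothetical helper
name, not anything in PostgreSQL, and the byte sum is just a stand-in for
a sequential scan.  It only applies to data reached through mmap(), not
read(), and what the kernel does with the hint differs between systems:

```c
#define _DEFAULT_SOURCE         /* expose madvise() under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file read-only, sum its bytes chunk by chunk,
 * and tell the kernel after each chunk that those pages are not needed
 * again.  chunk_pages is the chunk size in pages.
 * Returns 0 and stores the sum in *sum, or -1 on error (including an
 * empty file, which cannot be mapped).
 */
int
scan_and_release(const char *path, size_t chunk_pages, unsigned long *sum)
{
    long        pagesize = sysconf(_SC_PAGESIZE);
    struct stat st;
    int         fd;

    if (pagesize <= 0 || chunk_pages == 0 || sum == NULL)
        return -1;
    size_t chunk = chunk_pages * (size_t) pagesize;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0)
    {
        close(fd);
        return -1;
    }
    size_t len = (size_t) st.st_size;

    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (base == MAP_FAILED)
        return -1;

    /* Advisory only, so the return values of madvise() are ignored. */
    (void) madvise(base, len, MADV_SEQUENTIAL);

    unsigned long total = 0;
    for (size_t off = 0; off < len; off += chunk)
    {
        size_t n = (len - off < chunk) ? len - off : chunk;

        for (size_t i = 0; i < n; i++)
            total += base[off + i];

        /* off is a multiple of the page size, so base + off is aligned. */
        assert(off % (size_t) pagesize == 0);
        (void) madvise(base + off, n, MADV_DONTNEED);
    }

    munmap(base, len);
    *sum = total;
    return 0;
}
```

A caller would do something like scan_and_release("/some/file", 16, &sum)
to release pages 16 at a time.  Since the mapping is a read-only shared
file mapping, dropping the pages loses no data; they are simply re-read
from the file if touched again.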

Looks like NetBSD implements it, but it also unmaps the page from the
address space, which might be more than we want.  NetBSD also has:

     #define MADV_FREE       6       /* pages are empty, free them */

which frees the page.  I am unclear on its use.

FreeBSD has this comment:

 * vm_page_dontneed
 *  Cache, deactivate, or do nothing as appropriate.  This routine
 *  is typically used by madvise() MADV_DONTNEED.
 *  Generally speaking we want to move the page into the cache so
 *  it gets reused quickly.  However, this can result in a silly syndrome
 *  due to the page recycling too quickly.  Small objects will not be
 *  fully cached.  On the otherhand, if we move the page to the inactive
 *  queue we wind up with a problem whereby very large objects
 *  unnecessarily blow away our inactive and cache queues.
 *  The solution is to move the pages based on a fixed weighting.  We
 *  either leave them alone, deactivate them, or move them to the cache,
 *  where moving them to the cache has the highest weighting.
 *  By forcing some pages into other queues we eventually force the
 *  system to balance the queues, potentially recovering other unrelated
 *  space from active.  The idea is to not force this to happen too
 *  often.

The Linux comment is:

 * Application no longer needs these pages.  If the pages are dirty,
 * it's OK to just throw them away.  The app will be more careful about
 * data it wants to keep.  Be sure to free swap resources too.  The
 * zap_page_range call sets things up for refill_inactive to actually free
 * these pages later if no one else has touched them in the meantime,
 * although we could add these pages to a global reuse list for
 * refill_inactive to pick up before reclaiming other pages.
 * NB: This interface discards data rather than pushes it out to swap,
 * as some implementations do.  This has performance implications for
 * applications like large transactional databases which want to discard
 * pages in anonymous maps after committing to backing store the data
 * that was kept in them.  There is no reason to write this data out to
 * the swap area if the application is discarding it.
 * An interface that causes the system to free clean pages and flush
 * dirty pages is already available as msync(MS_INVALIDATE).

It seems mmap is more for controlling the memory mapping of files rather
than controlling the cache itself.

  Bruce Momjian                        |
  [EMAIL PROTECTED]               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
