Re: svn commit: r288431 - in head/sys: kern sys vm

2015-10-08 Thread John-Mark Gurney
Alan Cox wrote this message on Fri, Oct 02, 2015 at 18:50 -0500:
> On Oct 2, 2015, at 10:59 AM, John Baldwin wrote:
> 
> > I think it is not unreasonable to expect that fadvise() incurs system-wide
> > effects.  A properly implemented WILLNEED that does read-ahead cannot work
> > without incurring system-wide effects.  I had always assumed that fadvise()
> > operated on a file, not a given process' view of a file (unlike, say,
> > madvise which only operates on mappings and only indirectly affects
> > file-backed data).
> 
> 
> Can you elaborate on what you mean by “I had always assumed that fadvise()
> operated on a file, …”?
> 
> Under the previous implementation, if you did an fadvise(DONTNEED) on a file,
> in order to cache the file’s pages, those pages first had to be unmapped from
> any address space.  (You can find this unmapping performed by
> vm_page_try_to_cache().)  In other words, there was never any code that said,
> “Is this a mapped page, and if it is, don’t cache it because we’re actually
> performing an fadvise().”  So, to pick an extreme example, if you did an
> fadvise(“libc.so”, DONTNEED), unless some process had libc.so wired, then
> every single mapping to every single page of libc.so was going to be
> destroyed and the pages moved to the cache.  However, because we moved the
> pages to the cache (rather than freeing them), and libc.so is frequently
> accessed, a subsequent instruction fetch would have faulted and been able to
> reactivate the cached page, avoiding an I/O operation.  In other words, that
> we were caching the pages targeted by fadvise() rather than simply freeing
> them mattered in cases where the pages were in use/accessed by multiple
> processes.

This would be a very nasty DoS if someone just ran fadvise('libc.so',
DONTNEED) in a loop, and forced any future accesses of libc.so to pull
from disk, over and over and over again...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 "All that I will do, has been done, All that I have, has not."
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r288431 - in head/sys: kern sys vm

2015-10-02 Thread John Baldwin
On Thursday, October 01, 2015 09:58:43 PM Mark Johnston wrote:
> On Thu, Oct 01, 2015 at 09:32:45AM -0700, John Baldwin wrote:
> > On Wednesday, September 30, 2015 11:06:30 PM Mark Johnston wrote:
> > > Author: markj
> > > Date: Wed Sep 30 23:06:29 2015
> > > New Revision: 288431
> > > URL: https://svnweb.freebsd.org/changeset/base/288431
> > > 
> > > Log:
> > >   As a step towards the elimination of PG_CACHED pages, rework the 
> > > handling
> > >   of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved 
> > > to
> > >   the head of the inactive queue instead of being cached.
> > >   
> > >   This affects the implementation of POSIX_FADV_NOREUSE as well, since it
> > >   works by applying POSIX_FADV_DONTNEED to file ranges after they have 
> > > been
> > >   read or written.  At that point the corresponding buffers may still be
> > >   dirty, so the previous implementation would coalesce successive ranges 
> > > and
> > >   apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
> > >   dirty buffers would eventually be cached.  To preserve this behaviour 
> > > in an
> > >   efficient manner, this change adds a new buf flag, B_NOREUSE, which 
> > > causes
> > >   the pages backing a VMIO buf to be placed at the head of the inactive 
> > > queue
> > >   when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
> > >   flag in bufs that underlie the specified range.
> > 
> > Putting these pages back on the inactive queue completely defeats the 
> > primary
> > purpose of DONTNEED and NOREUSE.  The primary purpose is to move the pages 
> > out
> > of the VM object's tree of pages and into the free pool so that the 
> > application
> > can instruct the VM to free memory more efficiently than relying on page 
> > daemon.
> > 
> > The implementation used cache pages instead of free as a cheap optimization 
> > so
> > that if an application did something dumb where it used DONTNEED and then 
> > turned
> > around and read the file it would not have to go to disk if the pages had 
> > not
> > yet been reused.  In practice this didn't work out so well because PG_CACHE 
> > pages
> > don't really work well.
> > 
> > However, using PG_CACHE was secondary to the primary purpose of explicitly 
> > freeing
> > memory that an application knew wasn't going to be reused and avoiding the 
> > need
> > for pagedaemon to run at all.  I think this should be freeing the pages 
> > instead of
> > keeping them inactive.  If an application uses DONTNEED or NOREUSE and then 
> > turns
> > around and rereads the file, it generally deserves to have to go to disk 
> > for it.
> 
> A problem with this is that one application's DONTNEED or NOREUSE hint
> would cause every application reading or writing that file to go to
> disk, but posix_fadvise(2) is explicitly intended for applications that
> wish to provide hints about their own access patterns. I realize that
> it's typically used with application-private files, but that's not a
> requirement of the interface. Deactivating (or caching) the backing
> pages generally avoids this problem.

I think it is not unreasonable to expect that fadvise() incurs system-wide
effects.  A properly implemented WILLNEED that does read-ahead cannot work
without incurring system-wide effects.  I had always assumed that fadvise()
operated on a file, not a given process' view of a file (unlike, say,
madvise which only operates on mappings and only indirectly affects
file-backed data).
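
To make the file-versus-mapping distinction concrete, here is a minimal
userland sketch.  It is only an illustration: the helper name
advise_file_then_mapping() is made up for this note, and nothing here says
how any kernel acts on either hint.  The point is just that posix_fadvise(2)
names a descriptor and a byte range, while madvise(2) names an address range
in the calling process.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc feature-test macro (assumed build env) */
#endif

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue a WILLNEED hint on the file itself, then map
 * the file and issue the equivalent hint on the mapping.  Returns 0 or an
 * errno-style value.
 */
static int
advise_file_then_mapping(const char *path)
{
	struct stat sb;
	void *addr;
	int error, fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (errno);
	if (fstat(fd, &sb) == -1) {
		error = errno;
		close(fd);
		return (error);
	}
	if (sb.st_size == 0) {
		/* mmap(2) rejects a zero-length mapping. */
		close(fd);
		return (EINVAL);
	}

	/* File-level hint: a descriptor and a byte range, no mapping needed. */
	error = posix_fadvise(fd, 0, sb.st_size, POSIX_FADV_WILLNEED);
	if (error == 0) {
		addr = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED,
		    fd, 0);
		if (addr == MAP_FAILED) {
			error = errno;
		} else {
			/* Mapping-level hint: an address range in this process. */
			if (madvise(addr, (size_t)sb.st_size,
			    MADV_WILLNEED) == -1)
				error = errno;
			munmap(addr, (size_t)sb.st_size);
		}
	}
	close(fd);
	return (error);
}
```

Note that posix_fadvise() returns the error number directly, whereas
madvise() returns -1 and sets errno.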

> > I'm pretty sure I had mentioned this to Alan before.  I believe that the 
> > idea is
> > that pagedaemon should be cheap enough that having it run anyway shouldn't 
> > be an
> > issue, but I'm a bit skeptical of that. :)  Lock contention is always 
> > possible and
> > having DONTNEED/NOREUSE move pages to PG_CACHE avoided lock contention with
> > pagedaemon during application page faults (since pagedaemon potentially 
> > never has
> > to run).
> 
> That's true, but the page queue locking (and the pagedaemon's
> manipulation of the page queue locks) has also become more fine-grained
> since posix_fadvise(2) was added. In particular, from some reading of
> sys/vm in stable/8, inactive queue scans used to be performed with the
> global page queue lock held; it was only dropped to launder dirty pages.
> Now, the page queue lock is split into separate locks for the active and
> inactive page queues, and the pagedaemon drops the inactive queue lock
> for each page in all but a few exceptional cases. Does the optimization
> of freeing or caching DONTNEED pages buy us all that much now?
> 
> Some synthetic testing in which an application writes out many large
> (2G) files and calls posix_fadvise(FADV_DONTNEED) after each one shows
> no significant difference in runtime if the buffer pages are deactivated
> vs. freed. (My test just modifies vfs_vmio_unwire() to treat B_NOREUSE
> identically to B_DIRECT.) Unsurprisingly, I see very little lock
> contention in the latter 

Re: svn commit: r288431 - in head/sys: kern sys vm

2015-10-02 Thread Mark Johnston
On Fri, Oct 02, 2015 at 08:59:33AM -0700, John Baldwin wrote:
> On Thursday, October 01, 2015 09:58:43 PM Mark Johnston wrote:
> > On Thu, Oct 01, 2015 at 09:32:45AM -0700, John Baldwin wrote:
> > > On Wednesday, September 30, 2015 11:06:30 PM Mark Johnston wrote:
> > > > Author: markj
> > > > Date: Wed Sep 30 23:06:29 2015
> > > > New Revision: 288431
> > > > URL: https://svnweb.freebsd.org/changeset/base/288431
> > > > 
> > > > Log:
> > > >   As a step towards the elimination of PG_CACHED pages, rework the 
> > > > handling
> > > >   of POSIX_FADV_DONTNEED so that it causes the backing pages to be 
> > > > moved to
> > > >   the head of the inactive queue instead of being cached.
> > > >   
> > > >   This affects the implementation of POSIX_FADV_NOREUSE as well, since 
> > > > it
> > > >   works by applying POSIX_FADV_DONTNEED to file ranges after they have 
> > > > been
> > > >   read or written.  At that point the corresponding buffers may still be
> > > >   dirty, so the previous implementation would coalesce successive 
> > > > ranges and
> > > >   apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing 
> > > > the
> > > >   dirty buffers would eventually be cached.  To preserve this behaviour 
> > > > in an
> > > >   efficient manner, this change adds a new buf flag, B_NOREUSE, which 
> > > > causes
> > > >   the pages backing a VMIO buf to be placed at the head of the inactive 
> > > > queue
> > > >   when the buf is released.  POSIX_FADV_NOREUSE then works by setting 
> > > > this
> > > >   flag in bufs that underlie the specified range.
> > > 
> > > Putting these pages back on the inactive queue completely defeats the 
> > > primary
> > > purpose of DONTNEED and NOREUSE.  The primary purpose is to move the 
> > > pages out
> > > of the VM object's tree of pages and into the free pool so that the 
> > > application
> > > can instruct the VM to free memory more efficiently than relying on page 
> > > daemon.
> > > 
> > > The implementation used cache pages instead of free as a cheap 
> > > optimization so
> > > that if an application did something dumb where it used DONTNEED and then 
> > > turned
> > > around and read the file it would not have to go to disk if the pages had 
> > > not
> > > yet been reused.  In practice this didn't work out so well because 
> > > PG_CACHE pages
> > > don't really work well.
> > > 
> > > However, using PG_CACHE was secondary to the primary purpose of 
> > > explicitly freeing
> > > memory that an application knew wasn't going to be reused and avoiding 
> > > the need
> > > for pagedaemon to run at all.  I think this should be freeing the pages 
> > > instead of
> > > keeping them inactive.  If an application uses DONTNEED or NOREUSE and 
> > > then turns
> > > around and rereads the file, it generally deserves to have to go to disk 
> > > for it.
> > 
> > A problem with this is that one application's DONTNEED or NOREUSE hint
> > would cause every application reading or writing that file to go to
> > disk, but posix_fadvise(2) is explicitly intended for applications that
> > wish to provide hints about their own access patterns. I realize that
> > it's typically used with application-private files, but that's not a
> > requirement of the interface. Deactivating (or caching) the backing
> > pages generally avoids this problem.
> 
> I think it is not unreasonable to expect that fadvise() incurs system-wide
> effects.  A properly implemented WILLNEED that does read-ahead cannot work
> without incurring system-wide effects.  I had always assumed that fadvise()
> operated on a file, not a given process' view of a file (unlike, say,
> madvise which only operates on mappings and only indirectly affects
> file-backed data).

Well, that's even true of read(): two processes reading the same file
may affect each other if one primes the buffer cache with blocks as the
second process is reading them. DONTNEED and NOREUSE would specifically
pessimize all processes using the file if they were to cause backing
pages to be freed, though.

> 
> > > I'm pretty sure I had mentioned this to Alan before.  I believe that the 
> > > idea is
> > > that pagedaemon should be cheap enough that having it run anyway 
> > > shouldn't be an
> > > issue, but I'm a bit skeptical of that. :)  Lock contention is always 
> > > possible and
> > > having DONTNEED/NOREUSE move pages to PG_CACHE avoided lock contention 
> > > with
> > > pagedaemon during application page faults (since pagedaemon potentially 
> > > never has
> > > to run).
> > 
> > That's true, but the page queue locking (and the pagedaemon's
> > manipulation of the page queue locks) has also become more fine-grained
> > since posix_fadvise(2) was added. In particular, from some reading of
> > sys/vm in stable/8, inactive queue scans used to be performed with the
> > global page queue lock held; it was only dropped to launder dirty pages.
> > Now, the page queue lock is split into separate locks for the active and
> > 

Re: svn commit: r288431 - in head/sys: kern sys vm

2015-10-02 Thread Alan Cox

On Oct 2, 2015, at 10:59 AM, John Baldwin wrote:

> On Thursday, October 01, 2015 09:58:43 PM Mark Johnston wrote:
>> On Thu, Oct 01, 2015 at 09:32:45AM -0700, John Baldwin wrote:
>>> On Wednesday, September 30, 2015 11:06:30 PM Mark Johnston wrote:
>>>> Author: markj
>>>> Date: Wed Sep 30 23:06:29 2015
>>>> New Revision: 288431
>>>> URL: https://svnweb.freebsd.org/changeset/base/288431
>>>> 
>>>> Log:
>>>>   As a step towards the elimination of PG_CACHED pages, rework the handling
>>>>   of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
>>>>   the head of the inactive queue instead of being cached.
>>>> 
>>>>   This affects the implementation of POSIX_FADV_NOREUSE as well, since it
>>>>   works by applying POSIX_FADV_DONTNEED to file ranges after they have been
>>>>   read or written.  At that point the corresponding buffers may still be
>>>>   dirty, so the previous implementation would coalesce successive ranges and
>>>>   apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
>>>>   dirty buffers would eventually be cached.  To preserve this behaviour in an
>>>>   efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
>>>>   the pages backing a VMIO buf to be placed at the head of the inactive queue
>>>>   when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
>>>>   flag in bufs that underlie the specified range.
>>> 
>>> Putting these pages back on the inactive queue completely defeats the 
>>> primary
>>> purpose of DONTNEED and NOREUSE.  The primary purpose is to move the pages 
>>> out
>>> of the VM object's tree of pages and into the free pool so that the 
>>> application
>>> can instruct the VM to free memory more efficiently than relying on page 
>>> daemon.
>>> 
>>> The implementation used cache pages instead of free as a cheap optimization 
>>> so
>>> that if an application did something dumb where it used DONTNEED and then 
>>> turned
>>> around and read the file it would not have to go to disk if the pages had 
>>> not
>>> yet been reused.  In practice this didn't work out so well because PG_CACHE 
>>> pages
>>> don't really work well.
>>> 
>>> However, using PG_CACHE was secondary to the primary purpose of explicitly 
>>> freeing
>>> memory that an application knew wasn't going to be reused and avoiding the 
>>> need
>>> for pagedaemon to run at all.  I think this should be freeing the pages 
>>> instead of
>>> keeping them inactive.  If an application uses DONTNEED or NOREUSE and then 
>>> turns
>>> around and rereads the file, it generally deserves to have to go to disk 
>>> for it.
>> 
>> A problem with this is that one application's DONTNEED or NOREUSE hint
>> would cause every application reading or writing that file to go to
>> disk, but posix_fadvise(2) is explicitly intended for applications that
>> wish to provide hints about their own access patterns. I realize that
>> it's typically used with application-private files, but that's not a
>> requirement of the interface. Deactivating (or caching) the backing
>> pages generally avoids this problem.
> 
> I think it is not unreasonable to expect that fadvise() incurs system-wide
> effects.  A properly implemented WILLNEED that does read-ahead cannot work
> without incurring system-wide effects.  I had always assumed that fadvise()
> operated on a file, not a given process' view of a file (unlike, say,
> madvise which only operates on mappings and only indirectly affects
> file-backed data).
> 


Can you elaborate on what you mean by “I had always assumed that fadvise() 
operated on a file, …”?

Under the previous implementation, if you did an fadvise(DONTNEED) on a file, 
in order to cache the file’s pages, those pages first had to be unmapped from 
any address space.  (You can find this unmapping performed by 
vm_page_try_to_cache().)  In other words, there was never any code that said, 
“Is this a mapped page, and if it is, don’t cache it because we’re actually 
performing an fadvise().”  So, to pick an extreme example, if you did an 
fadvise(“libc.so”, DONTNEED), unless some process had libc.so wired, then every 
single mapping to every single page of libc.so was going to be destroyed and 
the pages moved to the cache.  However, because we moved the pages to the cache 
(rather than freeing them), and libc.so is frequently accessed, a subsequent 
instruction fetch would have faulted and been able to reactivate the cached 
page, avoiding an I/O operation.  In other words, that we were caching the 
pages targeted by fadvise() rather than simply freeing them mattered in cases 
where the pages were in use/accessed by multiple processes.


>>> I'm pretty sure I had mentioned this to Alan before.  I believe that the 
>>> idea is
>>> that pagedaemon should be cheap enough that having it run anyway shouldn't 
>>> be an
>>> issue, but I'm a bit skeptical of that. :)  Lock contention is always 
>>> possible and
>>> having DONTNEED/NOREUSE move 

Re: svn commit: r288431 - in head/sys: kern sys vm

2015-10-01 Thread John Baldwin
On Wednesday, September 30, 2015 11:06:30 PM Mark Johnston wrote:
> Author: markj
> Date: Wed Sep 30 23:06:29 2015
> New Revision: 288431
> URL: https://svnweb.freebsd.org/changeset/base/288431
> 
> Log:
>   As a step towards the elimination of PG_CACHED pages, rework the handling
>   of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
>   the head of the inactive queue instead of being cached.
>   
>   This affects the implementation of POSIX_FADV_NOREUSE as well, since it
>   works by applying POSIX_FADV_DONTNEED to file ranges after they have been
>   read or written.  At that point the corresponding buffers may still be
>   dirty, so the previous implementation would coalesce successive ranges and
>   apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
>   dirty buffers would eventually be cached.  To preserve this behaviour in an
>   efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
>   the pages backing a VMIO buf to be placed at the head of the inactive queue
>   when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
>   flag in bufs that underlie the specified range.

Putting these pages back on the inactive queue completely defeats the primary
purpose of DONTNEED and NOREUSE.  The primary purpose is to move the pages out
of the VM object's tree of pages and into the free pool so that the application
can instruct the VM to free memory more efficiently than relying on the page
daemon.

The implementation used cache pages instead of free as a cheap optimization so
that if an application did something dumb where it used DONTNEED and then turned
around and read the file it would not have to go to disk if the pages had not
yet been reused.  In practice this didn't work out so well because PG_CACHE
pages don't really work well.

However, using PG_CACHE was secondary to the primary purpose of explicitly
freeing memory that an application knew wasn't going to be reused and avoiding
the need for pagedaemon to run at all.  I think this should be freeing the
pages instead of keeping them inactive.  If an application uses DONTNEED or
NOREUSE and then turns around and rereads the file, it generally deserves to
have to go to disk for it.

I'm pretty sure I had mentioned this to Alan before.  I believe that the idea
is that pagedaemon should be cheap enough that having it run anyway shouldn't
be an issue, but I'm a bit skeptical of that. :)  Lock contention is always
possible and having DONTNEED/NOREUSE move pages to PG_CACHE avoided lock
contention with pagedaemon during application page faults (since pagedaemon
potentially never has to run).

I believe that B_NOREUSE is definitely cleaner, btw.  I had wanted to change
NOREUSE to work that way but wasn't sure how to do it.

-- 
John Baldwin


Re: svn commit: r288431 - in head/sys: kern sys vm

2015-10-01 Thread Mark Johnston
On Thu, Oct 01, 2015 at 09:32:45AM -0700, John Baldwin wrote:
> On Wednesday, September 30, 2015 11:06:30 PM Mark Johnston wrote:
> > Author: markj
> > Date: Wed Sep 30 23:06:29 2015
> > New Revision: 288431
> > URL: https://svnweb.freebsd.org/changeset/base/288431
> > 
> > Log:
> >   As a step towards the elimination of PG_CACHED pages, rework the handling
> >   of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
> >   the head of the inactive queue instead of being cached.
> >   
> >   This affects the implementation of POSIX_FADV_NOREUSE as well, since it
> >   works by applying POSIX_FADV_DONTNEED to file ranges after they have been
> >   read or written.  At that point the corresponding buffers may still be
> >   dirty, so the previous implementation would coalesce successive ranges and
> >   apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
> >   dirty buffers would eventually be cached.  To preserve this behaviour in 
> > an
> >   efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
> >   the pages backing a VMIO buf to be placed at the head of the inactive 
> > queue
> >   when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
> >   flag in bufs that underlie the specified range.
> 
> Putting these pages back on the inactive queue completely defeats the primary
> purpose of DONTNEED and NOREUSE.  The primary purpose is to move the pages out
> of the VM object's tree of pages and into the free pool so that the 
> application
> can instruct the VM to free memory more efficiently than relying on page 
> daemon.
> 
> The implementation used cache pages instead of free as a cheap optimization so
> that if an application did something dumb where it used DONTNEED and then 
> turned
> around and read the file it would not have to go to disk if the pages had not
> yet been reused.  In practice this didn't work out so well because PG_CACHE 
> pages
> don't really work well.
> 
> However, using PG_CACHE was secondary to the primary purpose of explicitly 
> freeing
> memory that an application knew wasn't going to be reused and avoiding the 
> need
> for pagedaemon to run at all.  I think this should be freeing the pages 
> instead of
> keeping them inactive.  If an application uses DONTNEED or NOREUSE and then 
> turns
> around and rereads the file, it generally deserves to have to go to disk for 
> it.

A problem with this is that one application's DONTNEED or NOREUSE hint
would cause every application reading or writing that file to go to
disk, but posix_fadvise(2) is explicitly intended for applications that
wish to provide hints about their own access patterns. I realize that
it's typically used with application-private files, but that's not a
requirement of the interface. Deactivating (or caching) the backing
pages generally avoids this problem.

> 
> I'm pretty sure I had mentioned this to Alan before.  I believe that the idea 
> is
> that pagedaemon should be cheap enough that having it run anyway shouldn't be 
> an
> issue, but I'm a bit skeptical of that. :)  Lock contention is always 
> possible and
> having DONTNEED/NOREUSE move pages to PG_CACHE avoided lock contention with
> pagedaemon during application page faults (since pagedaemon potentially never 
> has
> to run).

That's true, but the page queue locking (and the pagedaemon's
manipulation of the page queue locks) has also become more fine-grained
since posix_fadvise(2) was added. In particular, from some reading of
sys/vm in stable/8, inactive queue scans used to be performed with the
global page queue lock held; it was only dropped to launder dirty pages.
Now, the page queue lock is split into separate locks for the active and
inactive page queues, and the pagedaemon drops the inactive queue lock
for each page in all but a few exceptional cases. Does the optimization
of freeing or caching DONTNEED pages buy us all that much now?

Some synthetic testing in which an application writes out many large
(2G) files and calls posix_fadvise(FADV_DONTNEED) after each one shows
no significant difference in runtime if the buffer pages are deactivated
vs. freed. (My test just modifies vfs_vmio_unwire() to treat B_NOREUSE
identically to B_DIRECT.) Unsurprisingly, I see very little lock
contention in the latter case, but in the former, most of the lock
contention is short (i.e. the mutex is acquired while spinning), and
a large majority of the contention is on the free page queue mutex. If
lock contention there is a concern, wouldn't it be better to try and
address that directly rather than by bypassing the pagedaemon?
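
For anyone who wants to try something similar, the userland half of such a
test could look roughly like the sketch below.  This is not the program used
for the numbers above; the helper name write_then_dontneed(), the 64 KiB
chunk size and the filler byte are arbitrary choices, and the kernel-side
change to vfs_vmio_unwire() is not shown.  The constant is spelled
POSIX_FADV_DONTNEED in <fcntl.h>.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc feature-test macro (assumed build env) */
#endif

#include <sys/types.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `total` bytes of filler to `path`, then hint
 * that the file's cached data will not be needed again.  Returns 0 or an
 * errno-style value.
 */
static int
write_then_dontneed(const char *path, size_t total)
{
	static char chunk[64 * 1024];
	size_t left, n;
	ssize_t w;
	int error, fd;

	assert(path != NULL);
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd == -1)
		return (errno);
	memset(chunk, 'x', sizeof(chunk));
	error = 0;
	for (left = total; left > 0; left -= (size_t)w) {
		n = left < sizeof(chunk) ? left : sizeof(chunk);
		w = write(fd, chunk, n);
		if (w <= 0) {
			/* Treat a zero-byte write as an I/O error. */
			error = (w == -1) ? errno : EIO;
			break;
		}
	}
	/* Offset 0 with length 0 covers the whole file. */
	if (error == 0)
		error = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	return (error);
}
```

Calling this in a loop over many 2G files and timing the run under each
kernel variant would approximate the comparison; whether to fsync(2) before
the hint is left open here.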

> 
> I believe that B_NOREUSE is definitely cleaner, btw.  I had wanted to change
> NOREUSE to work that way but wasn't sure how to do it.
> 
> -- 
> John Baldwin

svn commit: r288431 - in head/sys: kern sys vm

2015-09-30 Thread Mark Johnston
Author: markj
Date: Wed Sep 30 23:06:29 2015
New Revision: 288431
URL: https://svnweb.freebsd.org/changeset/base/288431

Log:
  As a step towards the elimination of PG_CACHED pages, rework the handling
  of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
  the head of the inactive queue instead of being cached.
  
  This affects the implementation of POSIX_FADV_NOREUSE as well, since it
  works by applying POSIX_FADV_DONTNEED to file ranges after they have been
  read or written.  At that point the corresponding buffers may still be
  dirty, so the previous implementation would coalesce successive ranges and
  apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
  dirty buffers would eventually be cached.  To preserve this behaviour in an
  efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
  the pages backing a VMIO buf to be placed at the head of the inactive queue
  when the buf is released.  POSIX_FADV_NOREUSE then works by setting this
  flag in bufs that underlie the specified range.
  
  Reviewed by:  alc, kib
  Sponsored by: EMC / Isilon Storage Division
  Differential Revision: https://reviews.freebsd.org/D3726
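
As a reminder of what the userland side of POSIX_FADV_NOREUSE looks like, a
minimal sketch follows.  The helper name read_once_noreuse() and the 16 KiB
buffer are invented for this example, and what the kernel does with the hint
is up to the implementation; other systems may ignore it entirely.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc feature-test macro (assumed build env) */
#endif

#include <sys/types.h>

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: declare that the file will be read once and not
 * reused, then stream through it.  Stores the byte count in *nread and
 * returns 0 or an errno-style value.
 */
static int
read_once_noreuse(const char *path, off_t *nread)
{
	char buf[16 * 1024];
	ssize_t r;
	int error, fd;

	assert(path != NULL && nread != NULL);
	*nread = 0;
	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (errno);
	/* Hint first, then read; offset 0 and length 0 cover the whole file. */
	error = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
	while (error == 0 && (r = read(fd, buf, sizeof(buf))) != 0) {
		if (r == -1)
			error = errno;
		else
			*nread += r;
	}
	close(fd);
	return (error);
}
```

The hint is given once per descriptor before the I/O starts; the application
does not have to issue DONTNEED itself after each read.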

Modified:
  head/sys/kern/vfs_bio.c
  head/sys/kern/vfs_default.c
  head/sys/kern/vfs_syscalls.c
  head/sys/kern/vfs_vnops.c
  head/sys/sys/buf.h
  head/sys/sys/file.h
  head/sys/vm/vm_object.c
  head/sys/vm/vm_object.h
  head/sys/vm/vm_page.c
  head/sys/vm/vm_page.h

Modified: head/sys/kern/vfs_bio.c
===================================================================
--- head/sys/kern/vfs_bio.c Wed Sep 30 21:32:29 2015(r288430)
+++ head/sys/kern/vfs_bio.c Wed Sep 30 23:06:29 2015(r288431)
@@ -1785,6 +1785,8 @@ brelse(struct buf *bp)
bp, bp->b_vp, bp->b_flags);
KASSERT(!(bp->b_flags & (B_CLUSTER|B_PAGING)),
("brelse: inappropriate B_PAGING or B_CLUSTER bp %p", bp));
+   KASSERT((bp->b_flags & B_VMIO) != 0 || (bp->b_flags & B_NOREUSE) == 0,
+   ("brelse: non-VMIO buffer marked NOREUSE"));
 
if (BUF_LOCKRECURSED(bp)) {
/*
@@ -1873,8 +1875,10 @@ brelse(struct buf *bp)
allocbuf(bp, 0);
}
 
-   if ((bp->b_flags & (B_INVAL | B_RELBUF)) != 0) {
+   if ((bp->b_flags & (B_INVAL | B_RELBUF)) != 0 ||
+   (bp->b_flags & (B_DELWRI | B_NOREUSE)) == B_NOREUSE) {
allocbuf(bp, 0);
+   bp->b_flags &= ~B_NOREUSE;
if (bp->b_vp != NULL)
brelvp(bp);
}
@@ -1969,6 +1973,10 @@ bqrelse(struct buf *bp)
if ((bp->b_flags & B_DELWRI) == 0 &&
(bp->b_xflags & BX_VNDIRTY))
panic("bqrelse: not dirty");
+   if ((bp->b_flags & B_NOREUSE) != 0) {
+   brelse(bp);
+   return;
+   }
qindex = QUEUE_CLEAN;
}
binsfree(bp, qindex);
@@ -2079,10 +2087,15 @@ vfs_vmio_unwire(struct buf *bp, vm_page_
freed = false;
if (!freed) {
/*
-* In order to maintain LRU page ordering, put
-* the page at the tail of the inactive queue.
+* If the page is unlikely to be reused, let the
+* VM know.  Otherwise, maintain LRU page
+* ordering and put the page at the tail of the
+* inactive queue.
 */
-   vm_page_deactivate(m);
+   if ((bp->b_flags & B_NOREUSE) != 0)
+   vm_page_deactivate_noreuse(m);
+   else
+   vm_page_deactivate(m);
}
}
vm_page_unlock(m);
@@ -2456,8 +2469,9 @@ getnewbuf_reuse_bp(struct buf *bp, int q
 * Note: we no longer distinguish between VMIO and non-VMIO
 * buffers.
 */
-   KASSERT((bp->b_flags & B_DELWRI) == 0,
-   ("delwri buffer %p found in queue %d", bp, qindex));
+   KASSERT((bp->b_flags & (B_DELWRI | B_NOREUSE)) == 0,
+   ("invalid buffer %p flags %#x found in queue %d", bp, bp->b_flags,
+   qindex));
 
/*
 * When recycling a clean buffer we have to truncate it and
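
The new test in the brelse() hunk above can be read in isolation as the
following sketch.  It is a standalone restatement for illustration, not
kernel code: the SK_ names and bit values are placeholders (the real flags
are defined in sys/buf.h), and the helper name is made up.  As far as the
hunk shows, a clean B_NOREUSE buffer is released immediately, while one that
is still B_DELWRI keeps the flag and is left alone for now.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only; see sys/buf.h for real ones. */
#define	SK_B_DELWRI	0x01u
#define	SK_B_NOREUSE	0x02u
#define	SK_B_INVAL	0x04u
#define	SK_B_RELBUF	0x08u
#define	SK_B_ALL	(SK_B_DELWRI | SK_B_NOREUSE | SK_B_INVAL | SK_B_RELBUF)

/*
 * Hypothetical helper mirroring the condition in the hunk: true when the
 * buffer's storage would be released on brelse().
 */
static bool
sk_release_backing_pages(uint32_t flags)
{
	assert((flags & ~SK_B_ALL) == 0);
	/* Pre-existing rule: invalid or RELBUF buffers are always released. */
	if ((flags & (SK_B_INVAL | SK_B_RELBUF)) != 0)
		return (true);
	/* New rule: NOREUSE releases only when the buffer is not dirty. */
	return ((flags & (SK_B_DELWRI | SK_B_NOREUSE)) == SK_B_NOREUSE);
}
```

For example, sk_release_backing_pages(SK_B_NOREUSE) is true, while
sk_release_backing_pages(SK_B_NOREUSE | SK_B_DELWRI) is false.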

Modified: head/sys/kern/vfs_default.c
===================================================================
--- head/sys/kern/vfs_default.c Wed Sep 30 21:32:29 2015(r288430)
+++ head/sys/kern/vfs_default.c Wed Sep 30 23:06:29 2015(r288431)
@@ -1034,9 +1034,12 @@ vop_stdallocate(struct vop_allocate_args
 int
 vop_stdadvise(struct vop_advise_args *ap)
 {
+   struct buf *bp;
+   struct buflists *bl;
struct vnode *vp;
+   daddr_t bn, startn, endn;
off_t start, end;
-   int error;
+   int bsize, error;
 
vp =