Re: quick fix for uvm deadlocks

2014-02-06 Thread Mark Kettenis
 Date: Wed, 05 Feb 2014 23:03:09 -0500
 From: Ted Unangst t...@tedunangst.com
 
 On Wed, Feb 05, 2014 at 17:53, Bob Beck wrote:
  On Wed, Feb 5, 2014 at 3:17 PM, Ted Unangst t...@tedunangst.com wrote:
  We are missing back pressure channels from uvm to the buf cache. The
  buf cache will happily sit on 9000 free pages while uvm churns around
  trying to scavenge up one more page.
 
  Or are you in a situation here where the cache has *not* backed off?
 
 Talked to Bob and hashed out better ideas of the problem. The page
 daemon does tell the buffer cache to make some room, but...
 
 If you have a huge mmap file, the pdaemon will try to flush it out via
 VOP_WRITE, which circles back via ffs into buf_get, which eats those
 previously freed pages, and then some, as the pagedaemon continues
 pushing more and more of the mmap file out.
 
 We discussed some other changes and fixes that this situation has
 clearly highlighted, but here's a slightly revised diff. It now uses
 the correct bufbackoff() function to communicate uvm's needs. Any
 other fix is rather precarious for this release, but as stated before,
 this keeps the change to the deadlock paths. You were already dead,
 but now you have a second chance.
 
 (We don't currently use the pmemrange argument; we'll have to adjust
 accordingly when the bufcache becomes range aware.)

The approach taken here makes sense to me.  And I don't really want to
hold up a fix for this.  But there are some things I'd like you to
consider before committing this.

I believe the scenario you sketched should only land you in
uvm_wait_pla(), but not in uvm_wait().  Perhaps with the current code
we can end up in uvm_wait(), but I think those would be bugs where the
driver I/O paths are doing memory allocations when they really
shouldn't.  Therefore, I'm not sure we should add the bufbackoff()
call in uvm_wait().  At the very least I'd like to have a printf
*before* the bufbackoff() call there.

I'd also like to see a comment on the bufbackoff() call in
uvm_wait_pla() explaining that even though we pushed back the buffer
cache in the page daemon, we might still consume the freed pages when
paging out dirty pages while scanning.


 Index: uvm_pdaemon.c
 ===
 RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
 retrieving revision 1.64
 diff -u -p -r1.64 uvm_pdaemon.c
 --- uvm_pdaemon.c 30 May 2013 16:29:46 -  1.64
 +++ uvm_pdaemon.c 6 Feb 2014 03:09:53 -
 @@ -117,6 +117,8 @@ uvm_wait(const char *wmsg)
*/
  
   if (curproc == uvm.pagedaemon_proc) {
 + if (bufbackoff(NULL, 4) == 0)
 + return;
   /*
* now we have a problem: the pagedaemon wants to go to
* sleep until it frees more memory.   but how can it
 Index: uvm_pmemrange.c
 ===
 RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
 retrieving revision 1.36
 diff -u -p -r1.36 uvm_pmemrange.c
 --- uvm_pmemrange.c   29 Jan 2013 19:55:48 -  1.36
 +++ uvm_pmemrange.c   6 Feb 2014 03:10:32 -
 @@ -22,6 +22,7 @@
  #include <sys/malloc.h>
  #include <sys/proc.h>	/* XXX for atomic */
  #include <sys/kernel.h>
 +#include <sys/mount.h>
  
  /*
   * 2 trees: addr tree and size tree.
 @@ -1883,6 +1884,13 @@ uvm_wait_pla(paddr_t low, paddr_t high, 
   const char *wmsg = "pmrwait";
  
   if (curproc == uvm.pagedaemon_proc) {
 + uvm_unlock_fpageq();
 + if (bufbackoff(NULL, atop(size)) == 0) {
 + uvm_lock_fpageq();
 + return 0;
 + }
 + uvm_lock_fpageq();
 +
   /*
* XXX detect pagedaemon deadlock - see comment in
* uvm_wait(), as this is exactly the same issue.
 
 



Re: quick fix for uvm deadlocks

2014-02-06 Thread Ted Unangst
On Thu, Feb 06, 2014 at 12:34, Mark Kettenis wrote:

 I believe the scenario you sketched should only land you in
 uvm_wait_pla(), but not in uvm_wait().  Perhaps with the current code
 we can end up in uvm_wait(), but I think those would be bugs where the
 driver I/O paths are doing memory allocations when they really
 shouldn't.  Therefore, I'm not sure we should add the bufbackoff()
 call in uvm_wait().  At the very least I'd like to have a printf
 *before* the bufbackoff() call there.

Sure. I didn't observe that path, but since the current fix of printf
and msleep is used in both places, I added bufbackoff in both places.

 I'd also like to see a comment on the bufbackoff() call in
 uvm_wait_pla() explaining that even though we pushed back the buffer
 cache in the page daemon, we might still consume the freed pages when
 paging out dirty pages while scanning.

Sure.



quick fix for uvm deadlocks

2014-02-05 Thread Ted Unangst
We are missing back pressure channels from uvm to the buf cache. The
buf cache will happily sit on 9000 free pages while uvm churns around
trying to scavenge up one more page.

Fixing this is beyond the scope of a simple diff, but here's something
that seems to help in a lot of the common cases, particularly the "pla
deadlock detected" spin cycle that people see.

If we're out of memory, kick the buf cache into releasing some
clean pages. The buf cache may eventually find itself sorely in need
of memory and unable to get it, but this is better than nothing. I've
deliberately saved the back pressure until we're already on the about to
die path to minimize regressions. uvm won't steal back memory unless
it absolutely has to.

Index: kern/vfs_bio.c
===
RCS file: /cvs/src/sys/kern/vfs_bio.c,v
retrieving revision 1.154
diff -u -p -r1.154 vfs_bio.c
--- kern/vfs_bio.c  25 Jan 2014 04:23:31 -  1.154
+++ kern/vfs_bio.c  5 Feb 2014 22:08:07 -
@@ -305,6 +305,26 @@ bufadjust(int newbufpages)
splx(s);
 }
 
+int
+buf_nukeclean(void)
+{
+   struct buf *bp;
+   int n;
+
+   printf("nuking clean bufs\n");
+   n = 0;
+   while ((bp = TAILQ_FIRST(&bufqueues[BQ_CLEAN])) && n++ < 10) {
+   bremfree(bp);
+   if (bp->b_vp) {
+   RB_REMOVE(buf_rb_bufs,
+   &bp->b_vp->v_bufs_tree, bp);
+   brelvp(bp);
+   }
+   buf_put(bp);
+   }
+   return (n);
+}
+
 /*
  * Make the buffer cache back off from cachepct.
  */
Index: sys/buf.h
===
RCS file: /cvs/src/sys/sys/buf.h,v
retrieving revision 1.93
diff -u -p -r1.93 buf.h
--- sys/buf.h   21 Nov 2013 01:16:52 -  1.93
+++ sys/buf.h   5 Feb 2014 22:04:09 -
@@ -312,6 +312,8 @@ voidbuf_fix_mapping(struct buf *, vsize
 void   buf_alloc_pages(struct buf *, vsize_t);
 void   buf_free_pages(struct buf *);
 
+intbuf_nukeclean(void);
+
 
 void   minphys(struct buf *bp);
 intphysio(void (*strategy)(struct buf *), dev_t dev, int flags,
Index: uvm/uvm_pdaemon.c
===
RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
retrieving revision 1.64
diff -u -p -r1.64 uvm_pdaemon.c
--- uvm/uvm_pdaemon.c   30 May 2013 16:29:46 -  1.64
+++ uvm/uvm_pdaemon.c   5 Feb 2014 22:04:15 -
@@ -116,7 +116,7 @@ uvm_wait(const char *wmsg)
 * check for page daemon going to sleep (waiting for itself)
 */
 
-   if (curproc == uvm.pagedaemon_proc) {
+   if (curproc == uvm.pagedaemon_proc && buf_nukeclean() == 0) {
/*
 * now we have a problem: the pagedaemon wants to go to
 * sleep until it frees more memory.   but how can it
Index: uvm/uvm_pmemrange.c
===
RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
retrieving revision 1.36
diff -u -p -r1.36 uvm_pmemrange.c
--- uvm/uvm_pmemrange.c 29 Jan 2013 19:55:48 -  1.36
+++ uvm/uvm_pmemrange.c 5 Feb 2014 22:07:37 -
@@ -22,6 +22,7 @@
 #include <sys/malloc.h>
 #include <sys/proc.h>	/* XXX for atomic */
 #include <sys/kernel.h>
+#include <sys/buf.h>
 
 /*
  * 2 trees: addr tree and size tree.
@@ -1883,6 +1884,13 @@ uvm_wait_pla(paddr_t low, paddr_t high, 
	const char *wmsg = "pmrwait";
 
if (curproc == uvm.pagedaemon_proc) {
+   uvm_unlock_fpageq();
+   if (buf_nukeclean() != 0) {
+   uvm_lock_fpageq();
+   return 0;
+   }
+   uvm_lock_fpageq();
+
/*
 * XXX detect pagedaemon deadlock - see comment in
 * uvm_wait(), as this is exactly the same issue.




Re: quick fix for uvm deadlocks

2014-02-05 Thread Bob Beck
On Wed, Feb 5, 2014 at 3:17 PM, Ted Unangst t...@tedunangst.com wrote:
 We are missing back pressure channels from uvm to the buf cache. The
 buf cache will happily sit on 9000 free pages while uvm churns around
 trying to scavenge up one more page.

Indeed, those are its minimums (I presume, in your case) and are exactly
the amount of memory that uvm would never have even seen under the model
of the static cache. So I don't agree with your statement that "we are
missing back pressure channels from uvm to the buf cache."

It looks to me like the situation you are talking about is that the
buffer cache has already backed off to its minimum (which is used to
ensure things like avoiding deadlocks in the buffer cache on delayed
writes and fun stuff like that).

Or are you in a situation here where the cache has *not* backed off?



 Fixing this is beyond the scope of a simple diff, but here's something
 that seems to help in a lot of the common cases, particularly the "pla
 deadlock detected" spin cycle that people see.

 If we're out of memory, kick the buf cache into releasing some
 clean pages. The buf cache may eventually find itself sorely in need
 of memory and unable to get it, but this is better than nothing. I've
 deliberately saved the back pressure until we're already on the about to
 die path to minimize regressions. uvm won't steal back memory unless
 it absolutely has to.

And for the reasons you say, I think this has great potential to move
the deadlock into the buffer cache on small memory machines when we get
the entire cache filled with delwri...

Sure, we can make the minimums smaller, but in the end we still have to
fix the page daemon, or we are just delaying the inevitable or moving
the deadlock to other subsystems.





 Index: kern/vfs_bio.c
 ===
 RCS file: /cvs/src/sys/kern/vfs_bio.c,v
 retrieving revision 1.154
 diff -u -p -r1.154 vfs_bio.c
 --- kern/vfs_bio.c  25 Jan 2014 04:23:31 -  1.154
 +++ kern/vfs_bio.c  5 Feb 2014 22:08:07 -
 @@ -305,6 +305,26 @@ bufadjust(int newbufpages)
 splx(s);
  }

 +int
 +buf_nukeclean(void)
 +{
 +   struct buf *bp;
 +   int n;
 +
 +   printf("nuking clean bufs\n");
 +   n = 0;
 +   while ((bp = TAILQ_FIRST(&bufqueues[BQ_CLEAN])) && n++ < 10) {
 +   bremfree(bp);
 +   if (bp->b_vp) {
 +   RB_REMOVE(buf_rb_bufs,
 +   &bp->b_vp->v_bufs_tree, bp);
 +   brelvp(bp);
 +   }
 +   buf_put(bp);
 +   }
 +   return (n);
 +}
 +
  /*
   * Make the buffer cache back off from cachepct.
   */
 Index: sys/buf.h
 ===
 RCS file: /cvs/src/sys/sys/buf.h,v
 retrieving revision 1.93
 diff -u -p -r1.93 buf.h
 --- sys/buf.h   21 Nov 2013 01:16:52 -  1.93
 +++ sys/buf.h   5 Feb 2014 22:04:09 -
 @@ -312,6 +312,8 @@ voidbuf_fix_mapping(struct buf *, vsize
  void   buf_alloc_pages(struct buf *, vsize_t);
  void   buf_free_pages(struct buf *);

 +intbuf_nukeclean(void);
 +

  void   minphys(struct buf *bp);
  intphysio(void (*strategy)(struct buf *), dev_t dev, int flags,
 Index: uvm/uvm_pdaemon.c
 ===
 RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
 retrieving revision 1.64
 diff -u -p -r1.64 uvm_pdaemon.c
 --- uvm/uvm_pdaemon.c   30 May 2013 16:29:46 -  1.64
 +++ uvm/uvm_pdaemon.c   5 Feb 2014 22:04:15 -
 @@ -116,7 +116,7 @@ uvm_wait(const char *wmsg)
  * check for page daemon going to sleep (waiting for itself)
  */

 -   if (curproc == uvm.pagedaemon_proc) {
 +   if (curproc == uvm.pagedaemon_proc && buf_nukeclean() == 0) {
 /*
  * now we have a problem: the pagedaemon wants to go to
  * sleep until it frees more memory.   but how can it
 Index: uvm/uvm_pmemrange.c
 ===
 RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
 retrieving revision 1.36
 diff -u -p -r1.36 uvm_pmemrange.c
 --- uvm/uvm_pmemrange.c 29 Jan 2013 19:55:48 -  1.36
 +++ uvm/uvm_pmemrange.c 5 Feb 2014 22:07:37 -
 @@ -22,6 +22,7 @@
  #include <sys/malloc.h>
  #include <sys/proc.h>	/* XXX for atomic */
  #include <sys/kernel.h>
 +#include <sys/buf.h>

  /*
   * 2 trees: addr tree and size tree.
 @@ -1883,6 +1884,13 @@ uvm_wait_pla(paddr_t low, paddr_t high,
 	const char *wmsg = "pmrwait";

 if (curproc == uvm.pagedaemon_proc) {
 +   uvm_unlock_fpageq();
 +   if (buf_nukeclean() != 0) {
 +   uvm_lock_fpageq();
 +   return 0;
 +   }
 +   uvm_lock_fpageq();
 +
 /*
  * XXX detect pagedaemon deadlock - see comment in
  * uvm_wait(), as this is exactly the same issue.

Re: quick fix for uvm deadlocks

2014-02-05 Thread Ted Unangst
On Wed, Feb 05, 2014 at 17:53, Bob Beck wrote:
 On Wed, Feb 5, 2014 at 3:17 PM, Ted Unangst t...@tedunangst.com wrote:
 We are missing back pressure channels from uvm to the buf cache. The
 buf cache will happily sit on 9000 free pages while uvm churns around
 trying to scavenge up one more page.

 Or are you in a situation here where the cache has *not* backed off?

Talked to Bob and hashed out better ideas of the problem. The page
daemon does tell the buffer cache to make some room, but...

If you have a huge mmap file, the pdaemon will try to flush it out via
VOP_WRITE, which circles back via ffs into buf_get, which eats those
previously freed pages, and then some, as the pagedaemon continues
pushing more and more of the mmap file out.

We discussed some other changes and fixes that this situation has
clearly highlighted, but here's a slightly revised diff. It now uses
the correct bufbackoff() function to communicate uvm's needs. Any
other fix is rather precarious for this release, but as stated before,
this keeps the change to the deadlock paths. You were already dead,
but now you have a second chance.

(We don't currently use the pmemrange argument; we'll have to adjust
accordingly when the bufcache becomes range aware.)

Index: uvm_pdaemon.c
===
RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
retrieving revision 1.64
diff -u -p -r1.64 uvm_pdaemon.c
--- uvm_pdaemon.c   30 May 2013 16:29:46 -  1.64
+++ uvm_pdaemon.c   6 Feb 2014 03:09:53 -
@@ -117,6 +117,8 @@ uvm_wait(const char *wmsg)
 */
 
if (curproc == uvm.pagedaemon_proc) {
+   if (bufbackoff(NULL, 4) == 0)
+   return;
/*
 * now we have a problem: the pagedaemon wants to go to
 * sleep until it frees more memory.   but how can it
Index: uvm_pmemrange.c
===
RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
retrieving revision 1.36
diff -u -p -r1.36 uvm_pmemrange.c
--- uvm_pmemrange.c 29 Jan 2013 19:55:48 -  1.36
+++ uvm_pmemrange.c 6 Feb 2014 03:10:32 -
@@ -22,6 +22,7 @@
 #include <sys/malloc.h>
 #include <sys/proc.h>	/* XXX for atomic */
 #include <sys/kernel.h>
+#include <sys/mount.h>
 
 /*
  * 2 trees: addr tree and size tree.
@@ -1883,6 +1884,13 @@ uvm_wait_pla(paddr_t low, paddr_t high, 
	const char *wmsg = "pmrwait";
 
if (curproc == uvm.pagedaemon_proc) {
+   uvm_unlock_fpageq();
+   if (bufbackoff(NULL, atop(size)) == 0) {
+   uvm_lock_fpageq();
+   return 0;
+   }
+   uvm_lock_fpageq();
+
/*
 * XXX detect pagedaemon deadlock - see comment in
 * uvm_wait(), as this is exactly the same issue.



Re: quick fix for uvm deadlocks

2014-02-05 Thread Bob Beck
Yes, this is much better, although I think this problem related to
big mmaps has been with us for a while. It also appears to avoid the
problem with the offending test programs on my machines.

I'm ok with that going in for the moment, although I want some of your
time to look at that nasty shit we talked about :)  I think
we can make that a lot better with some NOCACHE...


On Wed, Feb 5, 2014 at 9:03 PM, Ted Unangst t...@tedunangst.com wrote:
 On Wed, Feb 05, 2014 at 17:53, Bob Beck wrote:
 On Wed, Feb 5, 2014 at 3:17 PM, Ted Unangst t...@tedunangst.com wrote:
 We are missing back pressure channels from uvm to the buf cache. The
 buf cache will happily sit on 9000 free pages while uvm churns around
 trying to scavenge up one more page.

 Or are you in a situation here where the cache has *not* backed off?

 Talked to Bob and hashed out better ideas of the problem. The page
 daemon does tell the buffer cache to make some room, but...

 If you have a huge mmap file, the pdaemon will try to flush it out via
 VOP_WRITE, which circles back via ffs into buf_get, which eats those
 previously freed pages, and then some, as the pagedaemon continues
 pushing more and more of the mmap file out.

 We discussed some other changes and fixes that this situation has
 clearly highlighted, but here's a slightly revised diff. It now uses
 the correct bufbackoff() function to communicate uvm's needs. Any
 other fix is rather precarious for this release, but as stated before,
 this keeps the change to the deadlock paths. You were already dead,
 but now you have a second chance.

 (We don't currently use the pmemrange argument; we'll have to adjust
 accordingly when the bufcache becomes range aware.)

 Index: uvm_pdaemon.c
 ===
 RCS file: /cvs/src/sys/uvm/uvm_pdaemon.c,v
 retrieving revision 1.64
 diff -u -p -r1.64 uvm_pdaemon.c
 --- uvm_pdaemon.c   30 May 2013 16:29:46 -  1.64
 +++ uvm_pdaemon.c   6 Feb 2014 03:09:53 -
 @@ -117,6 +117,8 @@ uvm_wait(const char *wmsg)
  */

 if (curproc == uvm.pagedaemon_proc) {
 +   if (bufbackoff(NULL, 4) == 0)
 +   return;
 /*
  * now we have a problem: the pagedaemon wants to go to
  * sleep until it frees more memory.   but how can it
 Index: uvm_pmemrange.c
 ===
 RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
 retrieving revision 1.36
 diff -u -p -r1.36 uvm_pmemrange.c
 --- uvm_pmemrange.c 29 Jan 2013 19:55:48 -  1.36
 +++ uvm_pmemrange.c 6 Feb 2014 03:10:32 -
 @@ -22,6 +22,7 @@
  #include <sys/malloc.h>
  #include <sys/proc.h>	/* XXX for atomic */
  #include <sys/kernel.h>
 +#include <sys/mount.h>

  /*
   * 2 trees: addr tree and size tree.
 @@ -1883,6 +1884,13 @@ uvm_wait_pla(paddr_t low, paddr_t high,
 	const char *wmsg = "pmrwait";

 if (curproc == uvm.pagedaemon_proc) {
 +   uvm_unlock_fpageq();
 +   if (bufbackoff(NULL, atop(size)) == 0) {
 +   uvm_lock_fpageq();
 +   return 0;
 +   }
 +   uvm_lock_fpageq();
 +
 /*
  * XXX detect pagedaemon deadlock - see comment in
  * uvm_wait(), as this is exactly the same issue.