Re: Memory reserves or lack thereof

2012-11-15 Thread Alan Cox
On 11/13/2012 05:54, Konstantin Belousov wrote:
 On Mon, Nov 12, 2012 at 05:10:01PM -0600, Alan Cox wrote:
 On 11/12/2012 3:48 PM, Konstantin Belousov wrote:
 On Mon, Nov 12, 2012 at 01:28:02PM -0800, Sushanth Rai wrote:
 This patch still doesn't address the issue of M_NOWAIT calls driving
 the memory all the way down to 2 pages, right? It would be nice to
 have M_NOWAIT just be a non-sleeping version of M_WAITOK and have the
 M_USE_RESERVE flag dig deep.
 This is out of scope of the change. But it is required for any further
 adjustments.
 I would suggest a somewhat different response:

 The patch does make M_NOWAIT into a non-sleeping version of M_WAITOK and
 does reintroduce M_USE_RESERVE as a way to specify "dig deep".

 Currently, both M_NOWAIT and M_WAITOK can drive the cache/free memory
 down to two pages.  The effect of the patch is to stop M_NOWAIT at two
 pages rather than allowing it to continue to zero pages.

 When you say, "This is out of scope ...", I believe that you are
 referring to changing two pages into something larger.  I agree that
 this is out of scope for the current change.
 I referred exactly to the difference between M_USE_RESERVE set or not.
 IMO this is what was asked by the question author. So yes, what I meant
 by 'out of scope' is tweaking the 'two pages reserve' in some
 way.

Since M_USE_RESERVE is no longer deprecated in HEAD, here is my proposed
man page update to malloc(9):

Index: share/man/man9/malloc.9
===
--- share/man/man9/malloc.9 (revision 243091)
+++ share/man/man9/malloc.9 (working copy)
@@ -29,7 +29,7 @@
 .\" $NetBSD: malloc.9,v 1.3 1996/11/11 00:05:11 lukem Exp $
 .\" $FreeBSD$
 .\"
-.Dd January 28, 2012
+.Dd November 15, 2012
 .Dt MALLOC 9
 .Os
 .Sh NAME
@@ -153,13 +153,12 @@ if
 .Dv M_WAITOK
 is specified.
 .It Dv M_USE_RESERVE
-Indicates that the system can dig into its reserve in order to obtain the
-requested memory.
-This option used to be called
-.Dv M_KERNEL
-but has been renamed to something more obvious.
-This option has been deprecated and is slowly being removed from the kernel,
-and so should not be used with any new programming.
+Indicates that the system can use its reserve of memory to satisfy the
+request.
+This option should only be used in combination with
+.Dv M_NOWAIT
+when an allocation failure cannot be tolerated by the caller without
+catastrophic effects on the system.
 .El
 .Pp
 Exactly one of either

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Memory reserves or lack thereof

2012-11-15 Thread Konstantin Belousov
On Thu, Nov 15, 2012 at 11:32:18AM -0600, Alan Cox wrote:
 Since M_USE_RESERVE is no longer deprecated in HEAD, here is my proposed
 man page update to malloc(9):
 
 Index: share/man/man9/malloc.9
 ===
 --- share/man/man9/malloc.9 (revision 243091)
 +++ share/man/man9/malloc.9 (working copy)
 @@ -29,7 +29,7 @@
  .\" $NetBSD: malloc.9,v 1.3 1996/11/11 00:05:11 lukem Exp $
  .\" $FreeBSD$
  .\"
 -.Dd January 28, 2012
 +.Dd November 15, 2012
  .Dt MALLOC 9
  .Os
  .Sh NAME
 @@ -153,13 +153,12 @@ if
  .Dv M_WAITOK
  is specified.
  .It Dv M_USE_RESERVE
 -Indicates that the system can dig into its reserve in order to obtain the
 -requested memory.
 -This option used to be called
 -.Dv M_KERNEL
 -but has been renamed to something more obvious.
 -This option has been deprecated and is slowly being removed from the kernel,
 -and so should not be used with any new programming.
 +Indicates that the system can use its reserve of memory to satisfy the
 +request.
 +This option should only be used in combination with
 +.Dv M_NOWAIT
 +when an allocation failure cannot be tolerated by the caller without
 +catastrophic effects on the system.
  .El
  .Pp
  Exactly one of either

The text looks fine. Shouldn't the requirement for M_USE_RESERVE also be
expressed in a KASSERT, like this:

diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
index d9e4692..f8a4f70 100644
--- a/sys/vm/vm_page.h
+++ b/sys/vm/vm_page.h
@@ -353,6 +351,9 @@ malloc2vm_flags(int malloc_flags)
 {
 	int pflags;
 
+	KASSERT((malloc_flags & M_USE_RESERVE) == 0 ||
+	    (malloc_flags & M_NOWAIT) != 0,
+	    ("M_USE_RESERVE requires M_NOWAIT"));
 	pflags = (malloc_flags & M_USE_RESERVE) != 0 ? VM_ALLOC_INTERRUPT :
 	    VM_ALLOC_SYSTEM;
 	if ((malloc_flags & M_ZERO) != 0)

I understand that this could be added at the allocators' entry points,
but I think that doing it at the page allocation level is fine too.
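
As a rough illustration of the rule the KASSERT would enforce, here is a
minimal userland sketch in C. It is not the kernel code: the SK_-prefixed
flag values and function names are invented for this example, and assert()
stands in for KASSERT. It shows the mapping from malloc-style flags to a
page allocation class, with M_USE_RESERVE accepted only together with
M_NOWAIT.

```c
#include <assert.h>

/*
 * Hypothetical flag values, for illustration only. The real definitions
 * live in the kernel headers and may differ.
 */
#define SK_M_NOWAIT      0x0001
#define SK_M_WAITOK      0x0002
#define SK_M_USE_RESERVE 0x0400

/* Stand-ins for VM_ALLOC_SYSTEM and VM_ALLOC_INTERRUPT. */
enum sk_page_class { SK_ALLOC_SYSTEM, SK_ALLOC_INTERRUPT };

/*
 * Returns 1 when the flag combination obeys the rule from the proposed
 * man page text: M_USE_RESERVE only in combination with M_NOWAIT.
 */
static int
sk_flags_valid(int malloc_flags)
{
	return ((malloc_flags & SK_M_USE_RESERVE) == 0 ||
	    (malloc_flags & SK_M_NOWAIT) != 0);
}

/*
 * Map malloc-style flags to a page allocation class. Only callers that
 * explicitly ask for the reserve get the deepest class.
 */
static enum sk_page_class
sk_malloc2vm_class(int malloc_flags)
{
	/* assert() plays the role of KASSERT in this userland sketch. */
	assert(sk_flags_valid(malloc_flags));
	return ((malloc_flags & SK_M_USE_RESERVE) != 0 ?
	    SK_ALLOC_INTERRUPT : SK_ALLOC_SYSTEM);
}
```

With these definitions, plain SK_M_NOWAIT maps to SK_ALLOC_SYSTEM, and only
SK_M_NOWAIT | SK_M_USE_RESERVE maps to SK_ALLOC_INTERRUPT.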




Re: Memory reserves or lack thereof

2012-11-15 Thread Alan Cox
On 11/15/2012 12:21, Konstantin Belousov wrote:
 The text looks fine. Shouldn't the requirement for M_USE_RESERVE also be
 expressed in a KASSERT, like this:

 diff --git a/sys/vm/vm_page.h b/sys/vm/vm_page.h
 index d9e4692..f8a4f70 100644
 --- a/sys/vm/vm_page.h
 +++ b/sys/vm/vm_page.h
 @@ -353,6 +351,9 @@ malloc2vm_flags(int malloc_flags)
  {
  	int pflags;
  
 +	KASSERT((malloc_flags & M_USE_RESERVE) == 0 ||
 +	    (malloc_flags & M_NOWAIT) != 0,
 +	    ("M_USE_RESERVE requires M_NOWAIT"));
  	pflags = (malloc_flags & M_USE_RESERVE) != 0 ? VM_ALLOC_INTERRUPT :
  	    VM_ALLOC_SYSTEM;
  	if ((malloc_flags & M_ZERO) != 0)

 I understand that this could be added at the allocators' entry points,
 but I think that doing it at the page allocation level is fine too.

Yes, please do that.

Alan



Re: Memory reserves or lack thereof

2012-11-13 Thread Konstantin Belousov
On Mon, Nov 12, 2012 at 05:10:01PM -0600, Alan Cox wrote:
 On 11/12/2012 3:48 PM, Konstantin Belousov wrote:
  On Mon, Nov 12, 2012 at 01:28:02PM -0800, Sushanth Rai wrote:
  This patch still doesn't address the issue of M_NOWAIT calls driving
  the memory all the way down to 2 pages, right? It would be nice to
  have M_NOWAIT just be a non-sleeping version of M_WAITOK and have the
  M_USE_RESERVE flag dig deep.
  This is out of scope of the change. But it is required for any further
  adjustments.
 
 I would suggest a somewhat different response:
 
 The patch does make M_NOWAIT into a non-sleeping version of M_WAITOK and
 does reintroduce M_USE_RESERVE as a way to specify "dig deep".
 
 Currently, both M_NOWAIT and M_WAITOK can drive the cache/free memory
 down to two pages.  The effect of the patch is to stop M_NOWAIT at two
 pages rather than allowing it to continue to zero pages.
 
 When you say, "This is out of scope ...", I believe that you are
 referring to changing two pages into something larger.  I agree that
 this is out of scope for the current change.

I referred exactly to the difference between M_USE_RESERVE set or not.
IMO this is what was asked by the question author. So yes, what I meant
by 'out of scope' is tweaking the 'two pages reserve' in some
way.




Re: Memory reserves or lack thereof

2012-11-13 Thread Alan Cox
On 11/12/2012 11:35, Alan Cox wrote:
 On 11/12/2012 07:36, Konstantin Belousov wrote:
 On Sun, Nov 11, 2012 at 03:40:24PM -0600, Alan Cox wrote:
 On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov
 <kostik...@gmail.com> wrote:

 On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
 I have a memory subsystem design question that I'm hoping someone can
 answer.
 I've been looking at a machine that is completely out of memory, as in

  v_free_count = 0,
  v_cache_count = 0,

 I wondered how a machine could completely run out of memory like this,
 especially after finding a lack of interrupt storms or other pathologies
 that would tend to overcommit memory. So I started investigating.
 Most allocators come down to vm_page_alloc(), which has this guard:

   if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
           page_req = VM_ALLOC_SYSTEM;
   };

   if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
       (page_req == VM_ALLOC_SYSTEM &&
       cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
       (page_req == VM_ALLOC_INTERRUPT &&
       cnt.v_free_count + cnt.v_cache_count > 0)) {

 The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
 every last page.
 From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
 perhaps only used from interrupt threads. Not so, see kmem_malloc() or
 uma_small_alloc() which both contain this mapping:
   if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
           pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
   else
           pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;

 Note that M_USE_RESERVE has been deprecated and is used in just a
 handful of places. Also note that lots of code paths come through these
 routines.
 What this means is essentially _any_ allocation using M_NOWAIT will
 bypass whatever reserves have been held back and will take every last page
 available.
 There is no documentation stating M_NOWAIT has this side effect of
 essentially being privileged, so any innocuous piece of code that can't
 block will use it. And of course M_NOWAIT is literally used all over.
 It looks to me like the design goal of the BSD allocators is on
 recovery; it will give all pages away knowing it can recover.
 Am I missing anything? I would have expected some small number of pages
 to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
 sort of back door for grabbing memory.
 Your analysis is right, there is nothing to add or correct.
 This is the reason to strongly prefer M_WAITOK.

 Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
 well understood that it should only be used by interrupt handlers.

 The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
 one is that the allocation shouldn't sleep.  The other is how far we're
 willing to deplete the cache/free page queues.

 When fine-grained locking got sprinkled throughout the kernel, we all too
 often found ourselves wanting to do allocations without the possibility of
 blocking.  So, M_NOWAIT became commonplace, where it wasn't before.

 This had the unintended consequence of introducing a lot of memory
 allocations in the top-half of the kernel, i.e., non-interrupt handling
 code, that were digging deep into the cache/free page queues.

 Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
 allocation is less likely to succeed than an M_NOWAIT allocation.
 However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
 could only allocate a free page.  M_USE_RESERVE said that it was ok to allocate
 a cached page even though M_NOWAIT was specified.  Consequently, the system
 wouldn't dig as far into the free page queue if M_USE_RESERVE was
 specified, because it was allowed to reclaim a cached page.

 In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
 dig any deeper into the cache/free page queues than M_WAITOK does and
 reintroduce a M_USE_RESERVE-like flag that says "dig deep into the
 cache/free page queues".  The trouble is that we then need to identify all
 of those places that are implicitly depending on the current behavior of
 M_NOWAIT also digging deep into the cache/free page queues so that we can
 add an explicit M_USE_RESERVE.

 Alan

 P.S. I suspect that we should also increase the size of the page reserve
 that is kept for VM_ALLOC_INTERRUPT allocations in vm_page_alloc*().  How
 many legitimate users of a new M_USE_RESERVE-like flag in today's kernel
 could actually be satisfied by two pages?
 I am almost sure that most of the people who use the M_NOWAIT flag do not
 know about the 'allow a deeper drain of the free queue' effect. As such, I
 believe we should flip the meaning of M_NOWAIT/M_USE_RESERVE. The only
 problematic places I would expect are in the swapout path.

 I found a single explicit use of M_USE_RESERVE in the kernel,
 so the flip is relatively simple.
 

Re: Memory reserves or lack thereof

2012-11-13 Thread Adrian Chadd
Hey, great catch!



adrian

On 13 November 2012 12:04, Alan Cox <a...@rice.edu> wrote:

Re: Memory reserves or lack thereof

2012-11-12 Thread Adrian Chadd
On 11 November 2012 20:24, Alfred Perlstein <bri...@mu.org> wrote:
 I think very few of the M_NOWAITs actually need the reserve behavior. We 
 should probably switch away from it digging that deep by default and 
 introduce a flag and/or a per thread flag to set the behavior.

There's already a perfectly fine flag - M_WAITOK. Just don't hold any
locks, right? :)


Adrian


Re: Memory reserves or lack thereof

2012-11-12 Thread Andre Oppermann

On 11.11.2012 22:40, Alan Cox wrote:

On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov <kostik...@gmail.com> wrote:

Your analysis is right, there is nothing to add or correct.
This is the reason to strongly prefer M_WAITOK.



Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
well understood that it should only be used by interrupt handlers.

The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
one is that the allocation shouldn't sleep.  The other is how far we're
willing to deplete the cache/free page queues.

When fine-grained locking got sprinkled throughout the kernel, we all too
often found ourselves wanting to do allocations without the possibility of
blocking.  So, M_NOWAIT became commonplace, where it wasn't before.


Yes, we have many places where we don't want to sleep, for example in
the network code.  There we simply want to be told that we've run out
of memory and handle the failure.  It's expected to happen from time
to time.  We don't need or want to dig deep or into reserves.  Packets
are expected to get lost from time to time and upper layer protocols
will handle retransmits just fine.  What we *don't* want normally is to
get blocked on a failing memory allocation.  We'd rather drop this one
and go on with the next packet to avoid the head of line blocking
problem where everything cascades to a total halt.

As a side note we don't do many, if any, true interrupt time allocations
anymore.  Usually the interrupt is just acknowledged in interrupt
context and a taskqueue or ithread is scheduled to do all the hard work.
Neither runs in interrupt context.
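
The "drop this one and move on" policy described above can be sketched in
plain userland C. Everything here is hypothetical and only illustrates the
idea: sk_alloc_nowait() models a non-sleeping allocator with a simple
counter standing in for available memory, and sk_rx_one() drops the packet
and counts the failure instead of blocking or touching any reserve.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet buffer; not the kernel's struct mbuf. */
struct sk_pkt_buf {
	size_t len;
	unsigned char data[2048];
};

struct sk_rx_stats {
	unsigned long delivered;
	unsigned long dropped_nomem;
};

/*
 * Models a non-sleeping allocation: when the pool counter is exhausted,
 * return NULL right away instead of waiting for memory to appear.
 */
static struct sk_pkt_buf *
sk_alloc_nowait(unsigned *pool_left)
{
	if (*pool_left == 0)
		return (NULL);
	(*pool_left)--;
	return (calloc(1, sizeof(struct sk_pkt_buf)));
}

/*
 * Handle one received packet. On allocation failure the packet is
 * dropped and counted; the caller simply continues with the next one,
 * leaving recovery to upper layer retransmits.
 */
static int
sk_rx_one(unsigned *pool_left, size_t len, struct sk_rx_stats *st)
{
	struct sk_pkt_buf *b;

	b = sk_alloc_nowait(pool_left);
	if (b == NULL) {
		st->dropped_nomem++;
		return (-1);
	}
	b->len = len;
	st->delivered++;
	free(b);	/* stand-in for handing the buffer to the stack */
	return (0);
}
```

The point of the design is that a failed allocation costs one packet and
one counter increment, so a memory shortage never stalls the receive path.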


This had the unintended consequence of introducing a lot of memory
allocations in the top-half of the kernel, i.e., non-interrupt handling
code, that were digging deep into the cache/free page queues.

Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
allocation is less likely to succeed than an M_NOWAIT allocation.
However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
could only allocate a free page.  M_USE_RESERVE said that it was ok to allocate
a cached page even though M_NOWAIT was specified.  Consequently, the system
wouldn't dig as far into the free page queue if M_USE_RESERVE was
specified, because it was allowed to reclaim a cached page.

In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
dig any deeper into the cache/free page queues than M_WAITOK does and
reintroduce a M_USE_RESERVE-like flag that says "dig deep into the
cache/free page queues".  The trouble is that we then need to identify all
of those places that are implicitly depending on the current behavior of
M_NOWAIT also digging deep into the cache/free page queues so that we can
add an explicit M_USE_RESERVE.


I don't think many places depend on M_NOWAIT digging deep.  I'm
perfectly happy with having M_NOWAIT give up on first try.  Only
together with M_TRY_REALLY_HARD would it dig into reserves.

PS: We have a really nasty namespace collision with the mbuf flags
which use the M_* prefix as well.

--
Andre



Re: Memory reserves or lack thereof

2012-11-12 Thread Andre Oppermann

On 12.11.2012 03:02, Adrian Chadd wrote:

On 11 November 2012 13:40, Alan Cox alan.l@gmail.com wrote:



Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
well understood that it should only be used by interrupt handlers.

The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
one is that the allocation shouldn't sleep.  The other is how far we're
willing to deplete the cache/free page queues.

When fine-grained locking got sprinkled throughout the kernel, we all too
often found ourselves wanting to do allocations without the possibility of
blocking.  So, M_NOWAIT became commonplace, where it wasn't before.


Well, what's the current set of best practices for allocating mbufs?


If an allocation is driven by user space then you can use M_WAITOK.

If an allocation is driven by the driver or kernel (callout and so on)
you do M_NOWAIT and handle a failure by trying again later either
directly by rescheduling the callout or by the upper layer retransmit
logic.

On top of that, individual mbuf allocation or stitching mbufs and
clusters together manually is deprecated.  If at all possible you
should use m_getm2().
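
The rule of thumb above (M_WAITOK when user space drives the allocation,
M_NOWAIT plus "try again later" otherwise) can be sketched as follows. This
is a userland illustration only: the SK_ names are invented, no real mbuf
API is called, and the bounded retry count is an assumption of the sketch,
not something stated in this thread.

```c
/* Hypothetical description of who is driving the allocation. */
enum sk_ctx { SK_CTX_USER_SYSCALL, SK_CTX_CALLOUT, SK_CTX_DRIVER };

/* Stand-ins for M_WAITOK and M_NOWAIT. */
enum sk_how { SK_HOW_WAITOK, SK_HOW_NOWAIT };

enum sk_tx_result { SK_TX_SENT, SK_TX_RETRY_LATER, SK_TX_GIVE_UP };

/* User-driven work may sleep; driver and callout work must not. */
static enum sk_how
sk_how_for(enum sk_ctx ctx)
{
	return (ctx == SK_CTX_USER_SYSCALL ? SK_HOW_WAITOK : SK_HOW_NOWAIT);
}

struct sk_tx_job {
	int attempts;		/* attempts made so far */
	int max_attempts;	/* assumption: give up after this many */
	int done;		/* set once the allocation succeeded */
};

/*
 * One non-sleeping attempt. 'alloc_succeeded' is the outcome of the
 * caller's allocation. On failure the job asks to be rescheduled (for
 * example by re-arming a callout) until the attempt budget runs out,
 * after which recovery is left to upper layer retransmits.
 */
static enum sk_tx_result
sk_tx_step(struct sk_tx_job *job, int alloc_succeeded)
{
	job->attempts++;
	if (alloc_succeeded) {
		job->done = 1;
		return (SK_TX_SENT);
	}
	if (job->attempts >= job->max_attempts)
		return (SK_TX_GIVE_UP);
	return (SK_TX_RETRY_LATER);
}
```

Nothing in this sketch reaches for a reserve: a failed attempt is either
retried later or abandoned.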


I don't mind going through ath(4) and net80211(4), looking to make it
behave better with mbuf allocations. There are 49 M_NOWAITs in net80211
and 10 in ath(4). I wonder how many of them are synonymous with "don't
fail allocating", too. Hm.


Mbuf allocations are normally allowed to fail without serious
after effects other than retransmits and some overall recovery
pain.

Only non-mbuf memory allocations for important structures or
state that can't be recreated on retransmit should dig into
reserves.  Normally this is a very rare case in network related
code.

--
Andre



Re: Memory reserves or lack thereof

2012-11-12 Thread Konstantin Belousov
On Sun, Nov 11, 2012 at 03:40:24PM -0600, Alan Cox wrote:
 On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov
 <kostik...@gmail.com> wrote:
 
  On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
   I have a memory subsystem design question that I'm hoping someone can
  answer.
  
   I've been looking at a machine that is completely out of memory, as in
  
v_free_count = 0,
v_cache_count = 0,
  
   I wondered how a machine could completely run out of memory like this,
  especially after finding a lack of interrupt storms or other pathologies
  that would tend to overcommit memory. So I started investigating.
  
   Most allocators come down to vm_page_alloc(), which has this guard:
  
      if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
              page_req = VM_ALLOC_SYSTEM;
      };
  
      if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
          (page_req == VM_ALLOC_SYSTEM &&
          cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
          (page_req == VM_ALLOC_INTERRUPT &&
          cnt.v_free_count + cnt.v_cache_count > 0)) {
  
   The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
  every last page.
  
   From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
  perhaps only used from interrupt threads. Not so, see kmem_malloc() or
  uma_small_alloc() which both contain this mapping:
  
      if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
              pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
      else
              pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
  
   Note that M_USE_RESERVE has been deprecated and is used in just a
  handful of places. Also note that lots of code paths come through these
  routines.
  
   What this means is essentially _any_ allocation using M_NOWAIT will
  bypass whatever reserves have been held back and will take every last page
  available.
  
   There is no documentation stating M_NOWAIT has this side effect of
  essentially being privileged, so any innocuous piece of code that can't
  block will use it. And of course M_NOWAIT is literally used all over.
  
   It looks to me like the design goal of the BSD allocators is on
  recovery; it will give all pages away knowing it can recover.
  
   Am I missing anything? I would have expected some small number of pages
  to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
  sort of back door for grabbing memory.
  
 
  Your analysis is right, there is nothing to add or correct.
  This is the reason to strongly prefer M_WAITOK.
 
 
 Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
 well understood that it should only be used by interrupt handlers.
 
 The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
 one is that the allocation shouldn't sleep.  The other is how far we're
 willing to deplete the cache/free page queues.
 
 When fine-grained locking got sprinkled throughout the kernel, we all too
 often found ourselves wanting to do allocations without the possibility of
 blocking.  So, M_NOWAIT became commonplace, where it wasn't before.
 
 This had the unintended consequence of introducing a lot of memory
 allocations in the top-half of the kernel, i.e., non-interrupt handling
 code, that were digging deep into the cache/free page queues.
 
 Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
 allocation is less likely to succeed than an M_NOWAIT allocation.
 However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
 could only allocate a free page.  M_USE_RESERVE said that it was ok to allocate
 a cached page even though M_NOWAIT was specified.  Consequently, the system
 wouldn't dig as far into the free page queue if M_USE_RESERVE was
 specified, because it was allowed to reclaim a cached page.
 
 In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
 dig any deeper into the cache/free page queues than M_WAITOK does and
 reintroduce a M_USE_RESERVE-like flag that says "dig deep into the
 cache/free page queues".  The trouble is that we then need to identify all
 of those places that are implicitly depending on the current behavior of
 M_NOWAIT also digging deep into the cache/free page queues so that we can
 add an explicit M_USE_RESERVE.
 
 Alan
 
 P.S. I suspect that we should also increase the size of the page reserve
 that is kept for VM_ALLOC_INTERRUPT allocations in vm_page_alloc*().  How
 many legitimate users of a new M_USE_RESERVE-like flag in today's kernel
 could actually be satisfied by two pages?

I am almost sure that most of the people who use the M_NOWAIT flag do not
know about the 'allow a deeper drain of the free queue' effect. As such, I
believe we should flip the meaning of M_NOWAIT/M_USE_RESERVE. The only
problematic places I would expect are in the swapout path.
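
To make the effect of the proposed flip concrete, here is a small userland
model in C of the vm_page_alloc() guard quoted earlier in this message,
together with the flag mapping before and after the flip. It is a sketch,
not the kernel code: all SK_/sk_ names are invented, and the struct fields
only mirror the cnt.* counters by name.

```c
enum sk_req { SK_REQ_NORMAL, SK_REQ_SYSTEM, SK_REQ_INTERRUPT };

struct sk_counts {
	unsigned free_count;		/* models cnt.v_free_count */
	unsigned cache_count;		/* models cnt.v_cache_count */
	unsigned free_reserved;		/* models cnt.v_free_reserved */
	unsigned interrupt_free_min;	/* models cnt.v_interrupt_free_min */
};

/* Same three-tier test as the quoted guard: 1 means "may allocate". */
static int
sk_page_alloc_allowed(const struct sk_counts *c, enum sk_req req)
{
	unsigned avail = c->free_count + c->cache_count;

	if (avail > c->free_reserved)
		return (1);
	if (req == SK_REQ_SYSTEM && avail > c->interrupt_free_min)
		return (1);
	if (req == SK_REQ_INTERRUPT && avail > 0)
		return (1);
	return (0);
}

/* Mapping quoted in this thread: plain M_NOWAIT gets the deepest class. */
static enum sk_req
sk_class_old(int nowait, int use_reserve)
{
	return ((nowait && !use_reserve) ? SK_REQ_INTERRUPT : SK_REQ_SYSTEM);
}

/* Flipped mapping: only an explicit M_USE_RESERVE digs to the last page. */
static enum sk_req
sk_class_new(int nowait, int use_reserve)
{
	(void)nowait;
	return (use_reserve ? SK_REQ_INTERRUPT : SK_REQ_SYSTEM);
}
```

With one page left and an interrupt minimum of two (the "two pages"
discussed in this thread), a plain M_NOWAIT request succeeds under the old
mapping and fails under the flipped one, while M_NOWAIT | M_USE_RESERVE
still succeeds.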

I found a single explicit use of M_USE_RESERVE in the kernel,
so the flip 

Re: Memory reserves or lack thereof

2012-11-12 Thread Peter Holm
On Mon, Nov 12, 2012 at 03:36:38PM +0200, Konstantin Belousov wrote:
 On Sun, Nov 11, 2012 at 03:40:24PM -0600, Alan Cox wrote:
  On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov 
  kostik...@gmail.com wrote:
  
   On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
I have a memory subsystem design question that I'm hoping someone can
   answer.
   
I've been looking at a machine that is completely out of memory, as in
   
 v_free_count = 0,
 v_cache_count = 0,
   
I wondered how a machine could completely run out of memory like this,
   especially after finding a lack of interrupt storms or other pathologies
   that would tend to overcommit memory. So I started investigating.
   
Most allocators come down to vm_page_alloc(), which has this guard:
   
     if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
             page_req = VM_ALLOC_SYSTEM;
     };

     if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
         (page_req == VM_ALLOC_SYSTEM &&
         cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
         (page_req == VM_ALLOC_INTERRUPT &&
         cnt.v_free_count + cnt.v_cache_count > 0)) {
   
The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
   every last page.
   
From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
   perhaps only used from interrupt threads. Not so, see kmem_malloc() or
   uma_small_alloc() which both contain this mapping:
   
     if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
             pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
     else
             pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
   
Note that M_USE_RESERVE has been deprecated and is used in just a
   handful of places. Also note that lots of code paths come through these
   routines.
   
What this means is essentially _any_ allocation using M_NOWAIT will
   bypass whatever reserves have been held back and will take every last page
   available.
   
There is no documentation stating M_NOWAIT has this side effect of
   essentially being privileged, so any innocuous piece of code that can't
   block will use it. And of course M_NOWAIT is literally used all over.
   
It looks to me like the design goal of the BSD allocators is on
   recovery; it will give all pages away knowing it can recover.
   
Am I missing anything? I would have expected some small number of pages
   to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
   sort of back door for grabbing memory.
   
  
   Your analysis is right, there is nothing to add or correct.
   This is the reason to strongly prefer M_WAITOK.
  
  
  Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
  well understood that it should only be used by interrupt handlers.
  
  The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
  being that the allocation shouldn't sleep.  The other being how far we're
  willing to deplete the cache/free page queues.
  
  When fine-grained locking got sprinkled throughout the kernel, we all too
  often found ourselves wanting to do allocations without the possibility of
  blocking.  So, M_NOWAIT became commonplace, where it wasn't before.
  
  This had the unintended consequence of introducing a lot of memory
  allocations in the top-half of the kernel, i.e., non-interrupt handling
  code, that were digging deep into the cache/free page queues.
  
  Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
  allocation is less likely to succeed than an M_NOWAIT allocation.
  However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
  could only allocate a free page.  M_USE_RESERVE said that it was ok to allocate
  a cached page even though M_NOWAIT was specified.  Consequently, the system
  wouldn't dig as far into the free page queue if M_USE_RESERVE was
  specified, because it was allowed to reclaim a cached page.
  
  In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
  dig any deeper into the cache/free page queues than M_WAITOK does and
  reintroduce a M_USE_RESERVE-like flag that says dig deep into the
  cache/free page queues.  The trouble is that we then need to identify all
  of those places that are implicitly depending on the current behavior of
  M_NOWAIT also digging deep into the cache/free page queues so that we can
  add an explicit M_USE_RESERVE.
  
  Alan
  
  P.S. I suspect that we should also increase the size of the page reserve
  that is kept for VM_ALLOC_INTERRUPT allocations in vm_page_alloc*().  How
  many legitimate users of a new M_USE_RESERVE-like flag in today's kernel
  could actually be satisfied by two pages?
 
 I am almost sure that most people who use the M_NOWAIT flag do not
 know about the 'allow the deeper drain of free queue' effect. As such, I believe
 we should flip the meaning of 

Re: Memory reserves or lack thereof

2012-11-12 Thread Ian Lepore
On Mon, 2012-11-12 at 13:18 +0100, Andre Oppermann wrote:
  Well, what's the current set of best practices for allocating mbufs?
 
 If an allocation is driven by user space then you can use M_WAITOK.
 
 If an allocation is driven by the driver or kernel (callout and so on)
 you do M_NOWAIT and handle a failure by trying again later either
 directly by rescheduling the callout or by the upper layer retransmit
 logic.
 
 On top of that individual mbuf allocation or stitching mbufs and
 clusters together manually is deprecated.  If at all possible you
 should use m_getm2().

root@pico:/root # man m_getm2
No manual entry for m_getm2

So when you say manually stitching mbufs together is deprecated, I take
it you mean the case where you're letting the mbuf routines allocate the
actual buffer space for you?

I've got an ethernet driver on an ARM SoC in which the hardware receives
into a series of buffers fixed at 128 bytes.  Right now the code is
allocating a cluster and then looping using m_append() to reassemble
these buffers back into a full contiguous frame in a cluster.  I was
going to have a shot at using MEXTADD() to manually string the series of
hardware/dma buffers together without copying the data.  Is that sort of
usage still a good idea?  (And would it actually be a performance win?
If I hand it off to the net stack and an m_pullup() or similar is going
to happen along the way anyway, I might as well do it at driver level.)

-- Ian


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Memory reserves or lack thereof

2012-11-12 Thread Andre Oppermann

On 12.11.2012 15:47, Ian Lepore wrote:

On Mon, 2012-11-12 at 13:18 +0100, Andre Oppermann wrote:

Well, what's the current set of best practices for allocating mbufs?


If an allocation is driven by user space then you can use M_WAITOK.

If an allocation is driven by the driver or kernel (callout and so on)
you do M_NOWAIT and handle a failure by trying again later either
directly by rescheduling the callout or by the upper layer retransmit
logic.

On top of that individual mbuf allocation or stitching mbufs and
clusters together manually is deprecated.  If at all possible you
should use m_getm2().


root@pico:/root # man m_getm2
No manual entry for m_getm2


Oops... Have to fix that.


So when you say manually stitching mbufs together is deprecated, I take
it you mean the case where you're letting the mbuf routines allocate the
actual buffer space for you?


I mean allocating an mbuf, a cluster and then stitching them together.
You can do it in one call with m_getcl().


I've got an ethernet driver on an ARM SoC in which the hardware receives
into a series of buffers fixed at 128 bytes.  Right now the code is
allocating a cluster and then looping using m_append() to reassemble
these buffers back into a full contiguous frame in a cluster.  I was
going to have a shot at using MEXTADD() to manually string the series of
hardware/dma buffers together without copying the data.  Is that sort of
usage still a good idea?  (And would it actually be a performance win?


That really depends on the particular usage.  Attaching the 128 byte
buffers to mbufs probably isn't much of a win considering an mbuf is
256 bytes in size.  You could just as well copy each 128-byte buffer into
the mbuf's data section.  Allocating a 2K cluster and copying into it is
more efficient for the overall system.


If I hand it off to the net stack and an m_pullup() or similar is going
to happen along the way anyway, I might as well do it at driver level.)


If you properly m_align() the mbuf cluster before you copy into it
there shouldn't be any m_pullup's happening.

--
Andre
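
A minimal userspace sketch of the copy-into-one-cluster approach described above, for illustration only. It does not use the mbuf API: reassemble(), FRAG_SIZE and CLUSTER_SIZE are hypothetical names, the fragments are assumed to sit back to back in memory, and the start offset stands in for whatever alignment the driver chooses before copying.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAG_SIZE    128	/* hardware buffer size from the description */
#define CLUSTER_SIZE 2048	/* the "2K cluster" */

/*
 * Copy nfrags fixed-size fragments (assumed contiguous at frags) into one
 * buffer, starting at byte offset off.  Only last_len bytes of the final
 * fragment are valid.  Returns the frame length, or 0 if it does not fit.
 */
static size_t
reassemble(unsigned char *cluster, size_t off, const unsigned char *frags,
    size_t nfrags, size_t last_len)
{
	size_t i, len, n;

	assert(cluster != NULL && frags != NULL);
	if (nfrags == 0 || nfrags > CLUSTER_SIZE / FRAG_SIZE)
		return (0);
	if (last_len == 0 || last_len > FRAG_SIZE)
		return (0);
	len = (nfrags - 1) * FRAG_SIZE + last_len;
	if (off > CLUSTER_SIZE || len > CLUSTER_SIZE - off)
		return (0);
	for (i = 0; i < nfrags; i++) {
		n = (i == nfrags - 1) ? last_len : FRAG_SIZE;
		memcpy(cluster + off + i * FRAG_SIZE,
		    frags + i * FRAG_SIZE, n);
	}
	return (len);
}
```

With an offset such as 2, a 14-byte Ethernet header ends on a 4-byte boundary, which is the kind of placement that should make a later pullup unnecessary. Whether the copy beats attaching external buffers still depends on the hardware, as noted above.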



Re: Memory reserves or lack thereof

2012-11-12 Thread Alan Cox
On 11/12/2012 07:36, Konstantin Belousov wrote:
 On Sun, Nov 11, 2012 at 03:40:24PM -0600, Alan Cox wrote:
 On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov 
 kostik...@gmail.com wrote:

 On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
 I have a memory subsystem design question that I'm hoping someone can
 answer.
 I've been looking at a machine that is completely out of memory, as in

  v_free_count = 0,
  v_cache_count = 0,

 I wondered how a machine could completely run out of memory like this,
 especially after finding a lack of interrupt storms or other pathologies
 that would tend to overcommit memory. So I started investigating.
 Most allocators come down to vm_page_alloc(), which has this guard:

     if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
             page_req = VM_ALLOC_SYSTEM;
     };

     if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
         (page_req == VM_ALLOC_SYSTEM &&
         cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
         (page_req == VM_ALLOC_INTERRUPT &&
         cnt.v_free_count + cnt.v_cache_count > 0)) {

 The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
 every last page.
 From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
 perhaps only used from interrupt threads. Not so, see kmem_malloc() or
 uma_small_alloc() which both contain this mapping:
     if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
             pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
     else
             pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;

 Note that M_USE_RESERVE has been deprecated and is used in just a
 handful of places. Also note that lots of code paths come through these
 routines.
 What this means is essentially _any_ allocation using M_NOWAIT will
 bypass whatever reserves have been held back and will take every last page
 available.
 There is no documentation stating M_NOWAIT has this side effect of
 essentially being privileged, so any innocuous piece of code that can't
 block will use it. And of course M_NOWAIT is literally used all over.
 It looks to me like the design goal of the BSD allocators is on
 recovery; it will give all pages away knowing it can recover.
 Am I missing anything? I would have expected some small number of pages
 to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
 sort of back door for grabbing memory.
 Your analysis is right, there is nothing to add or correct.
 This is the reason to strongly prefer M_WAITOK.

 Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
 well understood that it should only be used by interrupt handlers.

 The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
 being that the allocation shouldn't sleep.  The other being how far we're
 willing to deplete the cache/free page queues.

 When fine-grained locking got sprinkled throughout the kernel, we all too
 often found ourselves wanting to do allocations without the possibility of
 blocking.  So, M_NOWAIT became commonplace, where it wasn't before.

 This had the unintended consequence of introducing a lot of memory
 allocations in the top-half of the kernel, i.e., non-interrupt handling
 code, that were digging deep into the cache/free page queues.

 Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
 allocation is less likely to succeed than an M_NOWAIT allocation.
 However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
 could only allocate a free page.  M_USE_RESERVE said that it was ok to allocate
 a cached page even though M_NOWAIT was specified.  Consequently, the system
 wouldn't dig as far into the free page queue if M_USE_RESERVE was
 specified, because it was allowed to reclaim a cached page.

 In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
 dig any deeper into the cache/free page queues than M_WAITOK does and
 reintroduce a M_USE_RESERVE-like flag that says dig deep into the
 cache/free page queues.  The trouble is that we then need to identify all
 of those places that are implicitly depending on the current behavior of
 M_NOWAIT also digging deep into the cache/free page queues so that we can
 add an explicit M_USE_RESERVE.

 Alan

 P.S. I suspect that we should also increase the size of the page reserve
 that is kept for VM_ALLOC_INTERRUPT allocations in vm_page_alloc*().  How
 many legitimate users of a new M_USE_RESERVE-like flag in today's kernel
 could actually be satisfied by two pages?
 I am almost sure that most people who use the M_NOWAIT flag do not
 know about the 'allow the deeper drain of free queue' effect. As such, I believe
 we should flip the meaning of M_NOWAIT/M_USE_RESERVE. The only problematic
 places I would expect are in the swapout path.

 I found a single explicit use of M_USE_RESERVE in the kernel,
 so the flip is relatively simple.

Agreed.  Most recently I eliminated several 

Re: Memory reserves or lack thereof

2012-11-12 Thread Alfred Perlstein


On Nov 12, 2012, at 4:11 AM, Andre Oppermann an...@freebsd.org wrote:
 
 
 I don't think many places depend on M_NOWAIT digging deep.  I'm
 perfectly happy with having M_NOWAIT give up on first try.  Only
 together with M_TRY_REALLY_HARD it would dig into reserves.
 
 PS: We have a really nasty namespace collision with the mbuf flags
 which use the M_* prefix as well.

Agreed. 

 




Re: Memory reserves or lack thereof

2012-11-12 Thread Konstantin Belousov
On Mon, Nov 12, 2012 at 11:35:42AM -0600, Alan Cox wrote:
 Agreed.  Most recently I eliminated several uses from the arm pmap
 implementations.  There is, however, one other use:
 
 ofed/include/linux/gfp.h:#define GFP_ATOMIC (M_NOWAIT | M_USE_RESERVE)
Yes, I forgot to mention this. I have no idea about the semantics of the
GFP_ATOMIC compat flag.

Below is the updated patch with your two notes applied.

diff --git a/sys/amd64/amd64/uma_machdep.c b/sys/amd64/amd64/uma_machdep.c
index dc9c307..ab1e869 100644
--- a/sys/amd64/amd64/uma_machdep.c
+++ b/sys/amd64/amd64/uma_machdep.c
@@ -29,6 +29,7 @@ __FBSDID("$FreeBSD$");
 
 #include <sys/param.h>
 #include <sys/lock.h>
+#include <sys/malloc.h>
 #include <sys/mutex.h>
 #include <sys/systm.h>
 #include <vm/vm.h>
@@ -48,12 +49,7 @@ uma_small_alloc(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
 	int pflags;
 
 	*flags = UMA_SLAB_PRIV;
-	if ((wait & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
-		pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED;
-	else
-		pflags = VM_ALLOC_SYSTEM | VM_ALLOC_NOOBJ | VM_ALLOC_WIRED;
-	if (wait & M_ZERO)
-		pflags |= VM_ALLOC_ZERO;
+	pflags = m2vm_flags(wait, VM_ALLOC_NOOBJ | VM_ALLOC_WIRED);
 	for (;;) {
 		m = vm_page_alloc(NULL, 0, pflags);
 		if (m == NULL) {
diff --git a/sys/arm/arm/vm_machdep.c b/sys/arm/arm/vm_machdep.c
index f60cdb1..75366e3 100644
--- a/sys/arm/arm/vm_machdep.c
+++ b/sys/arm/arm/vm_machdep.c
@@ -651,12 +651,7 @@ uma_small_alloc(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
 		ret = ((void *)kmem_malloc(kmem_map, bytes, M_NOWAIT));
 		return (ret);
 	}
-	if ((wait & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
-		pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
-	else
-		pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
-	if (wait & M_ZERO)
-		pflags |= VM_ALLOC_ZERO;
+	pflags = m2vm_flags(wait, VM_ALLOC_WIRED);
 	for (;;) {
 		m = vm_page_alloc(NULL, 0, pflags | VM_ALLOC_NOOBJ);
 		if (m == NULL) {
diff --git a/sys/fs/devfs/devfs_devs.c b/sys/fs/devfs/devfs_devs.c
index 71caa29..2ce1ca6 100644
--- a/sys/fs/devfs/devfs_devs.c
+++ b/sys/fs/devfs/devfs_devs.c
@@ -121,7 +121,7 @@ devfs_alloc(int flags)
 	struct cdev *cdev;
 	struct timespec ts;
 
-	cdp = malloc(sizeof *cdp, M_CDEVP, M_USE_RESERVE | M_ZERO |
+	cdp = malloc(sizeof *cdp, M_CDEVP, M_ZERO |
 	    ((flags & MAKEDEV_NOWAIT) ? M_NOWAIT : M_WAITOK));
 	if (cdp == NULL)
 		return (NULL);
diff --git a/sys/ia64/ia64/uma_machdep.c b/sys/ia64/ia64/uma_machdep.c
index 37353ff..9f77762 100644
--- a/sys/ia64/ia64/uma_machdep.c
+++ b/sys/ia64/ia64/uma_machdep.c
@@ -46,12 +46,7 @@ uma_small_alloc(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
 	int pflags;
 
 	*flags = UMA_SLAB_PRIV;
-	if ((wait & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
-		pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
-	else
-		pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
-	if (wait & M_ZERO)
-		pflags |= VM_ALLOC_ZERO;
+	pflags = m2vm_flags(wait, VM_ALLOC_WIRED);
 
 	for (;;) {
 		m = vm_page_alloc(NULL, 0, pflags | VM_ALLOC_NOOBJ);
diff --git a/sys/mips/mips/uma_machdep.c b/sys/mips/mips/uma_machdep.c
index 798e632..24baef0 100644
--- a/sys/mips/mips/uma_machdep.c
+++ b/sys/mips/mips/uma_machdep.c
@@ -48,11 +48,7 @@ uma_small_alloc(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
 	void *va;
 
 	*flags = UMA_SLAB_PRIV;
-
-	if ((wait & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
-		pflags = VM_ALLOC_INTERRUPT;
-	else
-		pflags = VM_ALLOC_SYSTEM;
+	pflags = m2vm_flags(wait, 0);
 
 	for (;;) {
 		m = pmap_alloc_direct_page(0, pflags);
diff --git a/sys/powerpc/aim/mmu_oea64.c b/sys/powerpc/aim/mmu_oea64.c
index a491680..3e320b9 100644
--- a/sys/powerpc/aim/mmu_oea64.c
+++ b/sys/powerpc/aim/mmu_oea64.c
@@ -1369,12 +1369,7 @@ moea64_uma_page_alloc(uma_zone_t zone, int bytes, u_int8_t *flags, int wait)
 	*flags = UMA_SLAB_PRIV;
 	needed_lock = !PMAP_LOCKED(kernel_pmap);
 
-        if ((wait & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
-                pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
-        else
-                pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
-        if (wait & M_ZERO)
-                pflags |= VM_ALLOC_ZERO;
+	pflags = m2vm_flags(wait, VM_ALLOC_WIRED);
 
         for (;;) {
                 m = vm_page_alloc(NULL, 0, pflags | VM_ALLOC_NOOBJ);
diff --git a/sys/powerpc/aim/slb.c b/sys/powerpc/aim/slb.c
index 162c7fb..3882bfa 100644
--- a/sys/powerpc/aim/slb.c
+++ b/sys/powerpc/aim/slb.c
@@ -483,12 +483,7 @@ 
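
The body of m2vm_flags() is not visible in the part of the patch shown here. As a rough sketch only, assuming it follows the behaviour described elsewhere in this thread (M_USE_RESERVE selects the dig-deep class, everything else gets VM_ALLOC_SYSTEM, and M_ZERO maps to VM_ALLOC_ZERO as in the removed lines), it could look something like the following. The bit values are placeholders chosen so that the example compiles in userspace; they are not taken from the kernel headers.

```c
#include <assert.h>

/* Placeholder values for illustration; not the kernel's definitions. */
#define M_NOWAIT		0x0001
#define M_WAITOK		0x0002
#define M_ZERO			0x0100
#define M_USE_RESERVE		0x0400

#define VM_ALLOC_SYSTEM		1	/* request class */
#define VM_ALLOC_INTERRUPT	2	/* request class */
#define VM_ALLOC_WIRED		0x0020
#define VM_ALLOC_ZERO		0x0040
#define VM_ALLOC_NOOBJ		0x0100

/*
 * Hypothetical reconstruction: translate malloc(9)-style flags into
 * vm_page_alloc()-style flags and OR in caller-supplied extras.
 */
static int
m2vm_flags(int malloc_flags, int extra)
{
	int pflags;

	/* Callers are expected to pass exactly one of M_NOWAIT/M_WAITOK. */
	assert(((malloc_flags & M_NOWAIT) != 0) !=
	    ((malloc_flags & M_WAITOK) != 0));
	/* Only an explicit M_USE_RESERVE may dig below the reserve. */
	pflags = (malloc_flags & M_USE_RESERVE) != 0 ?
	    VM_ALLOC_INTERRUPT : VM_ALLOC_SYSTEM;
	if ((malloc_flags & M_ZERO) != 0)
		pflags |= VM_ALLOC_ZERO;
	return (pflags | extra);
}
```

Under these assumptions a plain M_NOWAIT request no longer lands in the interrupt class; it is treated like M_WAITOK except that it cannot sleep.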

Re: Memory reserves or lack thereof

2012-11-12 Thread Sushanth Rai
This patch still doesn't address the issue of M_NOWAIT calls driving memory 
all the way down to 2 pages, right? It would be nice to have M_NOWAIT just 
be the non-sleeping version of M_WAITOK, with the M_USE_RESERVE flag to dig deep. 

Sushanth 


Re: Memory reserves or lack thereof

2012-11-12 Thread Konstantin Belousov
On Mon, Nov 12, 2012 at 01:28:02PM -0800, Sushanth Rai wrote:
 This patch still doesn't address the issue of M_NOWAIT calls driving
 memory all the way down to 2 pages, right? It would be nice to have
 M_NOWAIT just be the non-sleeping version of M_WAITOK, with the
 M_USE_RESERVE flag to dig deep.

This is out of scope of the change. But it is required for any further
adjustments.




Re: Memory reserves or lack thereof

2012-11-12 Thread Alan Cox

On 11/12/2012 3:48 PM, Konstantin Belousov wrote:

On Mon, Nov 12, 2012 at 01:28:02PM -0800, Sushanth Rai wrote:

This patch still doesn't address the issue of M_NOWAIT calls driving
memory all the way down to 2 pages, right? It would be nice to have
M_NOWAIT just be the non-sleeping version of M_WAITOK, with the
M_USE_RESERVE flag to dig deep.

This is out of scope of the change. But it is required for any further
adjustments.


I would suggest a somewhat different response:

The patch does make M_NOWAIT into a non-sleep version of M_WAITOK and 
does reintroduce M_USE_RESERVE as a way to specify "dig deep".


Currently, both M_NOWAIT and M_WAITOK can drive the cache/free memory 
down to two pages.  The effect of the patch is to stop M_NOWAIT at two 
pages rather than allowing it to continue to zero pages.


When you say, "This is out of scope ...", I believe that you are 
referring to changing two pages into something larger.  I agree that 
this is out of scope for the current change.


Alan



Re: Memory reserves or lack thereof

2012-11-12 Thread Adrian Chadd
.. wait, so what exactly would the difference be between M_NOWAIT and M_WAITOK?



adrian


Re: Memory reserves or lack thereof

2012-11-12 Thread Alan Cox

On 11/12/2012 5:24 PM, Adrian Chadd wrote:

.. wait, so what exactly would the difference be between M_NOWAIT and M_WAITOK?


Whether or not the allocation can sleep until memory becomes available.



Re: Memory reserves or lack thereof

2012-11-12 Thread Adrian Chadd
On 12 November 2012 15:26, Alan Cox a...@rice.edu wrote:
 On 11/12/2012 5:24 PM, Adrian Chadd wrote:

 .. wait, so what exactly would the difference be between M_NOWAIT and
 M_WAITOK?


 Whether or not the allocation can sleep until memory becomes available.

Ok, so we're still maintaining that particular behaviour. Cool.



Adrian


Re: Memory reserves or lack thereof

2012-11-12 Thread Sushanth Rai


--- On Mon, 11/12/12, Alan Cox a...@rice.edu wrote:

 From: Alan Cox a...@rice.edu
 Subject: Re: Memory reserves or lack thereof
 To: Konstantin Belousov kostik...@gmail.com
 Cc: Sushanth Rai sushanth_...@yahoo.com, a...@freebsd.org, 
 p...@freebsd.org, StevenSears steven.se...@netapp.com, 
 freebsd-hackers@freebsd.org freebsd-hackers@freebsd.org
 Date: Monday, November 12, 2012, 3:10 PM
 On 11/12/2012 3:48 PM, Konstantin Belousov wrote:
  On Mon, Nov 12, 2012 at 01:28:02PM -0800, Sushanth Rai wrote:
  This patch still doesn't address the issue of M_NOWAIT calls driving
  memory all the way down to 2 pages, right? It would be nice to have
  M_NOWAIT just be the non-sleeping version of M_WAITOK, with the
  M_USE_RESERVE flag to dig deep.
  This is out of scope of the change. But it is required for any further
  adjustments.
 
 I would suggest a somewhat different response:
 
 The patch does make M_NOWAIT into a non-sleep version of M_WAITOK and
 does reintroduce M_USE_RESERVE as a way to specify "dig deep".
 
 Currently, both M_NOWAIT and M_WAITOK can drive the cache/free memory
 down to two pages.  The effect of the patch is to stop M_NOWAIT at two
 pages rather than allowing it to continue to zero pages.


Thanks for the correction. I was associating VM_ALLOC_SYSTEM with just M_NOWAIT 
as it seemed in the first version of the patch.

Sushanth


Re: Memory reserves or lack thereof

2012-11-12 Thread Julian Elischer

On 11/12/12 3:49 PM, Adrian Chadd wrote:

On 12 November 2012 15:26, Alan Cox a...@rice.edu wrote:

On 11/12/2012 5:24 PM, Adrian Chadd wrote:

.. wait, so what exactly would the difference be between M_NOWAIT and
M_WAITOK?


Whether or not the allocation can sleep until memory becomes available.

Ok, so we're still maintaining that particular behaviour. Cool.

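         | no mem             | mem avail |
---------+--------------------+-----------+
M_WAITOK | wait, then success | success   |
---------+--------------------+-----------+
M_NOWAIT | returns failure    | success   |
---------+--------------------+-----------+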

the question is whether  the top left can ever fail for any other reason.
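
The table can be modelled with a toy userspace allocator, purely as an illustration. The names pool, pool_sleep() and pool_alloc() are made up for this sketch, and the "sleep" is simulated by moving a reclaimable page into the available count rather than by blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A toy pool standing in for the page queues; the numbers are arbitrary. */
struct pool {
	unsigned avail;		/* pages that can be handed out right now */
	unsigned reclaimable;	/* pages a simulated sleep can bring back */
};

/* Simulates sleeping until something frees a page. */
static bool
pool_sleep(struct pool *p)
{
	if (p->reclaimable == 0)
		return (false);	/* the model gives up; a real sleep would not */
	p->reclaimable--;
	p->avail++;
	return (true);
}

/* can_wait == true is the M_WAITOK row, false is the M_NOWAIT row. */
static bool
pool_alloc(struct pool *p, bool can_wait)
{
	assert(p != NULL);
	while (p->avail == 0) {
		if (!can_wait || !pool_sleep(p))
			return (false);
	}
	p->avail--;
	return (true);
}
```

The early return in pool_sleep() is exactly the open question about the top-left cell: the model bails out when nothing can be reclaimed, whereas the discussion above leaves open whether an M_WAITOK request can ever fail.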




Adrian


Re: Memory reserves or lack thereof

2012-11-11 Thread Alan Cox
On Sat, Nov 10, 2012 at 7:20 AM, Konstantin Belousov kostik...@gmail.com wrote:

 On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
  I have a memory subsystem design question that I'm hoping someone can
 answer.
 
  I've been looking at a machine that is completely out of memory, as in
 
   v_free_count = 0,
   v_cache_count = 0,
 
  I wondered how a machine could completely run out of memory like this,
 especially after finding a lack of interrupt storms or other pathologies
 that would tend to overcommit memory. So I started investigating.
 
  Most allocators come down to vm_page_alloc(), which has this guard:
 
     if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
             page_req = VM_ALLOC_SYSTEM;
     };

     if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
         (page_req == VM_ALLOC_SYSTEM &&
         cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
         (page_req == VM_ALLOC_INTERRUPT &&
         cnt.v_free_count + cnt.v_cache_count > 0)) {
 
  The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate
 every last page.
 
  From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare,
 perhaps only used from interrupt threads. Not so, see kmem_malloc() or
 uma_small_alloc() which both contain this mapping:
 
     if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
             pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
     else
             pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
 
  Note that M_USE_RESERVE has been deprecated and is used in just a
 handful of places. Also note that lots of code paths come through these
 routines.
 
  What this means is essentially _any_ allocation using M_NOWAIT will
 bypass whatever reserves have been held back and will take every last page
 available.
 
  There is no documentation stating M_NOWAIT has this side effect of
 essentially being privileged, so any innocuous piece of code that can't
 block will use it. And of course M_NOWAIT is literally used all over.
 
  It looks to me like the design goal of the BSD allocators is on
 recovery; it will give all pages away knowing it can recover.
 
  Am I missing anything? I would have expected some small number of pages
 to be held in reserve just in case. And I didn't expect M_NOWAIT to be a
 sort of back door for grabbing memory.
 

 Your analysis is right, there is nothing to add or correct.
 This is the reason to strongly prefer M_WAITOK.


Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
well understood that it should only be used by interrupt handlers.

The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
being that the allocation shouldn't sleep.  The other being how far we're
willing to deplete the cache/free page queues.

When fine-grained locking got sprinkled throughout the kernel, we all too
often found ourselves wanting to do allocations without the possibility of
blocking.  So, M_NOWAIT became commonplace, where it wasn't before.

This had the unintended consequence of introducing a lot of memory
allocations in the top-half of the kernel, i.e., non-interrupt handling
code, that were digging deep into the cache/free page queues.

Also, ironically, in today's kernel an M_NOWAIT | M_USE_RESERVE
allocation is less likely to succeed than an M_NOWAIT allocation.
However, prior to FreeBSD 7.x, M_NOWAIT couldn't allocate a cached page; it
could only allocate a free page.  M_USE_RESERVE said that it was ok to allocate
a cached page even though M_NOWAIT was specified.  Consequently, the system
wouldn't dig as far into the free page queue if M_USE_RESERVE was
specified, because it was allowed to reclaim a cached page.

In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
dig any deeper into the cache/free page queues than M_WAITOK does and
reintroduce a M_USE_RESERVE-like flag that says dig deep into the
cache/free page queues.  The trouble is that we then need to identify all
of those places that are implicitly depending on the current behavior of
M_NOWAIT also digging deep into the cache/free page queues so that we can
add an explicit M_USE_RESERVE.

Alan

P.S. I suspect that we should also increase the size of the page reserve
that is kept for VM_ALLOC_INTERRUPT allocations in vm_page_alloc*().  How
many legitimate users of a new M_USE_RESERVE-like flag in today's kernel
could actually be satisfied by two pages?
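
The change proposed above can be sketched as a small user-space model.  This
is only an illustration under stated assumptions: the SK_ names and their
values are stand-ins rather than the kernel's definitions, and
sk_class_current() and sk_class_proposed() are hypothetical helpers.  The
first mirrors the mapping quoted earlier from kmem_malloc(); the second shows
one way the changed mapping might look.

```c
/* Stand-in flag bits; the values are illustrative, not the kernel's. */
enum {
	SK_M_NOWAIT      = 0x1,
	SK_M_WAITOK      = 0x2,
	SK_M_USE_RESERVE = 0x4
};

/* Stand-in allocation classes: SYSTEM stops early, INTERRUPT digs deepest. */
enum sk_class { SK_ALLOC_SYSTEM, SK_ALLOC_INTERRUPT };

/* Current behavior: M_NOWAIT without M_USE_RESERVE gets the deepest class. */
enum sk_class
sk_class_current(int flags)
{
	if ((flags & (SK_M_NOWAIT | SK_M_USE_RESERVE)) == SK_M_NOWAIT)
		return (SK_ALLOC_INTERRUPT);
	return (SK_ALLOC_SYSTEM);
}

/* Proposed behavior (an assumption): only M_USE_RESERVE digs deep. */
enum sk_class
sk_class_proposed(int flags)
{
	if ((flags & SK_M_USE_RESERVE) != 0)
		return (SK_ALLOC_INTERRUPT);
	return (SK_ALLOC_SYSTEM);
}
```

In this model a plain M_NOWAIT request ends up in the same class as M_WAITOK
under the proposed mapping, and adding M_USE_RESERVE is what selects the
deeper class, which is the reverse of the current mapping.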
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Memory reserves or lack thereof

2012-11-11 Thread Dieter BSD
Alan writes:
 In conclusion, I think it's time that we change M_NOWAIT so that it doesn't
 dig any deeper into the cache/free page queues than M_WAITOK does and
 reintroduce a M_USE_RESERVE-like flag that says dig deep into the
 cache/free page queues.  The trouble is that we then need to identify all
 of those places that are implicitly depending on the current behavior of
 M_NOWAIT also digging deep into the cache/free page queues so that we can
 add an explicit M_USE_RESERVE.

find /usr/src/sys | xargs grep M_NOWAIT | wc -l
2101

Sounds like a lot of work that would need to happen atomically.
Would this work:

M_NO_WAIT       do not sleep, do not dig deep unless M_USE_RESERVE also set
M_USE_RESERVE   dig deep
M_NOWAIT        M_NO_WAIT | M_USE_RESERVE (deprecated)

New code avoids using M_NOWAIT. Existing code continues working the same way.
As time permits, old code is converted to new flags. Eventually M_NOWAIT
goes away.

Pro: the amount of code that needs to change atomically is much smaller.

Con: (1) Have to remember (or look up) difference between M_NOWAIT
and M_NO_WAIT. Maybe calling the new flag M_NO_SLEEP would help?
(2) Would M_NOWAIT really ever go away? The spl() calls haven't,
even after some cage rattling.
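
A rough sketch of how the three flags might be laid out, purely to illustrate
the idea.  The ALT_ names, the bit values, and the two helper functions are
hypothetical and are not taken from sys/malloc.h.

```c
/* Hypothetical flag bits for illustration only. */
#define ALT_M_NO_WAIT     0x01	/* do not sleep, do not dig deep */
#define ALT_M_USE_RESERVE 0x02	/* dig deep */
/* Legacy spelling keeps today's meaning: no sleep plus dig deep. */
#define ALT_M_NOWAIT      (ALT_M_NO_WAIT | ALT_M_USE_RESERVE)

/* Nonzero if the request must not sleep. */
int
alt_no_sleep(int flags)
{
	return ((flags & ALT_M_NO_WAIT) != 0);
}

/* Nonzero if the request may dig into the reserve. */
int
alt_digs_deep(int flags)
{
	return ((flags & ALT_M_USE_RESERVE) != 0);
}
```

Existing callers that pass the legacy flag would keep both properties, while
new code passing only the no-sleep bit would not dig deep.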

Re: Memory reserves or lack thereof

2012-11-11 Thread Adrian Chadd
On 11 November 2012 13:40, Alan Cox alan.l@gmail.com wrote:


 Agreed.  Once upon a time, before SMPng, M_NOWAIT was rarely used.  It was
 well understood that it should only be used by interrupt handlers.

 The trouble is that M_NOWAIT conflates two orthogonal things.  The obvious
 being that the allocation shouldn't sleep.  The other being how far we're
 willing to deplete the cache/free page queues.

 When fine-grained locking got sprinkled throughout the kernel, we all too
 often found ourselves wanting to do allocations without the possibility of
 blocking.  So, M_NOWAIT became commonplace, where it wasn't before.

Well, what's the current set of best practices for allocating mbufs?

I don't mind going through ath(4) and net80211(4), looking to make them
behave better with mbuf allocations. There are 49 uses of M_NOWAIT in net80211
and 10 in ath(4). I wonder how many of them are also treated as synonymous
with "don't fail allocating". Hm.


Adrian


Re: Memory reserves or lack thereof

2012-11-11 Thread Alfred Perlstein
I think very few of the M_NOWAIT allocations actually need the reserve
behavior. We should probably stop M_NOWAIT from digging that deep by default
and introduce a flag and/or a per-thread flag to request that behavior.




Re: Memory reserves or lack thereof

2012-11-10 Thread Konstantin Belousov
On Fri, Nov 09, 2012 at 07:10:04PM +, Sears, Steven wrote:
 I have a memory subsystem design question that I'm hoping someone can answer.
 
 I've been looking at a machine that is completely out of memory, as in
 
  v_free_count = 0, 
  v_cache_count = 0, 
 
 I wondered how a machine could completely run out of memory like this, 
 especially after finding a lack of interrupt storms or other pathologies that 
 would tend to overcommit memory. So I started investigating.
 
 Most allocators come down to vm_page_alloc(), which has this guard:
 
   if ((curproc == pageproc) && (page_req != VM_ALLOC_INTERRUPT)) {
           page_req = VM_ALLOC_SYSTEM;
   };
 
   if (cnt.v_free_count + cnt.v_cache_count > cnt.v_free_reserved ||
       (page_req == VM_ALLOC_SYSTEM &&
       cnt.v_free_count + cnt.v_cache_count > cnt.v_interrupt_free_min) ||
       (page_req == VM_ALLOC_INTERRUPT &&
       cnt.v_free_count + cnt.v_cache_count > 0)) {
 
 The key observation is if VM_ALLOC_INTERRUPT is set, it will allocate every 
 last page.
 
 From the name one might expect VM_ALLOC_INTERRUPT to be somewhat rare, 
 perhaps only used from interrupt threads. Not so, see kmem_malloc() or 
 uma_small_alloc() which both contain this mapping:
 
   if ((flags & (M_NOWAIT|M_USE_RESERVE)) == M_NOWAIT)
           pflags = VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED;
   else
           pflags = VM_ALLOC_SYSTEM | VM_ALLOC_WIRED;
 
 Note that M_USE_RESERVE has been deprecated and is used in just a handful of 
 places. Also note that lots of code paths come through these routines.
 
 What this means is essentially _any_ allocation using M_NOWAIT will bypass 
 whatever reserves have been held back and will take every last page available.
 
 There is no documentation stating M_NOWAIT has this side effect of 
 essentially being privileged, so any innocuous piece of code that can't block 
 will use it. And of course M_NOWAIT is literally used all over.
 
 It looks to me like the design goal of the BSD allocators is on recovery; it 
 will give all pages away knowing it can recover.
 
 Am I missing anything? I would have expected some small number of pages to be 
 held in reserve just in case. And I didn't expect M_NOWAIT to be a sort of 
 back door for grabbing memory.
 

Your analysis is right, there is nothing to add or correct.
This is the reason to strongly prefer M_WAITOK.
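
To make the effect of the quoted guard concrete, here is a minimal user-space
model of it.  This is a sketch under assumptions, not kernel code: the GM_
names, struct gm_counts, gm_may_allocate(), and gm_drain() are hypothetical,
and the model only ever takes pages from the free count.

```c
/* Stand-in request classes; names and values are illustrative only. */
enum gm_class { GM_ALLOC_NORMAL, GM_ALLOC_SYSTEM, GM_ALLOC_INTERRUPT };

/* Stand-in for the counters that the quoted guard reads from cnt. */
struct gm_counts {
	unsigned free_count;
	unsigned cache_count;
	unsigned free_reserved;
	unsigned interrupt_free_min;
};

/* Same shape as the quoted guard: nonzero if a page may be handed out. */
int
gm_may_allocate(const struct gm_counts *c, enum gm_class req)
{
	unsigned avail = c->free_count + c->cache_count;

	return (avail > c->free_reserved ||
	    (req == GM_ALLOC_SYSTEM && avail > c->interrupt_free_min) ||
	    (req == GM_ALLOC_INTERRUPT && avail > 0));
}

/*
 * Take free pages one at a time until the guard refuses or no free
 * pages remain, then return how many free + cache pages are left.
 * The counts are passed by value, so the caller's copy is unchanged.
 */
unsigned
gm_drain(struct gm_counts c, enum gm_class req)
{
	while (gm_may_allocate(&c, req) && c.free_count > 0)
		c.free_count--;
	return (c.free_count + c.cache_count);
}
```

With made-up counts of 100 free pages, no cached pages, a free reserve of 20,
and an interrupt minimum of 2, draining stops at 20, 2, and 0 pages for the
normal, system, and interrupt classes respectively.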

