Re: [PATCH 2/2][v3] powerpc: Make the CMM memory hotplug aware

2009-10-12 Thread Robert Jennings
* Benjamin Herrenschmidt (b...@kernel.crashing.org) wrote:
 On Fri, 2009-10-09 at 15:41 -0500, Robert Jennings wrote:
  The Collaborative Memory Manager (CMM) module allocates individual pages
  over time that are not migratable.  On a long running system this can
  severely impact the ability to find enough pages to support a hotplug
  memory remove operation.
  
  This patch adds a memory isolation notifier and a memory hotplug notifier.
  The memory isolation notifier will return the number of pages found
  in the range specified.  This is used to determine if all of the used
  pages in a pageblock are owned by the balloon (or other entities in
  the notifier chain).  The hotplug notifier will free pages in the range
  which is to be removed.  The priority of this hotplug notifier is low
  so that it will be called nearly last; this helps avoid removing loaned
  pages in operations that fail due to other handlers.
  
  CMM activity will be halted when hotplug remove operations are active
  and resume activity after a delay period to allow the hypervisor time
  to adjust.
  
  Signed-off-by: Robert Jennings <r...@linux.vnet.ibm.com>
 
 Do you need me to merge that via the powerpc tree after the relevant
 generic parts go in ? This is 2.6.33 material ?

I didn't know how this part works, honestly; this is the first time I've
pushed patches with dependencies like this.  Andrew Morton had pulled
an earlier version of this and the mm hotplug related changes for 2.6.32.

  +module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
  +MODULE_PARM_DESC(delay, "Delay (in seconds) after memory hotplug remove "
  +		 "before activity resumes. "
  +		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");
 
 What is the above ? That sounds scary :-)

I'm changing this to read "Delay (in seconds) after memory hotplug
remove before loaning resumes." in order to clear this up.  This is a
period where loaning from the balloon is paused after the hotplug
remove completes.  It gives the userspace tools time to mark the
sections as isolated and unusable with the hypervisor, and gives the
hypervisor time to account for this in the loaning levels it requests
of the OS.
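
Concretely, the updated description would presumably look something like
this (a sketch only; the v3 hunk above passes "delay" as the first
MODULE_PARM_DESC argument, shown here as hotplug_delay so the text is
attached to the right parameter):

  module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
  MODULE_PARM_DESC(hotplug_delay, "Delay (in seconds) after memory hotplug "
  		 "remove before loaning resumes. "
  		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");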

   module_param_named(oom_kb, oom_kb, uint, S_IRUGO | S_IWUSR);
   MODULE_PARM_DESC(oom_kb, "Amount of memory in kb to free on OOM. "
   		 "[Default=" __stringify(CMM_OOM_KB) "]");
  @@ -88,6 +101,8 @@ struct cmm_page_array {
   static unsigned long loaned_pages;
   static unsigned long loaned_pages_target;
   static unsigned long oom_freed_pages;
  +static atomic_t hotplug_active = ATOMIC_INIT(0);
  +static atomic_t hotplug_occurred = ATOMIC_INIT(0);
 
 That sounds like a hand made lock with atomics... rarely a good idea,
 tends to miss appropriate barriers etc...
 

I have changed this so that a mutex is held during the memory
hotplug remove.  The hotplug_occurred flag is now an integer protected
by the mutex; it is used to provide the delay after the hotplug remove
completes.
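
Roughly, the reworked state looks like this (a sketch only; v4 is not in
this thread, so the mutex name and the notifier-side usage below are
assumptions, not the final code):

  #include <linux/mutex.h>

  static DEFINE_MUTEX(hotplug_mutex);   /* held across a hotplug remove */
  static int hotplug_occurred;          /* protected by hotplug_mutex */

  /* The hotplug-remove path brackets its work with the mutex and leaves
   * the flag set so that cmm_thread() can apply hotplug_delay once the
   * remove has finished. */
  static void cmm_hotplug_remove_sketch(void)
  {
          mutex_lock(&hotplug_mutex);
          hotplug_occurred = 1;
          /* ... free balloon pages that fall in the range being removed ... */
          mutex_unlock(&hotplug_mutex);
  }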

   static struct cmm_page_array *cmm_page_list;
   static DEFINE_SPINLOCK(cmm_lock);
  @@ -110,6 +125,9 @@ static long cmm_alloc_pages(long nr)
  cmm_dbg("Begin request for %ld pages\n", nr);
   
  while (nr) {
  +   if (atomic_read(&hotplug_active))
  +   break;
  +
 
 Ok so I'm not familiar with that whole memory hotplug stuff, so the code
 might be right, but wouldn't the above be racy anyways in case hotplug
 just becomes active after this statement ?
 
 Shouldn't you use a mutex_trylock instead ? That has clearer semantics
 and will provide the appropriate memory barriers.

I have changed this to use a mutex in the same location.  This allows
the allocation of pages to terminate early during a hotplug remove
operation.

If hotplug becomes active after this check in cmm_alloc_pages() one page
will be allocated to the balloon, but this page will not belong to the
memory range going offline because the pageblock will have already been
isolated.  After allocating the page we might need to wait on the lock to
add the page to the list if hotplug is removing pages from the balloon.
After one page is added, cmm_alloc_pages() will exit early when it checks
to see if hotplug is active or if it has occurred.

I wanted to keep the section locked by cmm_lock small, so that we can
safely respond to the hotplug notifier as quickly as possible while
minimizing memory pressure.

There are no checks in cmm_free_pages() to have it abort early during
hotplug memory remove; the rationale is that letting the balloon shrink
helps hotplug, even if it means holding the cmm_lock a bit longer.
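
For reference, the allocation-side check I am describing is roughly the
following (again a sketch, reusing the hotplug_mutex/hotplug_occurred
names assumed above; addr and nr are the existing locals of
cmm_alloc_pages(), and the cmm_page_list bookkeeping is unchanged from
v3):

  while (nr) {
          /* Stop inflating the balloon if a hotplug remove is in
           * progress (trylock fails) or has just occurred. */
          if (!mutex_trylock(&hotplug_mutex))
                  break;
          if (hotplug_occurred) {
                  mutex_unlock(&hotplug_mutex);
                  break;
          }
          mutex_unlock(&hotplug_mutex);

          addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
                                 __GFP_NORETRY | __GFP_NOMEMALLOC);
          if (!addr)
                  break;
          /* ... hand the page to the hypervisor and record it on
           * cmm_page_list under cmm_lock, as in v3 ... */
          nr--;
  }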

  addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
 __GFP_NORETRY | __GFP_NOMEMALLOC);
  if (!addr)
  @@ -119,8 +137,10 @@ static long cmm_alloc_pages(long nr)
  if (!pa || pa->index >= CMM_NR_PAGES) {
  /* Need a new page for the page list. */

Re: [PATCH 2/2][v3] powerpc: Make the CMM memory hotplug aware

2009-10-11 Thread Benjamin Herrenschmidt
On Fri, 2009-10-09 at 15:41 -0500, Robert Jennings wrote:
 The Collaborative Memory Manager (CMM) module allocates individual pages
 over time that are not migratable.  On a long running system this can
 severely impact the ability to find enough pages to support a hotplug
 memory remove operation.
 
 This patch adds a memory isolation notifier and a memory hotplug notifier.
 The memory isolation notifier will return the number of pages found
 in the range specified.  This is used to determine if all of the used
 pages in a pageblock are owned by the balloon (or other entities in
 the notifier chain).  The hotplug notifier will free pages in the range
 which is to be removed.  The priority of this hotplug notifier is low
 so that it will be called nearly last; this helps avoid removing loaned
 pages in operations that fail due to other handlers.
 
 CMM activity will be halted when hotplug remove operations are active
 and resume activity after a delay period to allow the hypervisor time
 to adjust.
 
 Signed-off-by: Robert Jennings <r...@linux.vnet.ibm.com>

Do you need me to merge that via the powerpc tree after the relevant
generic parts go in ? This is 2.6.33 material ?

 +module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
 +MODULE_PARM_DESC(delay, "Delay (in seconds) after memory hotplug remove "
 +		 "before activity resumes. "
 +		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");

What is the above ? That sounds scary :-)

  module_param_named(oom_kb, oom_kb, uint, S_IRUGO | S_IWUSR);
  MODULE_PARM_DESC(oom_kb, "Amount of memory in kb to free on OOM. "
  		 "[Default=" __stringify(CMM_OOM_KB) "]");
 @@ -88,6 +101,8 @@ struct cmm_page_array {
  static unsigned long loaned_pages;
  static unsigned long loaned_pages_target;
  static unsigned long oom_freed_pages;
 +static atomic_t hotplug_active = ATOMIC_INIT(0);
 +static atomic_t hotplug_occurred = ATOMIC_INIT(0);

That sounds like a hand made lock with atomics... rarely a good idea,
tends to miss appropriate barriers etc...
 
  static struct cmm_page_array *cmm_page_list;
  static DEFINE_SPINLOCK(cmm_lock);
 @@ -110,6 +125,9 @@ static long cmm_alloc_pages(long nr)
   cmm_dbg("Begin request for %ld pages\n", nr);
  
   while (nr) {
  + if (atomic_read(&hotplug_active))
 + break;
 +

Ok so I'm not familiar with that whole memory hotplug stuff, so the code
might be right, but wouldn't the above be racy anyways in case hotplug
just becomes active after this statement ?

Shouldn't you use a mutex_trylock instead ? That has clearer semantics
and will provide the appropriate memory barriers.

   addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
  __GFP_NORETRY | __GFP_NOMEMALLOC);
   if (!addr)
 @@ -119,8 +137,10 @@ static long cmm_alloc_pages(long nr)
   if (!pa || pa->index >= CMM_NR_PAGES) {
   /* Need a new page for the page list. */
   		spin_unlock(&cmm_lock);
 -		npa = (struct cmm_page_array *)__get_free_page(GFP_NOIO | __GFP_NOWARN |
 -						       __GFP_NORETRY | __GFP_NOMEMALLOC);
 +		npa = (struct cmm_page_array *)__get_free_page(
 +				GFP_NOIO | __GFP_NOWARN |
 +				__GFP_NORETRY | __GFP_NOMEMALLOC |
 +				__GFP_MOVABLE);
   		if (!npa) {
   			pr_info("%s: Can not allocate new page list\n", __func__);
   free_page(addr);
 @@ -273,9 +293,23 @@ static int cmm_thread(void *dummy)
   while (1) {
   timeleft = msleep_interruptible(delay * 1000);
 
 - if (kthread_should_stop() || timeleft) {
 - loaned_pages_target = loaned_pages;
 + if (kthread_should_stop() || timeleft)
   break;
 +
 +		if (atomic_read(&hotplug_active)) {
 +			cmm_dbg("Hotplug operation in progress, activity "
 +					"suspended\n");
 +			continue;
 +		}
 +
 +		if (atomic_dec_if_positive(&hotplug_occurred) >= 0) {
 +			cmm_dbg("Hotplug operation has occurred, loaning "
 +					"activity suspended for %d seconds.\n",
 +					hotplug_delay);
 +			timeleft = msleep_interruptible(hotplug_delay * 1000);
 +			if (kthread_should_stop() || timeleft)
 +				break;
 +			continue;
   }

I have fewer problems with hotplug_occurred, but if you use a
mutex_trylock, overall, you can turn the above into a normal int instead
of an atomic.

 ../..

 +static int cmm_memory_cb(struct notifier_block *self,
 + 

[PATCH 2/2][v3] powerpc: Make the CMM memory hotplug aware

2009-10-09 Thread Robert Jennings
The Collaborative Memory Manager (CMM) module allocates individual pages
over time that are not migratable.  On a long running system this can
severely impact the ability to find enough pages to support a hotplug
memory remove operation.

This patch adds a memory isolation notifier and a memory hotplug notifier.
The memory isolation notifier will return the number of pages found
in the range specified.  This is used to determine if all of the used
pages in a pageblock are owned by the balloon (or other entities in
the notifier chain).  The hotplug notifier will free pages in the range
which is to be removed.  The priority of this hotplug notifier is low
so that it will be called nearly last; this helps avoid removing loaned
pages in operations that fail due to other handlers.

CMM activity will be halted when hotplug remove operations are active
and resume activity after a delay period to allow the hypervisor time
to adjust.

Signed-off-by: Robert Jennings <r...@linux.vnet.ibm.com>

---

I'm not entirely sure of the etiquette, but there are no changes to
this patch.  I'm resending to keep it with the changes to the parent
patch.
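
For reference, the two notifiers described in the changelog hook into the
priorities defined below roughly as follows (registration sketch only:
callback bodies omitted, and register_memory_isolate_notifier() /
unregister_memory_isolate_notifier() come from patch 1/2 of this series):

  static struct notifier_block cmm_mem_isolate_nb = {
          .notifier_call = cmm_memory_isolate_cb,
          .priority = CMM_MEM_ISOLATE_PRI
  };

  static struct notifier_block cmm_mem_nb = {
          .notifier_call = cmm_memory_cb,
          .priority = CMM_MEM_HOTPLUG_PRI
  };

  /* Registration sketch (error handling simplified). */
  static int cmm_register_notifiers(void)
  {
          int rc;

          rc = register_memory_isolate_notifier(&cmm_mem_isolate_nb);
          if (rc)
                  return rc;
          rc = register_memory_notifier(&cmm_mem_nb);
          if (rc)
                  unregister_memory_isolate_notifier(&cmm_mem_isolate_nb);
          return rc;
  }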
 
 arch/powerpc/platforms/pseries/cmm.c |  207 ++-
 1 file changed, 201 insertions(+), 6 deletions(-)

Index: b/arch/powerpc/platforms/pseries/cmm.c
===
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -38,19 +38,28 @@
 #include <asm/mmu.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
+#include <linux/memory.h>
 
 #include "plpar_wrappers.h"
 
 #define CMM_DRIVER_VERSION	"1.0.0"
 #define CMM_DEFAULT_DELAY  1
+#define CMM_HOTPLUG_DELAY  5
 #define CMM_DEBUG  0
 #define CMM_DISABLE0
 #define CMM_OOM_KB 1024
 #define CMM_MIN_MEM_MB 256
 #define KB2PAGES(_p)	((_p)>>(PAGE_SHIFT-10))
 #define PAGES2KB(_p)	((_p)<<(PAGE_SHIFT-10))
+/*
+ * The priority level tries to ensure that this notifier is called as
+ * late as possible to reduce thrashing in the shared memory pool.
+ */
+#define CMM_MEM_HOTPLUG_PRI	1
+#define CMM_MEM_ISOLATE_PRI	15
 
 static unsigned int delay = CMM_DEFAULT_DELAY;
+static unsigned int hotplug_delay = CMM_HOTPLUG_DELAY;
 static unsigned int oom_kb = CMM_OOM_KB;
 static unsigned int cmm_debug = CMM_DEBUG;
 static unsigned int cmm_disabled = CMM_DISABLE;
@@ -65,6 +74,10 @@ MODULE_VERSION(CMM_DRIVER_VERSION);
 module_param_named(delay, delay, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(delay, "Delay (in seconds) between polls to query hypervisor "
		 "paging requests. "
		 "[Default=" __stringify(CMM_DEFAULT_DELAY) "]");
+module_param_named(hotplug_delay, hotplug_delay, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(delay, "Delay (in seconds) after memory hotplug remove "
+		 "before activity resumes. "
+		 "[Default=" __stringify(CMM_HOTPLUG_DELAY) "]");
 module_param_named(oom_kb, oom_kb, uint, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(oom_kb, "Amount of memory in kb to free on OOM. "
		 "[Default=" __stringify(CMM_OOM_KB) "]");
@@ -88,6 +101,8 @@ struct cmm_page_array {
 static unsigned long loaned_pages;
 static unsigned long loaned_pages_target;
 static unsigned long oom_freed_pages;
+static atomic_t hotplug_active = ATOMIC_INIT(0);
+static atomic_t hotplug_occurred = ATOMIC_INIT(0);
 
 static struct cmm_page_array *cmm_page_list;
 static DEFINE_SPINLOCK(cmm_lock);
@@ -110,6 +125,9 @@ static long cmm_alloc_pages(long nr)
	cmm_dbg("Begin request for %ld pages\n", nr);
 
while (nr) {
+   if (atomic_read(&hotplug_active))
+   break;
+
addr = __get_free_page(GFP_NOIO | __GFP_NOWARN |
   __GFP_NORETRY | __GFP_NOMEMALLOC);
if (!addr)
@@ -119,8 +137,10 @@ static long cmm_alloc_pages(long nr)
	if (!pa || pa->index >= CMM_NR_PAGES) {
		/* Need a new page for the page list. */
		spin_unlock(&cmm_lock);
-		npa = (struct cmm_page_array *)__get_free_page(GFP_NOIO | __GFP_NOWARN |
-						       __GFP_NORETRY | __GFP_NOMEMALLOC);
+		npa = (struct cmm_page_array *)__get_free_page(
+				GFP_NOIO | __GFP_NOWARN |
+				__GFP_NORETRY | __GFP_NOMEMALLOC |
+				__GFP_MOVABLE);
		if (!npa) {
			pr_info("%s: Can not allocate new page list\n", __func__);
free_page(addr);
@@ -273,9 +293,23 @@ static int cmm_thread(void *dummy)
while (1) {
timeleft = msleep_interruptible(delay * 1000);
 
-   if (kthread_should_stop() || timeleft) {
-   loaned_pages_target =