[RFC][PATCH][3/4] Add reclaim support (v2)

2007-02-25 Thread Balbir Singh


Changelog

1. Move void *container to struct container (in scan_control and vmscan.c
   and rmap.c)
2. The last set of patches churned the LRU list; in this release, pages
   that do not belong to the container are moved to a skipped_pages
   list. At the end of the isolation they are added back to the zone
   list using list_splice_tail (a new function added in list.h).
   The disadvantage of this approach is that pages moved to skipped_pages
   will not be available for general reclaim. General testing on UML
   and a powerpc box showed that the changes worked.

   Other alternatives tried
   
   a. Do not delete the page from the LRU list; but that quickly led to
      a panic, since the page was on the LRU and we released the lru_lock
      in page_in_container
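The splice-back step from item 2 above can be sketched in a self-contained userspace form. The list primitives below merely imitate include/linux/list.h, and the list_splice_tail() helper is a guess at the new function's behaviour, not the patch's actual code:

```c
#include <assert.h>

/* Minimal doubly linked list in the style of include/linux/list.h,
 * just enough to sketch the new list_splice_tail() helper. The real
 * kernel implementation may differ in detail. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int  list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Splice every entry of @list onto the tail of @head, e.g. to return
 * skipped_pages to the zone LRU once isolation has finished. */
static void list_splice_tail(struct list_head *list, struct list_head *head)
{
	if (!list_empty(list)) {
		struct list_head *first = list->next;
		struct list_head *last  = list->prev;
		struct list_head *tail  = head->prev;

		tail->next  = first;
		first->prev = tail;
		last->next  = head;
		head->prev  = last;
		INIT_LIST_HEAD(list);	/* the source list becomes empty */
	}
}
```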

TODO's

1. Try a per-container LRU list, but that would mean expanding the page
   struct or special tricks like overloading the LRU pointer. A per-container
   list would also make it more difficult to handle shared pages, as a
   page will belong to just one container at a time.

This patch reclaims pages from a container when the container limit is hit.
The executable is OOM-killed only when the container it is running in is over
its limit and we could not reclaim any pages belonging to the container.

A parameter called pushback controls how much memory is reclaimed when the
limit is hit. It should be easy to expose this knob to user space, but
currently it is hard-coded to 20% of the total limit of the container.
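The 20% figure works out as simple arithmetic; the helper below is hypothetical (the patch hard-codes the percentage rather than exposing a function like this):

```c
#include <assert.h>

/* Illustrative only: derive the reclaim target from a hard-coded 20%
 * pushback of the container limit. Name and signature are assumptions. */
#define PUSHBACK_PERCENT 20L

long pushback_pages(long limit_pages)
{
	return (limit_pages * PUSHBACK_PERCENT) / 100L;
}
```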

isolate_lru_pages() has been modified to isolate pages belonging to a
particular container, so that reclaim code will reclaim only container
pages. For shared pages, reclaim does not unmap all mappings of the page,
it only unmaps those mappings that are over their limit. This ensures
that other containers are not penalized while reclaiming shared pages.
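The modified isolate_lru_pages() flow can be modeled in miniature. This userspace sketch only mirrors the control flow described above (match, park on skipped_pages, splice back); names and types are stand-ins, not the patch's code:

```c
#include <assert.h>

/* Stand-in model of container-aware isolation: pages that match the
 * target container are isolated for reclaim; the rest are parked on a
 * skipped list and returned to the LRU afterwards. Illustrative only. */
enum page_state { ON_LRU, ISOLATED, SKIPPED };

struct page_model {
	int container_id;
	enum page_state state;
};

int isolate_for_container(struct page_model *lru, int n, int target)
{
	int taken = 0;

	for (int i = 0; i < n; i++) {
		if (lru[i].container_id == target) {
			lru[i].state = ISOLATED;	/* goes to reclaim */
			taken++;
		} else {
			lru[i].state = SKIPPED;		/* skipped_pages */
		}
	}
	/* splice-back step: skipped pages rejoin the zone LRU */
	for (int i = 0; i < n; i++)
		if (lru[i].state == SKIPPED)
			lru[i].state = ON_LRU;
	return taken;
}
```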

Parallel reclaim per container is not allowed. Each controller has a wait
queue that ensures that only one task per container is running reclaim on
that container.
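The per-container serialization can be reduced to a small sequential model. The real patch sleeps on a wait queue (memcontrol_reclaim_wq) where this sketch merely reports busy; all names here are stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

/* Sequential model of one-reclaimer-per-container. The real code would
 * sleep on the controller's wait queue instead of returning false, and
 * wake waiters when reclaim finishes. */
struct container_model {
	bool reclaim_in_progress;
};

bool try_start_reclaim(struct container_model *c)
{
	if (c->reclaim_in_progress)
		return false;		/* real code: sleep on the wait queue */
	c->reclaim_in_progress = true;
	return true;
}

void finish_reclaim(struct container_model *c)
{
	c->reclaim_in_progress = false;	/* real code: wake_up() waiters */
}
```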

Signed-off-by: <[EMAIL PROTECTED]>
---

 include/linux/list.h   |   26 +
 include/linux/memcontrol.h |   12 
 include/linux/rmap.h   |   20 ++-
 include/linux/swap.h   |3 +
 mm/memcontrol.c|  122 +
 mm/migrate.c   |2 
 mm/rmap.c  |  100 +++-
 mm/vmscan.c|  114 +-
 8 files changed, 370 insertions(+), 29 deletions(-)

diff -puN include/linux/memcontrol.h~memcontrol-reclaim-on-limit 
include/linux/memcontrol.h
--- linux-2.6.20/include/linux/memcontrol.h~memcontrol-reclaim-on-limit 
2007-02-24 19:40:56.0 +0530
+++ linux-2.6.20-balbir/include/linux/memcontrol.h  2007-02-24 
19:50:34.0 +0530
@@ -37,6 +37,7 @@ enum {
 };
 
 #ifdef CONFIG_CONTAINER_MEMCONTROL
+#include 
 
 #ifndef kB
 #define kB 1024/* One Kilo Byte */
@@ -53,6 +54,9 @@ extern void memcontrol_mm_free(struct mm
 extern void memcontrol_mm_assign_container(struct mm_struct *mm,
struct task_struct *p);
 extern int memcontrol_update_rss(struct mm_struct *mm, int count, bool check);
+extern int memcontrol_mm_overlimit(struct mm_struct *mm, void *sc_cont);
+extern wait_queue_head_t memcontrol_reclaim_wq;
+extern bool memcontrol_reclaim_in_progress;
 
 #else /* CONFIG_CONTAINER_MEMCONTROL  */
 
@@ -76,5 +80,13 @@ static inline int memcontrol_update_rss(
return 0;
 }
 
+/*
+ * In the absence of memory control, we always free mappings.
+ */
+static inline int memcontrol_mm_overlimit(struct mm_struct *mm, void *sc_cont)
+{
+   return 1;
+}
+
 #endif /* CONFIG_CONTAINER_MEMCONTROL */
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN include/linux/rmap.h~memcontrol-reclaim-on-limit include/linux/rmap.h
--- linux-2.6.20/include/linux/rmap.h~memcontrol-reclaim-on-limit   
2007-02-24 19:40:56.0 +0530
+++ linux-2.6.20-balbir/include/linux/rmap.h2007-02-24 19:40:56.0 
+0530
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * The anon_vma heads a list of private "related" vmas, to scan if
@@ -90,7 +91,17 @@ static inline void page_dup_rmap(struct 
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked);
-int try_to_unmap(struct page *, int ignore_refs);
+int try_to_unmap(struct page *, int ignore_refs, struct container *container);
+#ifdef CONFIG_CONTAINER_MEMCONTROL
+bool page_in_container(struct page *page, struct zone *zone,
+   struct container *container);
+#else
+static inline bool page_in_container(struct page *page, struct zone *zone,
+   struct container *container)
+{
+   return true;
+}
+#endif /* CONFIG_CONTAINER_MEMCONTROL */
 
 /*
  * Called from mm/filemap_xip.c to unmap empty zero page
@@ -118,7 +129,1


Re: [RFC][PATCH][3/4] Add reclaim support

2007-02-19 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 19 Feb 2007 16:20:53 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


+ * so, is the container over it's limit. Returns 1 if the container is above
+ * its limit.
+ */
+int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
+{
+   struct container *cont;
+   struct memctlr *mem;
+   long usage, limit;
+   int ret = 1;
+
+   if (!sc_cont)
+   goto out;
+
+   read_lock(&mm->container_lock);
+   cont = mm->container;
+
+   /*
+* Regular reclaim, let it proceed as usual
+*/
+   if (!sc_cont)
+   goto out;
+
+   ret = 0;
+   if (cont != sc_cont)
+   goto out;
+
+   mem = memctlr_from_cont(cont);
+   usage = atomic_long_read(&mem->counter.usage);
+   limit = atomic_long_read(&mem->counter.limit);
+   if (limit && (usage > limit))
+   ret = 1;
+out:
+   read_unlock(&mm->container_lock);
+   return ret;
+}

hm, I wonder how much additional lock traffic all this adds.


It's a read_lock(), and most of the locks are read_locks,
which allow for concurrent access until the container
changes or goes away.


read_lock isn't free, and I suspect we're calling this function pretty
often (every pagefault?) It'll be measurable on some workloads, on some
hardware.

It probably won't be terribly bad because each lock-taking is associated
with a clear_page().  But still, if there's any possibility of lightening
the locking up, now is the time to think about it.



Yes, good point. I'll revisit to see if barriers can replace the locking,
or whether the locking is required at all.


@@ -66,6 +67,9 @@ struct scan_control {
int swappiness;
 
 	int all_unreclaimable;

+
+   void *container;/* Used by containers for reclaiming */
+   /* pages when the limit is exceeded  */
 };

eww.  Why void*?


I did not want to expose struct container in mm/vmscan.c.


It's already there, via rmap.h



Yes, true


An additional
thought was that no matter what container goes in the field would be
useful for reclaim.


Am having trouble parsing that sentence ;)




The thought was that irrespective of the infrastructure that goes in
having an entry for reclaim in scan_control would be useful. I guess
the name exposes what the type tries to hide :-)

--
Warm Regards,
Balbir Singh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH][3/4] Add reclaim support

2007-02-19 Thread Andrew Morton
On Mon, 19 Feb 2007 16:20:53 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> >> + * so, is the container over it's limit. Returns 1 if the container is 
> >> above
> >> + * its limit.
> >> + */
> >> +int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
> >> +{
> >> +  struct container *cont;
> >> +  struct memctlr *mem;
> >> +  long usage, limit;
> >> +  int ret = 1;
> >> +
> >> +  if (!sc_cont)
> >> +  goto out;
> >> +
> >> +  read_lock(&mm->container_lock);
> >> +  cont = mm->container;
> >> +
> >> +  /*
> >> +   * Regular reclaim, let it proceed as usual
> >> +   */
> >> +  if (!sc_cont)
> >> +  goto out;
> >> +
> >> +  ret = 0;
> >> +  if (cont != sc_cont)
> >> +  goto out;
> >> +
> >> +  mem = memctlr_from_cont(cont);
> >> +  usage = atomic_long_read(&mem->counter.usage);
> >> +  limit = atomic_long_read(&mem->counter.limit);
> >> +  if (limit && (usage > limit))
> >> +  ret = 1;
> >> +out:
> >> +  read_unlock(&mm->container_lock);
> >> +  return ret;
> >> +}
> > 
> > hm, I wonder how much additional lock traffic all this adds.
> > 
> 
> It's a read_lock() and most of the locks are read_locks
> which allow for concurrent access, until the container
> changes or goes away

read_lock isn't free, and I suspect we're calling this function pretty
often (every pagefault?) It'll be measurable on some workloads, on some
hardware.

It probably won't be terribly bad because each lock-taking is associated
with a clear_page().  But still, if there's any possibility of lightening
the locking up, now is the time to think about it.

> >> @@ -66,6 +67,9 @@ struct scan_control {
> >>int swappiness;
> >>  
> >>int all_unreclaimable;
> >> +
> >> +  void *container;/* Used by containers for reclaiming */
> >> +  /* pages when the limit is exceeded  */
> >>  };
> > 
> > eww.  Why void*?
> > 
> 
> I did not want to expose struct container in mm/vmscan.c.

It's already there, via rmap.h

> An additional
> thought was that no matter what container goes in the field would be
> useful for reclaim.

Am having trouble parsing that sentence ;)




Re: [RFC][PATCH][3/4] Add reclaim support

2007-02-19 Thread Balbir Singh

KAMEZAWA Hiroyuki wrote:

On Mon, 19 Feb 2007 12:20:42 +0530
Balbir Singh <[EMAIL PROTECTED]> wrote:


+int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
+{
+   struct container *cont;
+   struct memctlr *mem;
+   long usage, limit;
+   int ret = 1;
+
+   if (!sc_cont)
+   goto out;
+
+   read_lock(&mm->container_lock);
+   cont = mm->container;



+out:
+   read_unlock(&mm->container_lock);
+   return ret;
+}
+


should be
==
out_and_unlock:
read_unlock(&mm->container_lock);
out_:
return ret;




Thanks, that's a much better convention!
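Applied to the function under review, the suggested convention might look like the sketch below; the lock is a stand-in counter so the exit paths can be checked in userspace, and the duplicated !sc_cont test from the posted version is folded into one:

```c
#include <assert.h>

static int lock_depth;			/* stand-in for mm->container_lock */
static void read_lock(void)   { lock_depth++; }
static void read_unlock(void) { lock_depth--; }

/* Reworked exit paths: one label unlocks, the other does not. */
int overlimit(const void *cont, const void *sc_cont, long usage, long limit)
{
	int ret = 1;

	if (!sc_cont)			/* regular reclaim, lock never taken */
		goto out;

	read_lock();
	ret = 0;
	if (cont != sc_cont)
		goto out_unlock;

	if (limit && usage > limit)
		ret = 1;
out_unlock:
	read_unlock();
out:
	return ret;
}
```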



-Kame




--
Warm Regards,
Balbir Singh


Re: [RFC][PATCH][3/4] Add reclaim support

2007-02-19 Thread Balbir Singh

Andrew Morton wrote:

On Mon, 19 Feb 2007 12:20:42 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:


This patch reclaims pages from a container when the container limit is hit.
The executable is oom'ed only when the container it is running in, is overlimit
and we could not reclaim any pages belonging to the container

A parameter called pushback, controls how much memory is reclaimed when the
limit is hit. It should be easy to expose this knob to user space, but
currently it is hard coded to 20% of the total limit of the container.

isolate_lru_pages() has been modified to isolate pages belonging to a
particular container, so that reclaim code will reclaim only container
pages. For shared pages, reclaim does not unmap all mappings of the page,
it only unmaps those mappings that are over their limit. This ensures
that other containers are not penalized while reclaiming shared pages.

Parallel reclaim per container is not allowed. Each controller has a wait
queue that ensures that only one task per control is running reclaim on
that container.


...

--- linux-2.6.20/include/linux/rmap.h~memctlr-reclaim-on-limit  2007-02-18 
23:29:14.0 +0530
+++ linux-2.6.20-balbir/include/linux/rmap.h2007-02-18 23:29:14.0 
+0530
@@ -90,7 +90,15 @@ static inline void page_dup_rmap(struct 
  * Called from mm/vmscan.c to handle paging out

  */
 int page_referenced(struct page *, int is_locked);
-int try_to_unmap(struct page *, int ignore_refs);
+int try_to_unmap(struct page *, int ignore_refs, void *container);
+#ifdef CONFIG_CONTAINER_MEMCTLR
+bool page_in_container(struct page *page, struct zone *zone, void *container);
+#else
+static inline bool page_in_container(struct page *page, struct zone *zone, 
void *container)
+{
+   return true;
+}
+#endif /* CONFIG_CONTAINER_MEMCTLR */
 
 /*

  * Called from mm/filemap_xip.c to unmap empty zero page
@@ -118,7 +126,8 @@ int page_mkclean(struct page *);
 #define anon_vma_link(vma) do {} while (0)
 
 #define page_referenced(page,l) TestClearPageReferenced(page)

-#define try_to_unmap(page, refs) SWAP_FAIL
+#define try_to_unmap(page, refs, container) SWAP_FAIL
+#define page_in_container(page, zone, container)  true


I spy a compile error.

The static-inline version looks nicer.




I will compile with the feature turned off and double check. I'll
also convert it to a static inline function.
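For the !CONFIG_CONTAINER_MEMCTLR stubs, the static-inline form might look like this sketch. SWAP_FAIL's value and the opaque struct page are stand-ins here; the point is that, unlike the bare macros, inline stubs keep the three-argument call sites type-checked:

```c
#include <assert.h>

#define SWAP_FAIL 2			/* stand-in value for the sketch */
struct page;				/* opaque here; real type lives in mm */

/* Static-inline stubs keep the call sites type-checked and their
 * arguments evaluated, which the bare macro versions did not. */
static inline int try_to_unmap(struct page *page, int ignore_refs,
			       void *container)
{
	(void)page; (void)ignore_refs; (void)container;
	return SWAP_FAIL;
}

static inline int page_in_container(struct page *page, void *zone,
				    void *container)
{
	(void)page; (void)zone; (void)container;
	return 1;			/* no controller: always "in" */
}
```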



 static inline int page_mkclean(struct page *page)
 {
diff -puN include/linux/swap.h~memctlr-reclaim-on-limit include/linux/swap.h
--- linux-2.6.20/include/linux/swap.h~memctlr-reclaim-on-limit  2007-02-18 
23:29:14.0 +0530
+++ linux-2.6.20-balbir/include/linux/swap.h2007-02-18 23:29:14.0 
+0530
@@ -188,6 +188,10 @@ extern void swap_setup(void);
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zone **, gfp_t);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
+#ifdef CONFIG_CONTAINER_MEMCTLR
+extern unsigned long memctlr_shrink_mapped_memory(unsigned long nr_pages,
+   void *container);
+#endif


Usually one doesn't need to put ifdefs around the declaration like this. 
If the function doesn't exist and nobody calls it, we're fine.  If someone

_does_ call it, we'll find out the error at link-time.



Sure, sounds good. I'll get rid of the #ifdefs.

 
+/*

+ * checks if the mm's container and scan control passed container match, if
+ * so, is the container over it's limit. Returns 1 if the container is above
+ * its limit.
+ */
+int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
+{
+   struct container *cont;
+   struct memctlr *mem;
+   long usage, limit;
+   int ret = 1;
+
+   if (!sc_cont)
+   goto out;
+
+   read_lock(&mm->container_lock);
+   cont = mm->container;
+
+   /*
+* Regular reclaim, let it proceed as usual
+*/
+   if (!sc_cont)
+   goto out;
+
+   ret = 0;
+   if (cont != sc_cont)
+   goto out;
+
+   mem = memctlr_from_cont(cont);
+   usage = atomic_long_read(&mem->counter.usage);
+   limit = atomic_long_read(&mem->counter.limit);
+   if (limit && (usage > limit))
+   ret = 1;
+out:
+   read_unlock(&mm->container_lock);
+   return ret;
+}


hm, I wonder how much additional lock traffic all this adds.



It's a read_lock() and most of the locks are read_locks
which allow for concurrent access, until the container
changes or goes away


 int memctlr_mm_init(struct mm_struct *mm)
 {
mm->counter = kmalloc(sizeof(struct res_counter), GFP_KERNEL);
@@ -77,6 +125,46 @@ void memctlr_mm_assign_container(struct 
 	write_unlock(&mm->container_lock);

 }
 
+static int memctlr_check_and_reclaim(struct container *cont, long usage,

+   long limit)
+{
+   unsigned long nr_pages = 0;
+   unsigned long nr_reclaimed = 0;
+   int retries = nr_retries;
+   int ret = 

Re: [RFC][PATCH][3/4] Add reclaim support

2007-02-19 Thread KAMEZAWA Hiroyuki
On Mon, 19 Feb 2007 12:20:42 +0530
Balbir Singh <[EMAIL PROTECTED]> wrote:

> +int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
> +{
> + struct container *cont;
> + struct memctlr *mem;
> + long usage, limit;
> + int ret = 1;
> +
> + if (!sc_cont)
> + goto out;
> +
> + read_lock(&mm->container_lock);
> + cont = mm->container;

> +out:
> + read_unlock(&mm->container_lock);
> + return ret;
> +}
> +

should be
==
out_and_unlock:
read_unlock(&mm->container_lock);
out_:
return ret;


-Kame



Re: [RFC][PATCH][3/4] Add reclaim support

2007-02-19 Thread Andrew Morton
On Mon, 19 Feb 2007 12:20:42 +0530 Balbir Singh <[EMAIL PROTECTED]> wrote:

> 
> This patch reclaims pages from a container when the container limit is hit.
> The executable is oom'ed only when the container it is running in, is 
> overlimit
> and we could not reclaim any pages belonging to the container
> 
> A parameter called pushback, controls how much memory is reclaimed when the
> limit is hit. It should be easy to expose this knob to user space, but
> currently it is hard coded to 20% of the total limit of the container.
> 
> isolate_lru_pages() has been modified to isolate pages belonging to a
> particular container, so that reclaim code will reclaim only container
> pages. For shared pages, reclaim does not unmap all mappings of the page,
> it only unmaps those mappings that are over their limit. This ensures
> that other containers are not penalized while reclaiming shared pages.
> 
> Parallel reclaim per container is not allowed. Each controller has a wait
> queue that ensures that only one task per control is running reclaim on
> that container.
> 
> 
> ...
>
> --- linux-2.6.20/include/linux/rmap.h~memctlr-reclaim-on-limit
> 2007-02-18 23:29:14.0 +0530
> +++ linux-2.6.20-balbir/include/linux/rmap.h  2007-02-18 23:29:14.0 
> +0530
> @@ -90,7 +90,15 @@ static inline void page_dup_rmap(struct 
>   * Called from mm/vmscan.c to handle paging out
>   */
>  int page_referenced(struct page *, int is_locked);
> -int try_to_unmap(struct page *, int ignore_refs);
> +int try_to_unmap(struct page *, int ignore_refs, void *container);
> +#ifdef CONFIG_CONTAINER_MEMCTLR
> +bool page_in_container(struct page *page, struct zone *zone, void 
> *container);
> +#else
> +static inline bool page_in_container(struct page *page, struct zone *zone, 
> void *container)
> +{
> + return true;
> +}
> +#endif /* CONFIG_CONTAINER_MEMCTLR */
>  
>  /*
>   * Called from mm/filemap_xip.c to unmap empty zero page
> @@ -118,7 +126,8 @@ int page_mkclean(struct page *);
>  #define anon_vma_link(vma)   do {} while (0)
>  
>  #define page_referenced(page,l) TestClearPageReferenced(page)
> -#define try_to_unmap(page, refs) SWAP_FAIL
> +#define try_to_unmap(page, refs, container) SWAP_FAIL
> +#define page_in_container(page, zone, container)  true

I spy a compile error.

The static-inline version looks nicer.

>  static inline int page_mkclean(struct page *page)
>  {
> diff -puN include/linux/swap.h~memctlr-reclaim-on-limit include/linux/swap.h
> --- linux-2.6.20/include/linux/swap.h~memctlr-reclaim-on-limit
> 2007-02-18 23:29:14.0 +0530
> +++ linux-2.6.20-balbir/include/linux/swap.h  2007-02-18 23:29:14.0 
> +0530
> @@ -188,6 +188,10 @@ extern void swap_setup(void);
>  /* linux/mm/vmscan.c */
>  extern unsigned long try_to_free_pages(struct zone **, gfp_t);
>  extern unsigned long shrink_all_memory(unsigned long nr_pages);
> +#ifdef CONFIG_CONTAINER_MEMCTLR
> +extern unsigned long memctlr_shrink_mapped_memory(unsigned long nr_pages,
> + void *container);
> +#endif

Usually one doesn't need to put ifdefs around the declaration like this. 
If the function doesn't exist and nobody calls it, we're fine.  If someone
_does_ call it, we'll find out the error at link-time.

>  
> +/*
> + * checks if the mm's container and scan control passed container match, if
> + * so, is the container over it's limit. Returns 1 if the container is above
> + * its limit.
> + */
> +int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
> +{
> + struct container *cont;
> + struct memctlr *mem;
> + long usage, limit;
> + int ret = 1;
> +
> + if (!sc_cont)
> + goto out;
> +
> + read_lock(&mm->container_lock);
> + cont = mm->container;
> +
> + /*
> +  * Regular reclaim, let it proceed as usual
> +  */
> + if (!sc_cont)
> + goto out;
> +
> + ret = 0;
> + if (cont != sc_cont)
> + goto out;
> +
> + mem = memctlr_from_cont(cont);
> + usage = atomic_long_read(&mem->counter.usage);
> + limit = atomic_long_read(&mem->counter.limit);
> + if (limit && (usage > limit))
> + ret = 1;
> +out:
> + read_unlock(&mm->container_lock);
> + return ret;
> +}

hm, I wonder how much additional lock traffic all this adds.

>  int memctlr_mm_init(struct mm_struct *mm)
>  {
>   mm->counter = kmalloc(sizeof(struct res_counter), GFP_KERNEL);
> @@ -77,6 +125,46 @@ void memctlr_mm_assign_container(struct 
>   write_unlock(&mm->container_lock);
>  }
>  
> +static int memctlr_check_and_reclaim(struct container *cont, long usage,
> + long limit)
> +{
> + unsigned long nr_pages = 0;
> + unsigned long nr_reclaimed = 0;
> + int retries = nr_retries;
> + int ret = 1;
> + struct memctlr *mem;
> +
> + mem = memctlr_from_cont(cont);
> + spin_lock(&mem->lock);
> + while ((retries-- > 0) && limit &&

[RFC][PATCH][3/4] Add reclaim support

2007-02-18 Thread Balbir Singh

This patch reclaims pages from a container when the container limit is hit.
The executable is OOM-killed only when the container it is running in is over
its limit and we could not reclaim any pages belonging to the container.

A parameter called pushback controls how much memory is reclaimed when the
limit is hit. It should be easy to expose this knob to user space, but
currently it is hard coded to 20% of the total limit of the container.

isolate_lru_pages() has been modified to isolate pages belonging to a
particular container, so that reclaim code will reclaim only container
pages. For shared pages, reclaim does not unmap all mappings of the page,
it only unmaps those mappings that are over their limit. This ensures
that other containers are not penalized while reclaiming shared pages.

Parallel reclaim per container is not allowed. Each controller has a wait
queue that ensures that only one task per container is running reclaim on
that container.


Signed-off-by: <[EMAIL PROTECTED]>
---

 include/linux/memctlr.h |8 ++
 include/linux/rmap.h|   13 +++-
 include/linux/swap.h|4 +
 mm/memctlr.c|  137 
 mm/migrate.c|2 
 mm/rmap.c   |   96 +++--
 mm/vmscan.c |   94 
 7 files changed, 324 insertions(+), 30 deletions(-)

diff -puN include/linux/memctlr.h~memctlr-reclaim-on-limit 
include/linux/memctlr.h
--- linux-2.6.20/include/linux/memctlr.h~memctlr-reclaim-on-limit   
2007-02-18 23:29:14.0 +0530
+++ linux-2.6.20-balbir/include/linux/memctlr.h 2007-02-18 23:29:14.0 
+0530
@@ -20,6 +20,7 @@ enum {
 };
 
 #ifdef CONFIG_CONTAINER_MEMCTLR
+#include 
 
 struct res_counter {
atomic_long_t usage;/* The current usage of the resource being */
@@ -33,6 +34,9 @@ extern void memctlr_mm_free(struct mm_st
 extern void memctlr_mm_assign_container(struct mm_struct *mm,
struct task_struct *p);
 extern int memctlr_update_rss(struct mm_struct *mm, int count, bool check);
+extern int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont);
+extern wait_queue_head_t memctlr_reclaim_wq;
+extern bool memctlr_reclaim_in_progress;
 
 #else /* CONFIG_CONTAINER_MEMCTLR  */
 
@@ -56,5 +60,9 @@ static inline int memctlr_update_rss(str
return 0;
 }
 
+int memctlr_mm_overlimit(struct mm_struct *mm, void *sc_cont)
+{
+   return 0;
+}
 #endif /* CONFIG_CONTAINER_MEMCTLR */
 #endif /* _LINUX_MEMCTLR_H */
diff -puN include/linux/rmap.h~memctlr-reclaim-on-limit include/linux/rmap.h
--- linux-2.6.20/include/linux/rmap.h~memctlr-reclaim-on-limit  2007-02-18 
23:29:14.0 +0530
+++ linux-2.6.20-balbir/include/linux/rmap.h2007-02-18 23:29:14.0 
+0530
@@ -90,7 +90,15 @@ static inline void page_dup_rmap(struct 
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked);
-int try_to_unmap(struct page *, int ignore_refs);
+int try_to_unmap(struct page *, int ignore_refs, void *container);
+#ifdef CONFIG_CONTAINER_MEMCTLR
+bool page_in_container(struct page *page, struct zone *zone, void *container);
+#else
+static inline bool page_in_container(struct page *page, struct zone *zone, 
void *container)
+{
+   return true;
+}
+#endif /* CONFIG_CONTAINER_MEMCTLR */
 
 /*
  * Called from mm/filemap_xip.c to unmap empty zero page
@@ -118,7 +126,8 @@ int page_mkclean(struct page *);
 #define anon_vma_link(vma) do {} while (0)
 
 #define page_referenced(page,l) TestClearPageReferenced(page)
-#define try_to_unmap(page, refs) SWAP_FAIL
+#define try_to_unmap(page, refs, container) SWAP_FAIL
+#define page_in_container(page, zone, container)  true
 
 static inline int page_mkclean(struct page *page)
 {
diff -puN include/linux/swap.h~memctlr-reclaim-on-limit include/linux/swap.h
--- linux-2.6.20/include/linux/swap.h~memctlr-reclaim-on-limit  2007-02-18 
23:29:14.0 +0530
+++ linux-2.6.20-balbir/include/linux/swap.h2007-02-18 23:29:14.0 
+0530
@@ -188,6 +188,10 @@ extern void swap_setup(void);
 /* linux/mm/vmscan.c */
 extern unsigned long try_to_free_pages(struct zone **, gfp_t);
 extern unsigned long shrink_all_memory(unsigned long nr_pages);
+#ifdef CONFIG_CONTAINER_MEMCTLR
+extern unsigned long memctlr_shrink_mapped_memory(unsigned long nr_pages,
+   void *container);
+#endif
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
diff -puN mm/memctlr.c~memctlr-reclaim-on-limit mm/memctlr.c
--- linux-2.6.20/mm/memctlr.c~memctlr-reclaim-on-limit  2007-02-18 
23:29:14.0 +0530
+++ linux-2.6.20-balbir/mm/memctlr.c2007-02-18 23:34:51.0 +0530
@@ -17,16 +17,26 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
-#define RES_USAGE_NO_LIMIT 0
+#defin