Re: [-mm PATCH 4/8] Memory controller memory accounting (v2)

2007-07-10 Thread Balbir Singh
YAMAMOTO Takashi wrote:
>> On 7/10/07, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
>>> hi,
>>>
>>>> diff -puN mm/memory.c~mem-control-accounting mm/memory.c
>>>> --- linux-2.6.22-rc6/mm/memory.c~mem-control-accounting   2007-07-05 13:45:18.0 -0700
>>>> +++ linux-2.6.22-rc6-balbir/mm/memory.c   2007-07-05 13:45:18.0 -0700
>>>> @@ -1731,6 +1736,9 @@ gotten:
>>>>   cow_user_page(new_page, old_page, address, vma);
>>>>   }
>>>>
>>>> + if (mem_container_charge(new_page, mm))
>>>> + goto oom;
>>>> +
>>>>   /*
>>>>    * Re-check the pte - we dropped the lock
>>>>    */
>>> it seems that the page will be leaked on error.
>> You mean meta_page right?
> 
> no.  i meant 'new_page'.
> 
> YAMAMOTO Takashi

Yes, I see. Thanks for clarifying.

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [-mm PATCH 4/8] Memory controller memory accounting (v2)

2007-07-10 Thread YAMAMOTO Takashi
> On 7/10/07, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:
> > hi,
> >
> > > diff -puN mm/memory.c~mem-control-accounting mm/memory.c
> > > --- linux-2.6.22-rc6/mm/memory.c~mem-control-accounting   2007-07-05 13:45:18.0 -0700
> > > +++ linux-2.6.22-rc6-balbir/mm/memory.c   2007-07-05 13:45:18.0 -0700
> >
> > > @@ -1731,6 +1736,9 @@ gotten:
> > >   cow_user_page(new_page, old_page, address, vma);
> > >   }
> > >
> > > + if (mem_container_charge(new_page, mm))
> > > + goto oom;
> > > +
> > >   /*
> > >* Re-check the pte - we dropped the lock
> > >*/
> >
> > it seems that the page will be leaked on error.
> 
> You mean meta_page right?

no.  i meant 'new_page'.

YAMAMOTO Takashi


Re: [-mm PATCH 4/8] Memory controller memory accounting (v2)

2007-07-10 Thread Balbir Singh

On 7/10/07, YAMAMOTO Takashi <[EMAIL PROTECTED]> wrote:

> hi,
>
> > diff -puN mm/memory.c~mem-control-accounting mm/memory.c
> > --- linux-2.6.22-rc6/mm/memory.c~mem-control-accounting   2007-07-05 13:45:18.0 -0700
> > +++ linux-2.6.22-rc6-balbir/mm/memory.c   2007-07-05 13:45:18.0 -0700
> >
> > @@ -1731,6 +1736,9 @@ gotten:
> >   cow_user_page(new_page, old_page, address, vma);
> >   }
> >
> > + if (mem_container_charge(new_page, mm))
> > + goto oom;
> > +
> >   /*
> >    * Re-check the pte - we dropped the lock
> >    */
>
> it seems that the page will be leaked on error.

You mean meta_page right?

> > @@ -2188,6 +2196,11 @@ static int do_swap_page(struct mm_struct
> >   }
> >
> >   delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> > + if (mem_container_charge(page, mm)) {
> > + ret = VM_FAULT_OOM;
> > + goto out;
> > + }
> > +
> >   mark_page_accessed(page);
> >   lock_page(page);
> >
>
> ditto.
>
> > @@ -2264,6 +2278,9 @@ static int do_anonymous_page(struct mm_s
> >   if (!page)
> >   goto oom;
> >
> > + if (mem_container_charge(page, mm))
> > + goto oom;
> > +
> >   entry = mk_pte(page, vma->vm_page_prot);
> >   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> >
>
> ditto.
>
> can you check the rest of the patch by yourself?  thanks.



Excellent catch! I'll review the accounting framework and post the
updated version soon.

Balbir


Re: [-mm PATCH 4/8] Memory controller memory accounting (v2)

2007-07-10 Thread YAMAMOTO Takashi
hi,

> diff -puN mm/memory.c~mem-control-accounting mm/memory.c
> --- linux-2.6.22-rc6/mm/memory.c~mem-control-accounting   2007-07-05 13:45:18.0 -0700
> +++ linux-2.6.22-rc6-balbir/mm/memory.c   2007-07-05 13:45:18.0 -0700

> @@ -1731,6 +1736,9 @@ gotten:
>   cow_user_page(new_page, old_page, address, vma);
>   }
>  
> + if (mem_container_charge(new_page, mm))
> + goto oom;
> +
>   /*
>* Re-check the pte - we dropped the lock
>*/

it seems that the page will be leaked on error.

> @@ -2188,6 +2196,11 @@ static int do_swap_page(struct mm_struct
>   }
>  
>   delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> + if (mem_container_charge(page, mm)) {
> + ret = VM_FAULT_OOM;
> + goto out;
> + }
> +
>   mark_page_accessed(page);
>   lock_page(page);
>  

ditto.

> @@ -2264,6 +2278,9 @@ static int do_anonymous_page(struct mm_s
>   if (!page)
>   goto oom;
>  
> + if (mem_container_charge(page, mm))
> + goto oom;
> +
>   entry = mk_pte(page, vma->vm_page_prot);
>   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
>  

ditto.

can you check the rest of the patch by yourself?  thanks.

YAMAMOTO Takashi
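
The leak pointed out in this review can be sketched in userspace C. This is only an illustration under stated assumptions: `alloc_page_sim`, `release_page_sim`, `charge_sim`, `live_pages` and `charge_should_fail` are hypothetical stand-ins rather than kernel functions, and `fault_fixed` shows one possible shape for a fix, not necessarily what a later version of the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's types or helpers. */
#define POOL_SIZE 4
struct page { bool in_use; };

static struct page pool[POOL_SIZE];
int live_pages;           /* pages handed out and not yet released */
bool charge_should_fail;  /* knob: simulate the container hitting its limit */

static struct page *alloc_page_sim(void)
{
	for (int i = 0; i < POOL_SIZE; i++) {
		if (!pool[i].in_use) {
			pool[i].in_use = true;
			live_pages++;
			return &pool[i];
		}
	}
	return NULL;
}

static void release_page_sim(struct page *page)
{
	assert(page->in_use);
	page->in_use = false;
	live_pages--;
}

/* Returns 0 on success, nonzero when the charge is refused. */
static int charge_sim(struct page *page)
{
	(void)page;
	return charge_should_fail ? -1 : 0;
}

/* Same shape as the reviewed hunks: a failed charge jumps straight to oom. */
int fault_leaky(void)
{
	struct page *page = alloc_page_sim();

	if (!page)
		goto oom;
	if (charge_sim(page))
		goto oom;               /* page is still held here: leaked */
	release_page_sim(page);         /* stands in for later, normal teardown */
	return 0;
oom:
	return -1;
}

/* One possible fix: a separate label that drops the page before bailing out. */
int fault_fixed(void)
{
	struct page *page = alloc_page_sim();

	if (!page)
		goto oom;
	if (charge_sim(page))
		goto oom_free_page;
	release_page_sim(page);
	return 0;
oom_free_page:
	release_page_sim(page);
oom:
	return -1;
}
```

With `charge_should_fail` set, `fault_leaky()` leaves `live_pages` one higher than before the call, while `fault_fixed()` leaves it unchanged. The same check applies to each of the three hunks quoted above.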


[-mm PATCH 4/8] Memory controller memory accounting (v2)

2007-07-05 Thread Balbir Singh

Add the accounting hooks. Accounting is carried out for RSS and Page
Cache (unmapped) pages; there is now a common limit and common accounting
for both. RSS is accounted at page_add_*_rmap() and page_remove_rmap()
time. Page cache is accounted at add_to_page_cache() and
__remove_from_page_cache(). Swap cache is also accounted for.

Each page's meta_page is protected with a bit in page flags, which makes it
easier to handle race conditions involving simultaneous mappings of a page.
A reference count is kept in the meta_page to deal with cases where a page
might be unmapped from the RSS of all tasks but still lives in the page
cache.
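
The reference counting described above can be sketched in userspace C. This is a simplified illustration, not the code in mm/memcontrol.c: the structure layouts, `charge`, `uncharge`, `container_usage` and `container_limit` are hypothetical names, the sketch is single-threaded, and it leaves out the page-flag bit that the patch uses to serialize access to the meta_page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the structures named in the description. */
struct meta_page { int ref_cnt; };
struct page { struct meta_page *meta; };

long container_usage;      /* pages currently charged to the container */
long container_limit = 2;  /* assumed limit, in pages */

/*
 * Charge a page. The first user (for example a mapping) allocates the
 * meta_page and counts the page once; later users (for example the page
 * cache) only take another reference.
 */
int charge(struct page *page)
{
	if (page->meta) {
		page->meta->ref_cnt++;
		return 0;
	}
	if (container_usage >= container_limit)
		return -1;              /* over the limit: refuse the charge */
	page->meta = malloc(sizeof(*page->meta));
	if (!page->meta)
		return -1;
	page->meta->ref_cnt = 1;
	container_usage++;
	return 0;
}

/* Drop one reference; only the last one releases the charge. */
void uncharge(struct page *page)
{
	struct meta_page *mp = page->meta;

	if (!mp)
		return;
	if (--mp->ref_cnt == 0) {
		container_usage--;
		free(mp);
		page->meta = NULL;
	}
}
```

A page that is both mapped and in the page cache is charged twice but counted once; unmapping it drops one reference and the usage stays at one page until the page cache reference goes away as well.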

Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---

 fs/exec.c                  |    1 
 include/linux/memcontrol.h |   11 +++
 include/linux/page-flags.h |    3 +
 mm/filemap.c               |    8 ++
 mm/memcontrol.c            |  132 -
 mm/memory.c                |   22 +++
 mm/migrate.c               |    6 ++
 mm/page_alloc.c            |    3 +
 mm/rmap.c                  |    2 
 mm/swap_state.c            |    8 ++
 mm/swapfile.c              |   40 +++--
 11 files changed, 218 insertions(+), 18 deletions(-)

diff -puN fs/exec.c~mem-control-accounting fs/exec.c
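--- linux-2.6.22-rc6/fs/exec.c~mem-control-accounting   2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/fs/exec.c   2007-07-05 13:45:18.0 -0700
@@ -51,6 +51,7 @@
 #include <linux/cn_proc.h>
 #include <linux/audit.h>
 #include <linux/signalfd.h>
+#include <linux/memcontrol.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>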
diff -puN include/linux/memcontrol.h~mem-control-accounting include/linux/memcontrol.h
--- linux-2.6.22-rc6/include/linux/memcontrol.h~mem-control-accounting  2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/memcontrol.h  2007-07-05 18:27:26.0 -0700
@@ -24,6 +24,8 @@ extern void mm_init_container(struct mm_
 extern void mm_free_container(struct mm_struct *mm);
 extern void page_assign_meta_page(struct page *page, struct meta_page *mp);
 extern struct meta_page *page_get_meta_page(struct page *page);
+extern int mem_container_charge(struct page *page, struct mm_struct *mm);
+extern void mem_container_uncharge(struct meta_page *mp);
 
 #else /* CONFIG_CONTAINER_MEM_CONT */
 static inline void mm_init_container(struct mm_struct *mm,
@@ -45,6 +47,15 @@ static inline struct meta_page *page_get
return NULL;
 }
 
+static inline int mem_container_charge(struct page *page, struct mm_struct *mm)
+{
+   return 0;
+}
+
+static inline void mem_container_uncharge(struct meta_page *mp)
+{
+}
+
 #endif /* CONFIG_CONTAINER_MEM_CONT */
 
 #endif /* _LINUX_MEMCONTROL_H */
diff -puN include/linux/page-flags.h~mem-control-accounting include/linux/page-flags.h
--- linux-2.6.22-rc6/include/linux/page-flags.h~mem-control-accounting  2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/include/linux/page-flags.h  2007-07-05 13:45:18.0 -0700
@@ -98,6 +98,9 @@
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 #define PG_pinned  PG_owner_priv_1 /* Xen pinned pagetable */
 
+#define PG_metapage        21  /* Used for checking if a meta_page */
+                               /* is associated with a page        */
+
 #if (BITS_PER_LONG > 32)
 /*
  * 64-bit-only flags build down from bit 31
diff -puN mm/filemap.c~mem-control-accounting mm/filemap.c
--- linux-2.6.22-rc6/mm/filemap.c~mem-control-accounting   2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/filemap.c   2007-07-05 18:26:29.0 -0700
@@ -31,6 +31,7 @@
 #include <linux/syscalls.h>
 #include <linux/cpuset.h>
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
+#include <linux/memcontrol.h>
 #include "internal.h"
 
 /*
@@ -116,6 +117,7 @@ void __remove_from_page_cache(struct pag
 {
struct address_space *mapping = page->mapping;
 
+   mem_container_uncharge(page_get_meta_page(page));
    radix_tree_delete(&mapping->page_tree, page->index);
page->mapping = NULL;
mapping->nrpages--;
@@ -442,6 +444,11 @@ int add_to_page_cache(struct page *page,
int error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
 
if (error == 0) {
+
+   error = mem_container_charge(page, current->mm);
+   if (error)
+   goto out;
+
        write_lock_irq(&mapping->tree_lock);
        error = radix_tree_insert(&mapping->page_tree, offset, page);
        if (!error) {
@@ -455,6 +462,7 @@ int add_to_page_cache(struct page *page,
        write_unlock_irq(&mapping->tree_lock);
radix_tree_preload_end();
}
+out:
return error;
 }
 EXPORT_SYMBOL(add_to_page_cache);
diff -puN mm/memcontrol.c~mem-control-accounting mm/memcontrol.c
--- linux-2.6.22-rc6/mm/memcontrol.c~mem-control-accounting 2007-07-05 13:45:18.0 -0700
+++ linux-2.6.22-rc6-balbir/mm/memcontrol.c 2007-07-05 18:27:29.0 -0700
@@ -16,6 +16,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
