Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:33:28 +0900 (JST)
[EMAIL PROTECTED] (YAMAMOTO Takashi) wrote:

> > +static inline struct mem_cgroup_per_zone *
> > +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > +{
> > +   if (!mem->info.nodeinfo[nid])
> 
> can this be true?
> 
> YAMAMOTO Takashi

I added that check back when early_init was 1.
Would BUG_ON() be better?

Thanks,
-Kame
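
For comparison, here is a minimal userspace sketch of the two options under discussion: returning NULL for a node whose info was never allocated, versus treating that as a bug. It is not the kernel code. `assert()` stands in for `BUG_ON()`, and the `demo_*` types, `DEMO_MAX_NODES`, and `DEMO_MAX_ZONES` are hypothetical simplifications of `struct mem_cgroup` and its `nodeinfo[]` array.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_NODES 4 /* hypothetical stand-in for MAX_NUMNODES */
#define DEMO_MAX_ZONES 3 /* hypothetical stand-in for MAX_NR_ZONES */

struct demo_per_zone {
	unsigned long count[2]; /* [0] = active, [1] = inactive */
};

struct demo_per_node {
	struct demo_per_zone zoneinfo[DEMO_MAX_ZONES];
};

struct demo_mem_cgroup {
	struct demo_per_node *nodeinfo[DEMO_MAX_NODES];
};

/* Variant in the patch: tolerate a missing node by returning NULL. */
static struct demo_per_zone *
zoneinfo_or_null(struct demo_mem_cgroup *mem, int nid, int zid)
{
	if (!mem->nodeinfo[nid])
		return NULL;
	return &mem->nodeinfo[nid]->zoneinfo[zid];
}

/*
 * BUG_ON()-style variant: if every node's info is allocated when the
 * cgroup is created (early_init = 0), a NULL entry is a bug, so fail loudly.
 */
static struct demo_per_zone *
zoneinfo_checked(struct demo_mem_cgroup *mem, int nid, int zid)
{
	assert(mem->nodeinfo[nid] != NULL); /* stands in for BUG_ON() */
	return &mem->nodeinfo[nid]->zoneinfo[zid];
}
```

The NULL-returning variant pushes a check onto every caller; note that `mem_cgroup_get_all_zonestat()` in the patch dereferences the result without one. The assertion instead documents the invariant at the single point of access.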

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread YAMAMOTO Takashi
> +static inline struct mem_cgroup_per_zone *
> +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> +{
> + if (!mem->info.nodeinfo[nid])

can this be true?

YAMAMOTO Takashi

> + return NULL;
> + return &mem->info.nodeinfo[nid]->zoneinfo[zid];
> +}
> +


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread Christoph Lameter
On Thu, 29 Nov 2007, KAMEZAWA Hiroyuki wrote:

> ok, just use N_HIGH_MEMORY here and add comment for hotplugging support is 
> not yet.
> 
> Christoph-san, Lee-san, could you confirm following ?
> 
> - when SLAB is used, kmalloc_node() against offline node will succeed.
> - when SLUB is used, kmalloc_node() against offline node will panic.
> 
> Then, the caller should take care that node is online before kmalloc().

Hmmm... An offline node implies that the per-node structure does not 
exist. SLAB should fail too. If there is something wrong with the allocs, 
then it's likely a difference in the way hotplug was put into SLAB and 
SLUB.




Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 12:19:37 +0900 (JST)
[EMAIL PROTECTED] (YAMAMOTO Takashi) wrote:

> > @@ -651,10 +758,11 @@
> > /* Avoid race with charge */
> > atomic_set(&pc->ref_cnt, 0);
> > if (clear_page_cgroup(page, pc) == pc) {
> > +   int active;
> > css_put(&mem->css);
> > +   active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
> > res_counter_uncharge(&mem->res, PAGE_SIZE);
> > -   list_del_init(&pc->lru);
> > -   mem_cgroup_charge_statistics(mem, pc->flags, false);
> > +   __mem_cgroup_remove_list(pc);
> > kfree(pc);
> > } else  /* being uncharged ? ...do relax */
> > break;
> 
> 'active' seems unused.
> 
ok, I will post clean-up against -mm2.

Thanks,
-Kame
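
The remark about 'active' follows from the helper's design: `__mem_cgroup_remove_list()` reads `PAGE_CGROUP_FLAG_ACTIVE` from `pc->flags` itself to pick which per-zone counter to decrement, so the caller's local copy is never used. The sketch below models only that accounting in userspace; the `demo_*` names and the flag value are hypothetical, and the list unlinking and the `mem_cgroup_charge_statistics()` call are omitted.

```c
#include <assert.h>

#define DEMO_FLAG_ACTIVE 0x1UL /* hypothetical stand-in for PAGE_CGROUP_FLAG_ACTIVE */

enum demo_zstat { DEMO_ZSTAT_ACTIVE, DEMO_ZSTAT_INACTIVE, NR_DEMO_ZSTAT };

struct demo_zone_counts {
	unsigned long count[NR_DEMO_ZSTAT];
};

struct demo_page_cgroup {
	unsigned long flags;
	struct demo_zone_counts *mz; /* the page's per-zone counters */
};

/* Increment the counter matching the page's current LRU state. */
static void demo_add_list(struct demo_page_cgroup *pc)
{
	if (pc->flags & DEMO_FLAG_ACTIVE)
		pc->mz->count[DEMO_ZSTAT_ACTIVE] += 1;
	else
		pc->mz->count[DEMO_ZSTAT_INACTIVE] += 1;
}

/*
 * Decrement the matching counter. The active/inactive decision is made
 * here from pc->flags, which is why a caller-side 'active' is redundant.
 */
static void demo_remove_list(struct demo_page_cgroup *pc)
{
	if (pc->flags & DEMO_FLAG_ACTIVE)
		pc->mz->count[DEMO_ZSTAT_ACTIVE] -= 1;
	else
		pc->mz->count[DEMO_ZSTAT_INACTIVE] -= 1;
}
```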



Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread YAMAMOTO Takashi
> @@ -651,10 +758,11 @@
>   /* Avoid race with charge */
>   atomic_set(&pc->ref_cnt, 0);
>   if (clear_page_cgroup(page, pc) == pc) {
> + int active;
>   css_put(&mem->css);
> + active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
>   res_counter_uncharge(&mem->res, PAGE_SIZE);
> - list_del_init(&pc->lru);
> - mem_cgroup_charge_statistics(mem, pc->flags, false);
> + __mem_cgroup_remove_list(pc);
>   kfree(pc);
>   } else  /* being uncharged ? ...do relax */
>   break;

'active' seems unused.

YAMAMOTO Takashi


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 11:24:06 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Thu, 29 Nov 2007 10:37:02 +0900
> KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
> 
> > Maybe zonelists of NODE_DATA() is not initialized. you are right.
> > I think N_HIGH_MEMORY will be suitable here...(I'll consider node-hotplug 
> > case later.)
> > 
> > Thank you for test!
> > 
> Could you try this ? 
> 
Sorry..this can be a workaround but I noticed I miss something..

ok, just use N_HIGH_MEMORY here and add comment for hotplugging support is not 
yet.

Christoph-san, Lee-san, could you confirm following ?

- when SLAB is used, kmalloc_node() against offline node will succeed.
- when SLUB is used, kmalloc_node() against offline node will panic.

Then, the caller should take care that node is online before kmalloc().

Regards,
-Kame 



Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Thu, 29 Nov 2007 10:37:02 +0900
KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> Maybe zonelists of NODE_DATA() is not initialized. you are right.
> I think N_HIGH_MEMORY will be suitable here...(I'll consider node-hotplug 
> case later.)
> 
> Thank you for test!
> 
Could you try this ? 

Thanks,
-Kame
==

Don't call kmalloc() against possible but offline node.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

 mm/memcontrol.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: test-2.6.24-rc3-mm1/mm/memcontrol.c
===
--- test-2.6.24-rc3-mm1.orig/mm/memcontrol.c
+++ test-2.6.24-rc3-mm1/mm/memcontrol.c
@@ -1117,8 +1117,14 @@ static int alloc_mem_cgroup_per_zone_inf
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup_per_zone *mz;
 	int zone;
-
-	pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+	/*
+	 * This routine is called against possible nodes.
+	 * But it's BUG to call kmalloc() against offline node.
+	 */
+	if (node_state(N_ONLINE, node))
+		pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+	else
+		pn = kmalloc(sizeof(*pn), GFP_KERNEL);
 	if (!pn)
 		return 1;
 

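The decision the workaround makes can be modeled in userspace as below: use a node-local allocation only when the node is online, and otherwise fall back to an allocation with no node preference. `demo_node_online[]`, `demo_alloc_on_node()`, and `demo_alloc_any()` are hypothetical stand-ins for `node_state(N_ONLINE, node)`, `kmalloc_node()`, and `kmalloc()`; both demo allocators simply call `calloc()` and record which path was taken so the choice can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_MAX_NODES 4 /* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical online map: nodes 0 and 1 are online, 2 and 3 are only possible. */
static const bool demo_node_online[DEMO_MAX_NODES] = { true, true, false, false };

/* Which path the last call used: the node id, or -1 for "any node". */
static int demo_last_alloc_node = -2;

static void *demo_alloc_on_node(size_t size, int node)
{
	demo_last_alloc_node = node; /* a real allocator would place memory on 'node' */
	return calloc(1, size);
}

static void *demo_alloc_any(size_t size)
{
	demo_last_alloc_node = -1;
	return calloc(1, size);
}

/*
 * Mirrors the workaround: never request node-local memory from a node
 * that is not online, because its allocator structures may not exist.
 */
static void *demo_alloc_node_info(size_t size, int node)
{
	assert(node >= 0 && node < DEMO_MAX_NODES);
	if (demo_node_online[node])
		return demo_alloc_on_node(size, node);
	return demo_alloc_any(size);
}
```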


Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread KAMEZAWA Hiroyuki
On Wed, 28 Nov 2007 16:19:59 -0500
Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> As soon as this loop hits the first non-existent node on my platform, I
> get a NULL pointer deref down in __alloc_pages.  Stack trace below.
> 
> Perhaps N_POSSIBLE should be N_HIGH_MEMORY?  That would require handling
> of memory/node hotplug for each memory control group, right?  But, I'm
> going to try N_HIGH_MEMORY as a work around.
> 
Hmm, ok. (>_<


> Call Trace:
>  [<a00100014de0>] show_stack+0x80/0xa0
> sp=a001008e39c0 bsp=a001008dd1b0
>  [<a00100015a70>] show_regs+0x870/0x8a0
> sp=a001008e3b90 bsp=a001008dd158
>  [<a0010003d130>] die+0x190/0x300
> sp=a001008e3b90 bsp=a001008dd110
>  [<a00100071b80>] ia64_do_page_fault+0x8e0/0xa20
> sp=a001008e3b90 bsp=a001008dd0b8
>  [<a001b5c0>] ia64_leave_kernel+0x0/0x270
> sp=a001008e3c20 bsp=a001008dd0b8
>  [<a00100132e10>] __alloc_pages+0x30/0x6e0
> sp=a001008e3df0 bsp=a001008dcfe0
>  [<a00100187370>] new_slab+0x610/0x6c0
> sp=a001008e3e00 bsp=a001008dcf80
>  [<a00100187470>] get_new_slab+0x50/0x200
> sp=a001008e3e00 bsp=a001008dcf48
>  [<a00100187900>] __slab_alloc+0x2e0/0x4e0
> sp=a001008e3e00 bsp=a001008dcf00
>  [<a00100187c80>] kmem_cache_alloc_node+0x180/0x200
> sp=a001008e3e10 bsp=a001008dcec0
>  [<a001001945a0>] mem_cgroup_create+0x160/0x400
> sp=a001008e3e10 bsp=a001008dce78
>  [<a001000f0940>] cgroup_init_subsys+0xa0/0x400
> sp=a001008e3e20 bsp=a001008dce28
>  [<a001008521f0>] cgroup_init+0x90/0x160
> sp=a001008e3e20 bsp=a001008dce00
>  [<a00100831960>] start_kernel+0x700/0x820
> sp=a001008e3e20 bsp=a001008dcd80
> 
Maybe zonelists of NODE_DATA() is not initialized. you are right.
I think N_HIGH_MEMORY will be suitable here...(I'll consider node-hotplug case 
later.)

Thank you for test!

Regards,
-Kame
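
Switching from N_POSSIBLE to N_HIGH_MEMORY amounts to iterating over a narrower node mask. A minimal userspace model follows; the bitmask representation and the `demo_*` names are hypothetical stand-ins for the kernel's node-state masks and `for_each_node_state()`. On a machine where some possible nodes have no memory, the N_POSSIBLE-style walk visits nodes that the N_HIGH_MEMORY-style walk skips.

```c
#include <assert.h>

#define DEMO_MAX_NODES 8 /* hypothetical stand-in for MAX_NUMNODES */

/* Hypothetical node masks: bit n set means node n is in that state. */
struct demo_node_states {
	unsigned int possible;    /* like N_POSSIBLE */
	unsigned int high_memory; /* like N_HIGH_MEMORY: nodes that have memory */
};

/*
 * Collect the nodes set in 'mask' in ascending order, in the spirit of
 * for_each_node_state(). Returns how many nodes a creation loop over
 * this mask would allocate per-node info for.
 */
static int demo_collect_nodes(unsigned int mask, int out[DEMO_MAX_NODES])
{
	int node, n = 0;

	for (node = 0; node < DEMO_MAX_NODES; node++)
		if (mask & (1u << node))
			out[n++] = node;
	return n;
}
```

The catch raised earlier in the thread still applies: with the narrower mask, a node that gains memory later through hotplug has no per-node structure, so hotplug would need its own handling.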





Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-28 Thread Lee Schermerhorn
Just a "heads up":  This patch is the apparent cause of a boot time
panic--null pointer deref--on my numa platform.  See below.

On Tue, 2007-11-27 at 12:00 +0900, KAMEZAWA Hiroyuki wrote:
> Counting active/inactive per-zone in memory controller.
> 
> This patch adds per-zone status in memory cgroup.
> These values are often read (as per-zone value) by page reclaiming.
> 
> In the current design, each per-zone stat is a plain unsigned long
> rather than an atomic value, because it is modified only under lru_lock.
> (So atomic ops are not necessary.)
> 
> This patch adds ACTIVE and INACTIVE per-zone status values.
> 
> For handling per-zone status, this patch adds
>   struct mem_cgroup_per_zone {
>   ...
>   }
> and some helper functions. This will be useful to add per-zone objects
> in mem_cgroup.
> 
> This patch sets the memory controller's early_init to 0 so that
> kmalloc() can be called during initialization.
> 
> Changelog V2 -> V3
>   - fixed comments.
> 
> Changelog V1 -> V2
>   - added mem_cgroup_per_zone struct.
>   This will help following patches to implement per-zone objects and
>   pack them into a struct.
>   - added __mem_cgroup_add_list() and __mem_cgroup_remove_list()
>   - fixed page migration handling.
>   - renamed zstat to info (per-zone-info)
> This will be place for per-zone information(lru, lock, ..)
>   - use page_cgroup_nid()/zid() funcs.
> 
> Acked-by: Balbir Singh <[EMAIL PROTECTED]>
> Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
> 
> 
>  mm/memcontrol.c |  164 +---
>  1 file changed, 157 insertions(+), 7 deletions(-)
> 
> Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
> ===
> --- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c 2007-11-26 16:39:00.0 +0900
> +++ linux-2.6.24-rc3-mm1/mm/memcontrol.c  2007-11-26 16:39:02.0 +0900
> @@ -78,6 +78,31 @@
<snip>
>  
> +static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> +{
> + struct mem_cgroup_per_node *pn;
> +
> + pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
> + if (!pn)
> + return 1;
> + mem->info.nodeinfo[node] = pn;
> + memset(pn, 0, sizeof(*pn));
> + return 0;
> +}
> +
>  static struct mem_cgroup init_mem_cgroup;
>  
>  static struct cgroup_subsys_state *
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
>   struct mem_cgroup *mem;
> + int node;
>  
>   if (unlikely((cont->parent) == NULL)) {
>   mem = &init_mem_cgroup;
> @@ -907,7 +1039,19 @@
>   INIT_LIST_HEAD(&mem->inactive_list);
>   spin_lock_init(&mem->lru_lock);
>   mem->control_type = MEM_CGROUP_TYPE_ALL;
> + memset(&mem->info, 0, sizeof(mem->info));
> +
> + for_each_node_state(node, N_POSSIBLE)
> + if (alloc_mem_cgroup_per_zone_info(mem, node))
> + goto free_out;
> +

As soon as this loop hits the first non-existent node on my platform, I
get a NULL pointer deref down in __alloc_pages.  Stack trace below.

Perhaps N_POSSIBLE should be N_HIGH_MEMORY?  That would require handling
of memory/node hotplug for each memory control group, right?  But, I'm
going to try N_HIGH_MEMORY as a work around.

Lee
>   return &mem->css;
> +free_out:
> + for_each_node_state(node, N_POSSIBLE)
> + kfree(mem->info.nodeinfo[node]);
> + if (cont->parent != NULL)
> + kfree(mem);
> + return NULL;
>  }
>  
>  static void mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
> @@ -920,6 +1064,12 @@
>  static void mem_cgroup_destroy(struct cgroup_subsys *ss,
>   struct cgroup *cont)
>  {
> + int node;
> + struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> +
> + for_each_node_state(node, N_POSSIBLE)
> + kfree(mem->info.nodeinfo[node]);
> +
>   kfree(mem_cgroup_from_cont(cont));
>  }
>  
> @@ -972,5 +1122,5 @@
>   .destroy = mem_cgroup_destroy,
>   .populate = mem_cgroup_populate,
>   .attach = mem_cgroup_move_task,
> - .early_init = 1,
> + .early_init = 0,
>  };

Initializing cgroup subsys memory
Unable to handle kernel NULL pointer dereference (address 3c80)
swapper[0]: Oops 11012296146944 [1]
Modules linked in:

Pid: 0, CPU 0, comm:  swapper
psr : 1210084a6010 ifs : 8b1a ip  : [<a00100132e11>] Not tainted
ip is at __alloc_pages+0x31/0x6e0
unat:  pfs : 060f rsc : 0003
rnat: a001009db3b8 bsps: a001009e0490 pr  : 656960155aa65659
ldrs:  ccv :  fpsr: 0009804c8a70433f
csd :  ssd : 
b0  : a00100187370 b6  : a00100194440 b7  : a0010086d560
f6  : 1003e f7  : 1003e0055
f8  : 1003e00c0 f9  : 1003e3fc0
f10 : 1003e00c0 f11 : 1003e0055
r1  : a00100bc0f10 r2  : ffe6 r3  : 0002
r8  : 00071ef0 r9  : 0005 r10 : e7002034d588
r11 : 

[PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-26 Thread KAMEZAWA Hiroyuki
Counting active/inactive per-zone in memory controller.

This patch adds per-zone status in memory cgroup.
These values are often read (as per-zone value) by page reclaiming.

In the current design, each per-zone stat is a plain unsigned long
rather than an atomic value, because it is modified only under lru_lock.
(So atomic ops are not necessary.)

This patch adds ACTIVE and INACTIVE per-zone status values.
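
The locking argument can be illustrated in userspace: because every update happens while the LRU lock is held, plain `unsigned long` increments are enough and no atomic operations are needed. In this sketch a `pthread_mutex_t` stands in for the per-cgroup `lru_lock` spinlock, and the `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

enum demo_zstat { DEMO_ZSTAT_ACTIVE, DEMO_ZSTAT_INACTIVE, NR_DEMO_ZSTAT };

struct demo_cgroup {
	pthread_mutex_t lru_lock;           /* stands in for mem->lru_lock */
	unsigned long count[NR_DEMO_ZSTAT]; /* plain counters, not atomics */
};

/* Caller must hold lru_lock; the lock, not the type, makes this safe. */
static void demo_move_to_active_locked(struct demo_cgroup *mem)
{
	mem->count[DEMO_ZSTAT_INACTIVE] -= 1;
	mem->count[DEMO_ZSTAT_ACTIVE] += 1;
}

/* Each iteration adds a page as inactive, then activates it, under the lock. */
static void *demo_worker(void *arg)
{
	struct demo_cgroup *mem = arg;
	int i;

	for (i = 0; i < 1000; i++) {
		pthread_mutex_lock(&mem->lru_lock);
		mem->count[DEMO_ZSTAT_INACTIVE] += 1;
		demo_move_to_active_locked(mem);
		pthread_mutex_unlock(&mem->lru_lock);
	}
	return NULL;
}
```

Running several `demo_worker` threads concurrently leaves the counters exact, because no two updates ever overlap.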

For handling per-zone status, this patch adds
  struct mem_cgroup_per_zone {
...
  }
and some helper functions. This will be useful to add per-zone objects
in mem_cgroup.
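
The layout described above (per-zone counters grouped per node, and a per-cgroup array of per-node pointers) together with the summing done by `mem_cgroup_get_all_zonestat()` can be sketched in userspace as follows. The sizes and the `demo_*` names are hypothetical; the patch uses `MAX_NUMNODES` and `MAX_NR_ZONES` and iterates with `for_each_online_node()`, whereas this sketch simply skips nodes whose info was not allocated.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_NODES 4 /* hypothetical stand-in for MAX_NUMNODES */
#define DEMO_MAX_ZONES 3 /* hypothetical stand-in for MAX_NR_ZONES */

enum demo_zstat { DEMO_ZSTAT_ACTIVE, DEMO_ZSTAT_INACTIVE, NR_DEMO_ZSTAT };

struct demo_per_zone {
	unsigned long count[NR_DEMO_ZSTAT];
};

struct demo_per_node {
	struct demo_per_zone zoneinfo[DEMO_MAX_ZONES];
};

struct demo_lru_info {
	struct demo_per_node *nodeinfo[DEMO_MAX_NODES]; /* NULL if not allocated */
};

/* Sum one statistic over every allocated node and every zone. */
static unsigned long demo_get_all_zonestat(const struct demo_lru_info *info,
					   enum demo_zstat idx)
{
	unsigned long total = 0;
	int nid, zid;

	for (nid = 0; nid < DEMO_MAX_NODES; nid++) {
		if (!info->nodeinfo[nid])
			continue;
		for (zid = 0; zid < DEMO_MAX_ZONES; zid++)
			total += info->nodeinfo[nid]->zoneinfo[zid].count[idx];
	}
	return total;
}
```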

This patch sets the memory controller's early_init to 0 so that
kmalloc() can be called during initialization.

Changelog V2 -> V3
  - fixed comments.

Changelog V1 -> V2
  - added mem_cgroup_per_zone struct.
  This will help following patches to implement per-zone objects and
  pack them into a struct.
  - added __mem_cgroup_add_list() and __mem_cgroup_remove_list()
  - fixed page migration handling.
  - renamed zstat to info (per-zone-info)
This will be place for per-zone information(lru, lock, ..)
  - use page_cgroup_nid()/zid() funcs.

Acked-by: Balbir Singh <[EMAIL PROTECTED]>
Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>


 mm/memcontrol.c |  164 +---
 1 file changed, 157 insertions(+), 7 deletions(-)

Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
===
--- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c	2007-11-26 16:39:00.0 +0900
+++ linux-2.6.24-rc3-mm1/mm/memcontrol.c	2007-11-26 16:39:02.0 +0900
@@ -78,6 +78,31 @@
 }
 
 /*
+ * per-zone information in memory controller.
+ */
+
+enum mem_cgroup_zstat_index {
+   MEM_CGROUP_ZSTAT_ACTIVE,
+   MEM_CGROUP_ZSTAT_INACTIVE,
+
+   NR_MEM_CGROUP_ZSTAT,
+};
+
+struct mem_cgroup_per_zone {
+   unsigned long count[NR_MEM_CGROUP_ZSTAT];
+};
+/* Macro for accessing counter */
+#define MEM_CGROUP_ZSTAT(mz, idx)  ((mz)->count[(idx)])
+
+struct mem_cgroup_per_node {
+   struct mem_cgroup_per_zone zoneinfo[MAX_NR_ZONES];
+};
+
+struct mem_cgroup_lru_info {
+   struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
+};
+
+/*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
  * statistics based on the statistics developed by Rik Van Riel for clock-pro,
@@ -101,6 +126,7 @@
 */
struct list_head active_list;
struct list_head inactive_list;
+   struct mem_cgroup_lru_info info;
/*
 * spin_lock to protect the per cgroup LRU
 */
@@ -158,6 +184,7 @@
MEM_CGROUP_CHARGE_TYPE_MAPPED,
 };
 
+
 /*
  * Always modified under lru lock. Then, not necessary to preempt_disable()
  */
@@ -173,7 +200,39 @@
MEM_CGROUP_STAT_CACHE, val);
else
__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
+}
 
+static inline struct mem_cgroup_per_zone *
+mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
+{
+   if (!mem->info.nodeinfo[nid])
+   return NULL;
+   return &mem->info.nodeinfo[nid]->zoneinfo[zid];
+}
+
+static inline struct mem_cgroup_per_zone *
+page_cgroup_zoneinfo(struct page_cgroup *pc)
+{
+   struct mem_cgroup *mem = pc->mem_cgroup;
+   int nid = page_cgroup_nid(pc);
+   int zid = page_cgroup_zid(pc);
+
+   return mem_cgroup_zoneinfo(mem, nid, zid);
+}
+
+static unsigned long mem_cgroup_get_all_zonestat(struct mem_cgroup *mem,
+   enum mem_cgroup_zstat_index idx)
+{
+   int nid, zid;
+   struct mem_cgroup_per_zone *mz;
+   u64 total = 0;
+
+   for_each_online_node(nid)
+   for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+   mz = mem_cgroup_zoneinfo(mem, nid, zid);
+   total += MEM_CGROUP_ZSTAT(mz, idx);
+   }
+   return total;
 }
 
 static struct mem_cgroup init_mem_cgroup;
@@ -286,12 +345,51 @@
return ret;
 }
 
+static void __mem_cgroup_remove_list(struct page_cgroup *pc)
+{
+   int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+   struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+
+   if (from)
+   MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) -= 1;
+   else
+   MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
+
+   mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, false);
+   list_del_init(&pc->lru);
+}
+
+static void __mem_cgroup_add_list(struct page_cgroup *pc)
+{
+   int to = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+   struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+
+   if (!to) {
+   MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) += 1;
+   list_add(&pc->lru, &pc->mem_cgroup->inactive_list);
+   } else {
+   MEM_CGROUP_ZSTAT(mz, 

[PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter

2007-11-26 Thread KAMEZAWA Hiroyuki
Counting active/inactive per-zone in memory controller.

This patch adds per-zone statistics to the memory cgroup.
These values are read frequently (as per-zone values) during page reclaim.

In the current design, each per-zone stat is a plain unsigned long rather
than an atomic value, because it is modified only under lru_lock.
(So atomic ops are not necessary.)
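The "plain counter under a lock" argument can be illustrated with a minimal user-space sketch. It assumes a pthread mutex as a stand-in for the kernel's lru_lock spinlock, and every name in it (`locked_zstat`, `lz_*`) is hypothetical and not part of this patch. Because every writer holds the same lock, the non-atomic `+=`/`-=` updates cannot race, and a move from inactive to active appears as a single step to any other lock holder.

```c
#include <pthread.h>

/* Hypothetical names; the patch's own counters are MEM_CGROUP_ZSTAT_*. */
enum lz_index { LZ_ACTIVE, LZ_INACTIVE, NR_LZ };

struct locked_zstat {
	pthread_mutex_t lock;        /* stands in for the kernel's lru_lock */
	unsigned long count[NR_LZ];  /* plain counters: written only with lock held */
};

static void lz_init(struct locked_zstat *z)
{
	pthread_mutex_init(&z->lock, NULL);
	z->count[LZ_ACTIVE] = 0;
	z->count[LZ_INACTIVE] = 0;
}

/* A new page enters the inactive list. */
static void lz_add_inactive(struct locked_zstat *z)
{
	pthread_mutex_lock(&z->lock);
	z->count[LZ_INACTIVE] += 1;  /* no atomic op needed: the lock serializes writers */
	pthread_mutex_unlock(&z->lock);
}

/* Move one page from inactive to active; the caller guarantees one is inactive. */
static void lz_activate(struct locked_zstat *z)
{
	pthread_mutex_lock(&z->lock);
	z->count[LZ_INACTIVE] -= 1;
	z->count[LZ_ACTIVE] += 1;    /* other lock holders never see a half-done move */
	pthread_mutex_unlock(&z->lock);
}

/* Read one counter while holding the lock. */
static unsigned long lz_read(struct locked_zstat *z, enum lz_index idx)
{
	unsigned long v;

	pthread_mutex_lock(&z->lock);
	v = z->count[idx];
	pthread_mutex_unlock(&z->lock);
	return v;
}
```

The trade-off is that every update pays for the lock. That cost is already being paid here, because the LRU list manipulation needs lru_lock anyway.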

This patch adds ACTIVE and INACTIVE per-zone statistics.
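The ACTIVE/INACTIVE bookkeeping can be sketched in user space, modeled loosely on __mem_cgroup_add_list() and __mem_cgroup_remove_list() in the patch below. This is not the patch's code. It assumes a single zone, a hand-rolled circular list in place of <linux/list.h>, and hypothetical names (`sk_*`, `SK_FLAG_ACTIVE`). As in the patch, the active flag decides which counter is adjusted and which list the page goes on.

```c
/* Minimal circular doubly linked list (hypothetical stand-in for list_head). */
struct sk_list { struct sk_list *prev, *next; };

static void sk_list_init(struct sk_list *h) { h->prev = h->next = h; }

static void sk_list_add(struct sk_list *n, struct sk_list *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void sk_list_del_init(struct sk_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	sk_list_init(n);
}

#define SK_FLAG_ACTIVE 0x1  /* hypothetical stand-in for PAGE_CGROUP_FLAG_ACTIVE */

enum { SK_ACTIVE, SK_INACTIVE, SK_NR };

struct sk_group {
	struct sk_list active_list, inactive_list;
	unsigned long count[SK_NR];  /* one zone only, for brevity */
};

struct sk_page {
	unsigned int flags;
	struct sk_list lru;
	struct sk_group *grp;
};

static void sk_group_init(struct sk_group *g)
{
	sk_list_init(&g->active_list);
	sk_list_init(&g->inactive_list);
	g->count[SK_ACTIVE] = g->count[SK_INACTIVE] = 0;
}

/* Put the page on the list matching its flag and bump that counter. */
static void sk_add_list(struct sk_page *pc)
{
	if (pc->flags & SK_FLAG_ACTIVE) {
		pc->grp->count[SK_ACTIVE] += 1;
		sk_list_add(&pc->lru, &pc->grp->active_list);
	} else {
		pc->grp->count[SK_INACTIVE] += 1;
		sk_list_add(&pc->lru, &pc->grp->inactive_list);
	}
}

/* Take the page off its list; the flag tells us which counter to drop. */
static void sk_remove_list(struct sk_page *pc)
{
	if (pc->flags & SK_FLAG_ACTIVE)
		pc->grp->count[SK_ACTIVE] -= 1;
	else
		pc->grp->count[SK_INACTIVE] -= 1;
	sk_list_del_init(&pc->lru);
}

/* Activate: remove under the old flag, flip the flag, then re-add. */
static void sk_activate(struct sk_page *pc)
{
	sk_remove_list(pc);
	pc->flags |= SK_FLAG_ACTIVE;
	sk_add_list(pc);
}
```

In this sketch the counter to decrement is chosen from the flag at removal time. For that reason the flag may only change while the page is off the list, which is the order `sk_activate()` follows.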

To handle per-zone statistics, this patch adds
  struct mem_cgroup_per_zone {
...
  }
and some helper functions. This will make it easier to add more per-zone
objects to mem_cgroup.
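The three-level layout (lru_info, then a per-node pointer, then a per-zone array) can be sketched in user space as follows. The array sizes, the `nz_*` names, and the use of calloc() in place of kmalloc() are assumptions made for illustration. In the patch, mem_cgroup_get_all_zonestat() walks for_each_online_node() and does not check the result of mem_cgroup_zoneinfo() for NULL. This sketch takes a different approach: it walks every node slot and skips the ones that were never allocated.

```c
#include <stdlib.h>

/* Hypothetical sizes; the kernel uses MAX_NUMNODES and MAX_NR_ZONES. */
#define NZ_MAX_NODES 4
#define NZ_MAX_ZONES 3

enum nz_index { NZ_ACTIVE, NZ_INACTIVE, NZ_NR_STAT };

struct nz_per_zone { unsigned long count[NZ_NR_STAT]; };
struct nz_per_node { struct nz_per_zone zoneinfo[NZ_MAX_ZONES]; };
/* Fixed array of node pointers; each node's zones are allocated separately. */
struct nz_lru_info { struct nz_per_node *nodeinfo[NZ_MAX_NODES]; };

/* Allocate zeroed per-node info; calloc() stands in for kmalloc(). */
static int nz_alloc_node(struct nz_lru_info *info, int nid)
{
	info->nodeinfo[nid] = calloc(1, sizeof(*info->nodeinfo[nid]));
	return info->nodeinfo[nid] ? 0 : -1;
}

/* Look up one zone's counters; NULL if the node was never allocated. */
static struct nz_per_zone *nz_zoneinfo(struct nz_lru_info *info, int nid, int zid)
{
	if (!info->nodeinfo[nid])
		return NULL;
	return &info->nodeinfo[nid]->zoneinfo[zid];
}

/* Sum one counter across all allocated nodes and all of their zones. */
static unsigned long nz_all_zonestat(struct nz_lru_info *info, enum nz_index idx)
{
	unsigned long total = 0;

	for (int nid = 0; nid < NZ_MAX_NODES; nid++) {
		for (int zid = 0; zid < NZ_MAX_ZONES; zid++) {
			struct nz_per_zone *mz = nz_zoneinfo(info, nid, zid);

			if (mz)  /* skip unallocated nodes instead of dereferencing NULL */
				total += mz->count[idx];
		}
	}
	return total;
}

static void nz_free(struct nz_lru_info *info)
{
	for (int nid = 0; nid < NZ_MAX_NODES; nid++) {
		free(info->nodeinfo[nid]);
		info->nodeinfo[nid] = NULL;
	}
}
```

Keeping node pointers instead of embedding every node's zone array means memory is spent only on nodes that are actually allocated. The cost is that every lookup has to handle the missing-node case.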

This patch changes the memory controller's early_init to 0 so that
kmalloc() can be called during initialization.

Changelog V2 - V3
  - fixed comments.

Changelog V1 - V2
  - added mem_cgroup_per_zone struct.
  This will help the following patches implement per-zone objects and
  pack them into a struct.
  - added __mem_cgroup_add_list() and __mem_cgroup_remove_list()
  - fixed page migration handling.
  - renamed zstat to info (per-zone info).
  This will be the place for per-zone information (lru, lock, etc.).
  - use the page_cgroup_nid()/page_cgroup_zid() functions.

Acked-by: Balbir Singh [EMAIL PROTECTED]
Signed-off-by: KAMEZAWA Hiroyuki [EMAIL PROTECTED]


 mm/memcontrol.c |  164 +---
 1 file changed, 157 insertions(+), 7 deletions(-)

Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
===
--- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c   2007-11-26 16:39:00.0 +0900
+++ linux-2.6.24-rc3-mm1/mm/memcontrol.c        2007-11-26 16:39:02.0 +0900
@@ -78,6 +78,31 @@
 }
 
 /*
+ * per-zone information in memory controller.
+ */
+
+enum mem_cgroup_zstat_index {
+   MEM_CGROUP_ZSTAT_ACTIVE,
+   MEM_CGROUP_ZSTAT_INACTIVE,
+
+   NR_MEM_CGROUP_ZSTAT,
+};
+
+struct mem_cgroup_per_zone {
+   unsigned long count[NR_MEM_CGROUP_ZSTAT];
+};
+/* Macro for accessing counter */
+#define MEM_CGROUP_ZSTAT(mz, idx)  ((mz)->count[(idx)])
+
+struct mem_cgroup_per_node {
+   struct mem_cgroup_per_zone zoneinfo[MAX_NR_ZONES];
+};
+
+struct mem_cgroup_lru_info {
+   struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
+};
+
+/*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
  * statistics based on the statistics developed by Rik Van Riel for clock-pro,
@@ -101,6 +126,7 @@
 */
struct list_head active_list;
struct list_head inactive_list;
+   struct mem_cgroup_lru_info info;
/*
 * spin_lock to protect the per cgroup LRU
 */
@@ -158,6 +184,7 @@
MEM_CGROUP_CHARGE_TYPE_MAPPED,
 };
 
+
 /*
  * Always modified under lru lock. Then, not necessary to preempt_disable()
  */
@@ -173,7 +200,39 @@
MEM_CGROUP_STAT_CACHE, val);
else
__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
+}
 
+static inline struct mem_cgroup_per_zone *
+mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
+{
+   if (!mem->info.nodeinfo[nid])
+   return NULL;
+   return &mem->info.nodeinfo[nid]->zoneinfo[zid];
+}
+
+static inline struct mem_cgroup_per_zone *
+page_cgroup_zoneinfo(struct page_cgroup *pc)
+{
+   struct mem_cgroup *mem = pc->mem_cgroup;
+   int nid = page_cgroup_nid(pc);
+   int zid = page_cgroup_zid(pc);
+
+   return mem_cgroup_zoneinfo(mem, nid, zid);
+}
+
+static unsigned long mem_cgroup_get_all_zonestat(struct mem_cgroup *mem,
+   enum mem_cgroup_zstat_index idx)
+{
+   int nid, zid;
+   struct mem_cgroup_per_zone *mz;
+   u64 total = 0;
+
+   for_each_online_node(nid)
+   for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+   mz = mem_cgroup_zoneinfo(mem, nid, zid);
+   total += MEM_CGROUP_ZSTAT(mz, idx);
+   }
+   return total;
 }
 
 static struct mem_cgroup init_mem_cgroup;
@@ -286,12 +345,51 @@
return ret;
 }
 
+static void __mem_cgroup_remove_list(struct page_cgroup *pc)
+{
+   int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+   struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+
+   if (from)
+   MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) -= 1;
+   else
+   MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
+
+   mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, false);
+   list_del_init(&pc->lru);
+}
+
+static void __mem_cgroup_add_list(struct page_cgroup *pc)
+{
+   int to = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+   struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+
+   if (!to) {
+   MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) += 1;
+   list_add(&pc->lru, &pc->mem_cgroup->inactive_list);
+   } else {
+   MEM_CGROUP_ZSTAT(mz,