Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
On Thu, 29 Nov 2007 12:33:28 +0900 (JST) [EMAIL PROTECTED] (YAMAMOTO Takashi) wrote:

> > +static inline struct mem_cgroup_per_zone *
> > +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> > +{
> > +	if (!mem->info.nodeinfo[nid])
>
> can this be true?
>
> YAMAMOTO Takashi

When I set early_init=1, I added that check. Is BUG_ON() better?

Thanks,
-Kame
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
> +static inline struct mem_cgroup_per_zone *
> +mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
> +{
> +	if (!mem->info.nodeinfo[nid])

can this be true?

YAMAMOTO Takashi

> +		return NULL;
> +	return &mem->info.nodeinfo[nid]->zoneinfo[zid];
> +}
> +
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
On Thu, 29 Nov 2007, KAMEZAWA Hiroyuki wrote:

> ok, just use N_HIGH_MEMORY here and add a comment that hotplugging support
> is not there yet.
>
> Christoph-san, Lee-san, could you confirm the following?
>
> - when SLAB is used, kmalloc_node() against an offline node will succeed.
> - when SLUB is used, kmalloc_node() against an offline node will panic.
>
> Then, the caller should take care that the node is online before kmalloc().

Hmmm... An offline node implies that the per-node structure does not exist.
SLAB should fail too. If there is something wrong with the allocs then it's
likely a difference in the way hotplug was put into SLAB and SLUB.
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
On Thu, 29 Nov 2007 12:19:37 +0900 (JST) [EMAIL PROTECTED] (YAMAMOTO Takashi) wrote:

> > @@ -651,10 +758,11 @@
> >  		/* Avoid race with charge */
> >  		atomic_set(&pc->ref_cnt, 0);
> >  		if (clear_page_cgroup(page, pc) == pc) {
> > +			int active;
> >  			css_put(&mem->css);
> > +			active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
> >  			res_counter_uncharge(&mem->res, PAGE_SIZE);
> > -			list_del_init(&pc->lru);
> > -			mem_cgroup_charge_statistics(mem, pc->flags, false);
> > +			__mem_cgroup_remove_list(pc);
> >  			kfree(pc);
> >  		} else	/* being uncharged ? ...do relax */
> >  			break;
>
> 'active' seems unused.

ok, I will post a clean-up against -mm2.

Thanks,
-Kame
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
> @@ -651,10 +758,11 @@
>  		/* Avoid race with charge */
>  		atomic_set(&pc->ref_cnt, 0);
>  		if (clear_page_cgroup(page, pc) == pc) {
> +			int active;
>  			css_put(&mem->css);
> +			active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
>  			res_counter_uncharge(&mem->res, PAGE_SIZE);
> -			list_del_init(&pc->lru);
> -			mem_cgroup_charge_statistics(mem, pc->flags, false);
> +			__mem_cgroup_remove_list(pc);
>  			kfree(pc);
>  		} else	/* being uncharged ? ...do relax */
>  			break;

'active' seems unused.

YAMAMOTO Takashi
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
On Thu, 29 Nov 2007 11:24:06 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> On Thu, 29 Nov 2007 10:37:02 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:
>
> > Maybe zonelists of NODE_DATA() are not initialized. You are right.
> > I think N_HIGH_MEMORY will be suitable here... (I'll consider the
> > node-hotplug case later.)
> >
> > Thank you for the test!
>
> Could you try this?
> Sorry... this can be a workaround, but I noticed I missed something.

ok, just use N_HIGH_MEMORY here and add a comment that hotplugging support
is not there yet.

Christoph-san, Lee-san, could you confirm the following?

- when SLAB is used, kmalloc_node() against an offline node will succeed.
- when SLUB is used, kmalloc_node() against an offline node will panic.

Then, the caller should take care that the node is online before kmalloc().

Regards,
-Kame
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
On Thu, 29 Nov 2007 10:37:02 +0900 KAMEZAWA Hiroyuki <[EMAIL PROTECTED]> wrote:

> Maybe zonelists of NODE_DATA() are not initialized. You are right.
> I think N_HIGH_MEMORY will be suitable here... (I'll consider the
> node-hotplug case later.)
>
> Thank you for the test!

Could you try this?

Thanks,
-Kame
==
Don't call kmalloc() against a possible but offline node.

Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

 mm/memcontrol.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Index: test-2.6.24-rc3-mm1/mm/memcontrol.c
===================================================================
--- test-2.6.24-rc3-mm1.orig/mm/memcontrol.c
+++ test-2.6.24-rc3-mm1/mm/memcontrol.c
@@ -1117,8 +1117,14 @@ static int alloc_mem_cgroup_per_zone_inf
 	struct mem_cgroup_per_node *pn;
 	struct mem_cgroup_per_zone *mz;
 	int zone;
-
-	pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+	/*
+	 * This routine is called against possible nodes.
+	 * But it's a BUG to call kmalloc() against an offline node.
+	 */
+	if (node_state(N_ONLINE, node))
+		pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
+	else
+		pn = kmalloc(sizeof(*pn), GFP_KERNEL);
 	if (!pn)
 		return 1;
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
On Wed, 28 Nov 2007 16:19:59 -0500 Lee Schermerhorn <[EMAIL PROTECTED]> wrote:

> As soon as this loop hits the first non-existent node on my platform, I
> get a NULL pointer deref down in __alloc_pages. Stack trace below.
>
> Perhaps N_POSSIBLE should be N_HIGH_MEMORY? That would require handling
> of memory/node hotplug for each memory control group, right? But I'm
> going to try N_HIGH_MEMORY as a work around.

Hmm, ok. (>_<;

> Call Trace:
>  [a00100014de0] show_stack+0x80/0xa0
>  [a00100015a70] show_regs+0x870/0x8a0
>  [a0010003d130] die+0x190/0x300
>  [a00100071b80] ia64_do_page_fault+0x8e0/0xa20
>  [a001b5c0] ia64_leave_kernel+0x0/0x270
>  [a00100132e10] __alloc_pages+0x30/0x6e0
>  [a00100187370] new_slab+0x610/0x6c0
>  [a00100187470] get_new_slab+0x50/0x200
>  [a00100187900] __slab_alloc+0x2e0/0x4e0
>  [a00100187c80] kmem_cache_alloc_node+0x180/0x200
>  [a001001945a0] mem_cgroup_create+0x160/0x400
>  [a001000f0940] cgroup_init_subsys+0xa0/0x400
>  [a001008521f0] cgroup_init+0x90/0x160
>  [a00100831960] start_kernel+0x700/0x820

Maybe zonelists of NODE_DATA() are not initialized. You are right.
I think N_HIGH_MEMORY will be suitable here... (I'll consider the
node-hotplug case later.)

Thank you for the test!

Regards,
-Kame
Re: [PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
Just a "heads up": this patch is the apparent cause of a boot-time panic (a
NULL pointer deref) on my NUMA platform. See below.

On Tue, 2007-11-27 at 12:00 +0900, KAMEZAWA Hiroyuki wrote:
> Counting active/inactive per-zone in memory controller.
>
> This patch adds per-zone status in memory cgroup.
> These values are often read (as per-zone value) by page reclaiming.
>
> In the current design, per-zone stat is just an unsigned long value and
> not an atomic value because they are modified only under lru_lock.
> (So, atomic_ops is not necessary.)
>
> This patch adds ACTIVE and INACTIVE per-zone status values.
>
> For handling per-zone status, this patch adds
> 	struct mem_cgroup_per_zone {
> 	...
> 	}
> and some helper functions. This will be useful to add per-zone objects
> in mem_cgroup.
>
> This patch turns memory controller's early_init to 0 for calling
> kmalloc() in initialization.
>
> Changelog V2 -> V3
>  - fixed comments.
>
> Changelog V1 -> V2
>  - added mem_cgroup_per_zone struct.
>    This will help following patches to implement per-zone objects and
>    pack them into a struct.
>  - added __mem_cgroup_add_list() and __mem_cgroup_remove_list()
>  - fixed page migration handling.
>  - renamed zstat to info (per-zone-info)
>    This will be the place for per-zone information (lru, lock, ...)
>  - use page_cgroup_nid()/zid() funcs.
>
> Acked-by: Balbir Singh <[EMAIL PROTECTED]>
> Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>
>
>  mm/memcontrol.c |  164 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 157 insertions(+), 7 deletions(-)
>
> Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
> ===================================================================
> --- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c	2007-11-26 16:39:00.000000000 +0900
> +++ linux-2.6.24-rc3-mm1/mm/memcontrol.c	2007-11-26 16:39:02.000000000 +0900
> @@ -78,6 +78,31 @@
>
> +static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
> +{
> +	struct mem_cgroup_per_node *pn;
> +
> +	pn = kmalloc_node(sizeof(*pn), GFP_KERNEL, node);
> +	if (!pn)
> +		return 1;
> +	mem->info.nodeinfo[node] = pn;
> +	memset(pn, 0, sizeof(*pn));
> +	return 0;
> +}
> +
>  static struct mem_cgroup init_mem_cgroup;
>
>  static struct cgroup_subsys_state *
>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  {
>  	struct mem_cgroup *mem;
> +	int node;
>
>  	if (unlikely((cont->parent) == NULL)) {
>  		mem = &init_mem_cgroup;
> @@ -907,7 +1039,19 @@
>  	INIT_LIST_HEAD(&mem->inactive_list);
>  	spin_lock_init(&mem->lru_lock);
>  	mem->control_type = MEM_CGROUP_TYPE_ALL;
> +	memset(&mem->info, 0, sizeof(mem->info));
> +
> +	for_each_node_state(node, N_POSSIBLE)
> +		if (alloc_mem_cgroup_per_zone_info(mem, node))
> +			goto free_out;
> +

As soon as this loop hits the first non-existent node on my platform, I
get a NULL pointer deref down in __alloc_pages. Stack trace below.

Perhaps N_POSSIBLE should be N_HIGH_MEMORY? That would require handling
of memory/node hotplug for each memory control group, right? But I'm
going to try N_HIGH_MEMORY as a work around.
Lee

>  	return &mem->css;
> +free_out:
> +	for_each_node_state(node, N_POSSIBLE)
> +		kfree(mem->info.nodeinfo[node]);
> +	if (cont->parent != NULL)
> +		kfree(mem);
> +	return NULL;
>  }
>
>  static void mem_cgroup_pre_destroy(struct cgroup_subsys *ss,
> @@ -920,6 +1064,12 @@
>  static void mem_cgroup_destroy(struct cgroup_subsys *ss,
>  				struct cgroup *cont)
>  {
> +	int node;
> +	struct mem_cgroup *mem = mem_cgroup_from_cont(cont);
> +
> +	for_each_node_state(node, N_POSSIBLE)
> +		kfree(mem->info.nodeinfo[node]);
> +
>  	kfree(mem_cgroup_from_cont(cont));
>  }
>
> @@ -972,5 +1122,5 @@
>  	.destroy = mem_cgroup_destroy,
>  	.populate = mem_cgroup_populate,
>  	.attach = mem_cgroup_move_task,
> -	.early_init = 1,
> +	.early_init = 0,
>  };

Initializing cgroup subsys memory
Unable to handle kernel NULL pointer dereference (address 3c80)
swapper[0]: Oops 11012296146944 [1]
Modules linked in:

Pid: 0, CPU 0, comm: swapper
psr : 1210084a6010  ifs : 8b1a  ip : [a00100132e11]  Not tainted
ip is at __alloc_pages+0x31/0x6e0
unat:  pfs : 060f  rsc : 0003
rnat: a001009db3b8  bsps: a001009e0490  pr : 656960155aa65659
ldrs:  ccv :  fpsr: 0009804c8a70433f
csd :  ssd :
b0 : a00100187370  b6 : a00100194440  b7 : a0010086d560
f6 : 1003e  f7 : 1003e0055  f8 : 1003e00c0
f9 : 1003e3fc0  f10 : 1003e00c0  f11 : 1003e0055
r1 : a00100bc0f10  r2 : ffe6  r3 : 0002
r8 : 00071ef0  r9 : 0005  r10 : e7002034d588
r11 :
[PATCH][for -mm] per-zone and reclaim enhancements for memory controller take 3 [3/10] per-zone active inactive counter
Counting active/inactive per-zone in memory controller.

This patch adds per-zone status in memory cgroup.
These values are often read (as per-zone value) by page reclaiming.

In the current design, per-zone stat is just an unsigned long value and
not an atomic value because they are modified only under lru_lock.
(So, atomic_ops is not necessary.)

This patch adds ACTIVE and INACTIVE per-zone status values.

For handling per-zone status, this patch adds
	struct mem_cgroup_per_zone {
	...
	}
and some helper functions. This will be useful to add per-zone objects
in mem_cgroup.

This patch turns memory controller's early_init to 0 for calling
kmalloc() in initialization.

Changelog V2 -> V3
 - fixed comments.

Changelog V1 -> V2
 - added mem_cgroup_per_zone struct.
   This will help following patches to implement per-zone objects and
   pack them into a struct.
 - added __mem_cgroup_add_list() and __mem_cgroup_remove_list()
 - fixed page migration handling.
 - renamed zstat to info (per-zone-info)
   This will be the place for per-zone information (lru, lock, ...)
 - use page_cgroup_nid()/zid() funcs.

Acked-by: Balbir Singh <[EMAIL PROTECTED]>
Signed-off-by: KAMEZAWA Hiroyuki <[EMAIL PROTECTED]>

 mm/memcontrol.c |  164 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 157 insertions(+), 7 deletions(-)

Index: linux-2.6.24-rc3-mm1/mm/memcontrol.c
===================================================================
--- linux-2.6.24-rc3-mm1.orig/mm/memcontrol.c	2007-11-26 16:39:00.000000000 +0900
+++ linux-2.6.24-rc3-mm1/mm/memcontrol.c	2007-11-26 16:39:02.000000000 +0900
@@ -78,6 +78,31 @@
 }

 /*
+ * per-zone information in memory controller.
+ */
+
+enum mem_cgroup_zstat_index {
+	MEM_CGROUP_ZSTAT_ACTIVE,
+	MEM_CGROUP_ZSTAT_INACTIVE,
+
+	NR_MEM_CGROUP_ZSTAT,
+};
+
+struct mem_cgroup_per_zone {
+	unsigned long count[NR_MEM_CGROUP_ZSTAT];
+};
+/* Macro for accessing counter */
+#define MEM_CGROUP_ZSTAT(mz, idx)	((mz)->count[(idx)])
+
+struct mem_cgroup_per_node {
+	struct mem_cgroup_per_zone zoneinfo[MAX_NR_ZONES];
+};
+
+struct mem_cgroup_lru_info {
+	struct mem_cgroup_per_node *nodeinfo[MAX_NUMNODES];
+};
+
+/*
  * The memory controller data structure. The memory controller controls both
  * page cache and RSS per cgroup. We would eventually like to provide
  * statistics based on the statistics developed by Rik Van Riel for clock-pro,
@@ -101,6 +126,7 @@
 	 */
 	struct list_head active_list;
 	struct list_head inactive_list;
+	struct mem_cgroup_lru_info info;
 	/*
 	 * spin_lock to protect the per cgroup LRU
 	 */
@@ -158,6 +184,7 @@
 	MEM_CGROUP_CHARGE_TYPE_MAPPED,
 };

+
 /*
  * Always modified under lru lock. Then, not necessary to preempt_disable()
  */
@@ -173,7 +200,39 @@
 					MEM_CGROUP_STAT_CACHE, val);
 	else
 		__mem_cgroup_stat_add_safe(stat, MEM_CGROUP_STAT_RSS, val);
+}

+static inline struct mem_cgroup_per_zone *
+mem_cgroup_zoneinfo(struct mem_cgroup *mem, int nid, int zid)
+{
+	if (!mem->info.nodeinfo[nid])
+		return NULL;
+	return &mem->info.nodeinfo[nid]->zoneinfo[zid];
+}
+
+static inline struct mem_cgroup_per_zone *
+page_cgroup_zoneinfo(struct page_cgroup *pc)
+{
+	struct mem_cgroup *mem = pc->mem_cgroup;
+	int nid = page_cgroup_nid(pc);
+	int zid = page_cgroup_zid(pc);
+
+	return mem_cgroup_zoneinfo(mem, nid, zid);
+}
+
+static unsigned long mem_cgroup_get_all_zonestat(struct mem_cgroup *mem,
+					enum mem_cgroup_zstat_index idx)
+{
+	int nid, zid;
+	struct mem_cgroup_per_zone *mz;
+	u64 total = 0;
+
+	for_each_online_node(nid)
+		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
+			mz = mem_cgroup_zoneinfo(mem, nid, zid);
+			total += MEM_CGROUP_ZSTAT(mz, idx);
+		}
+	return total;
 }

 static struct mem_cgroup init_mem_cgroup;
@@ -286,12 +345,51 @@
 	return ret;
 }

+static void __mem_cgroup_remove_list(struct page_cgroup *pc)
+{
+	int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+	struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+
+	if (from)
+		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_ACTIVE) -= 1;
+	else
+		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) -= 1;
+
+	mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, false);
+	list_del_init(&pc->lru);
+}
+
+static void __mem_cgroup_add_list(struct page_cgroup *pc)
+{
+	int to = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+	struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
+
+	if (!to) {
+		MEM_CGROUP_ZSTAT(mz, MEM_CGROUP_ZSTAT_INACTIVE) += 1;
+		list_add(&pc->lru, &pc->mem_cgroup->inactive_list);
+	} else {
+		MEM_CGROUP_ZSTAT(mz,