Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
Balbir Singh writes:

> Here's a better and more complete fix for the problem. Could you please see if it works for you? I tested it on a real NUMA box and it seemed to work fine there.

There are a couple of other changes in behaviour that your patch introduces, and I'd like to understand them better before taking the patch. First, with your patch we don't set nodes online if they end up having no memory in them because of the memory limit, whereas previously we did. Secondly, in the case where we don't have NUMA information, we now set node 0 online after adding each LMB, whereas previously we only set it online once.

If in fact these changes are benign, then your patch description should mention them and explain why they are benign.

Paul.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
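To make Paul's first point concrete, here is a minimal userspace model (not kernel code) of the two orderings. The `struct mem_range` type, `enforce_limit()` (a simplified stand-in for `numa_enforce_memory_limit()`, assuming the limit is a single top-of-memory address) and `online_nodes()` are hypothetical names for this sketch; the real code walks LMBs and device-tree properties rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified memory range: not a kernel type. */
struct mem_range {
	unsigned long long start;
	unsigned long long size;
	int nid;
};

/* Simplified stand-in for numa_enforce_memory_limit(): clip to 'limit'. */
static unsigned long long enforce_limit(unsigned long long start,
					unsigned long long size,
					unsigned long long limit)
{
	if (start >= limit)
		return 0;
	if (start + size > limit)
		return limit - start;
	return size;
}

/*
 * Record in online[] (indexed by nid) which nodes end up online.
 * online_first == true models the original code (online, then clip);
 * online_first == false models the patch (clip, skip empty, then online).
 */
static void online_nodes(const struct mem_range *r, size_t n,
			 unsigned long long limit, bool online_first,
			 bool *online)
{
	for (size_t i = 0; i < n; i++) {
		if (online_first)
			online[r[i].nid] = true;
		if (enforce_limit(r[i].start, r[i].size, limit) == 0)
			continue;	/* range lies entirely above the limit */
		online[r[i].nid] = true;
	}
}
```

With two 1G ranges on nodes 0 and 1 and a 1G memory limit, the original ordering onlines both nodes, while the patched ordering leaves node 1 offline because all of its memory is above the limit.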
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
* Paul Mackerras [EMAIL PROTECTED] [2008-01-27 22:55:43]:

> Balbir Singh writes:
>
>> Here's a better and more complete fix for the problem. Could you please see if it works for you? I tested it on a real NUMA box and it seemed to work fine there.
>
> There are a couple of other changes in behaviour that your patch introduces, and I'd like to understand them better before taking the patch. First, with your patch we don't set nodes online if they end up having no memory in them because of the memory limit, whereas previously we did. Secondly, in the case where we don't have NUMA information, we now set node 0 online after adding each LMB, whereas previously we only set it online once.
>
> If in fact these changes are benign, then your patch description should mention them and explain why they are benign.

Yes, they are. I'll try and justify the changes with a good detailed changelog. If people prefer it, I can hide fake NUMA nodes under a config option, so that it does not come enabled by default. Thanks for keeping me honest.

> Paul.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
On 1/27/08, Balbir Singh [EMAIL PROTECTED] wrote:
> * Paul Mackerras [EMAIL PROTECTED] [2008-01-27 22:55:43]:
>
>> Balbir Singh writes:
>>
>>> Here's a better and more complete fix for the problem. Could you please see if it works for you? I tested it on a real NUMA box and it seemed to work fine there.
>>
>> There are a couple of other changes in behaviour that your patch introduces, and I'd like to understand them better before taking the patch. First, with your patch we don't set nodes online if they end up having no memory in them because of the memory limit, whereas previously we did. Secondly, in the case where we don't have NUMA information, we now set node 0 online after adding each LMB, whereas previously we only set it online once.
>>
>> If in fact these changes are benign, then your patch description should mention them and explain why they are benign.
>
> Yes, they are. I'll try and justify the changes with a good detailed changelog. If people prefer it, I can hide fake NUMA nodes under a config option, so that it does not come enabled by default.

Sigh, there already *is* a fake NUMA config option: CONFIG_NUMA_EMU.

CONFIG_NUMA_EMU:

Enable NUMA emulation. A flat machine will be split into virtual nodes when booted with "numa=fake=N", where N is the number of nodes. This is only useful for debugging.

I have to assume your patch is implementing the same feature for powerpc (really just extending the x86_64 one), and thus should share the config option. Any chance you can just make some of that code common? Maybe as a follow-on patch. I expect that some of Mel's (added to Cc) work to allow NUMA to be set on x86 more easily will flow quite simply into adding fake NUMA support there as well. So moving the code to a common place (at least the parsing) makes sense.

I also feel like you want to be able to online memoryless nodes -- that's where we've been hitting a number of bugs lately in the VM. I can't tell from Paul's comment if your patch prevents that from being faked or not.

Thanks,
Nish
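The x86_64 option quoted above takes a node count, whereas this patch takes a list of absolute boundaries (`numa=fake=1G,2G` ends node 0 at 1G and node 1 at 2G, with the last node running to the top of RAM). Purely as an illustration of how the two interfaces relate, a hypothetical helper `equal_split_to_boundaries()` can translate an "N equal nodes" request into the boundary string the PowerPC option expects. It assumes the total RAM size is known in megabytes and is not part of either implementation.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: express "split total_mb of RAM into n equal fake
 * nodes" as the cumulative boundary list used by numa=fake=range1,...
 * The last node implicitly runs to the top of RAM, so n nodes need
 * n - 1 boundaries. Returns the number of boundaries written to buf.
 */
static int equal_split_to_boundaries(unsigned long long total_mb, int n,
				     char *buf, size_t buflen)
{
	size_t used = 0;
	int count = 0;

	if (buflen == 0)
		return 0;
	buf[0] = '\0';
	for (int i = 1; i < n; i++) {
		unsigned long long b = total_mb * i / n;
		int w = snprintf(buf + used, buflen - used, "%s%lluM",
				 count ? "," : "", b);

		if (w < 0 || (size_t)w >= buflen - used) {
			buf[used] = '\0';	/* drop the truncated entry */
			break;
		}
		used += (size_t)w;
		count++;
	}
	return count;
}
```

For example, splitting 4096M four ways yields `1024M,2048M,3072M`, i.e. `numa=fake=1024M,2048M,3072M`.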
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:44:58]:

> This fixes it, although I'm a little worried about some of the removals/movings of node_set_online() in the patch.
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 1666e7d..dcedc26 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -49,7 +49,6 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
>  	static unsigned int fake_nid = 0;
>  	static unsigned long long curr_boundary = 0;
>
> -	*nid = fake_nid;
>  	if (!p)
>  		return 0;
>
> @@ -60,6 +59,7 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
>  	if (mem < curr_boundary)
>  		return 0;
>
> +	*nid = fake_nid;
>  	curr_boundary = mem;
>
>  	if ((end_pfn << PAGE_SHIFT) > mem) {

Hi, Michael,

Here's a better and more complete fix for the problem. Could you please see if it works for you? I tested it on a real NUMA box and it seemed to work fine there.

Description
---

This patch provides a fix for the problem found by Michael Ellerman [EMAIL PROTECTED] while using fake NUMA nodes on a cell box. The code modifies node id iff (as in if and only if) fake NUMA nodes are created.

Signed-off-by: Balbir Singh [EMAIL PROTECTED]
---

 arch/powerpc/mm/numa.c |    7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff -puN arch/powerpc/mm/numa.c~fix-fake-numa-nid-on-numa arch/powerpc/mm/numa.c
--- linux-2.6.24-rc8/arch/powerpc/mm/numa.c~fix-fake-numa-nid-on-numa	2008-01-26 12:20:29.0 +0530
+++ linux-2.6.24-rc8-balbir/arch/powerpc/mm/numa.c	2008-01-26 12:27:53.0 +0530
@@ -49,7 +49,12 @@ static int __cpuinit fake_numa_create_ne
 	static unsigned int fake_nid = 0;
 	static unsigned long long curr_boundary = 0;
 
-	*nid = fake_nid;
+	/*
+	 * If we did enable fake nodes and cross a node,
+	 * remember the last node and start from there.
+	 */
+	if (fake_nid)
+		*nid = fake_nid;
 	if (!p)
 		return 0;
 
_

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
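The difference between the Take 2 code and this fix comes down to when `*nid` is overwritten. Below is a minimal userspace model with the boundary parsing left out; `assign_nid_take2()` and `assign_nid_fixed()` are hypothetical names, and the file-scope `fake_nid` stands in for the function-local static in the patch.

```c
#include <stddef.h>

/* Models the function-local 'static unsigned int fake_nid' in the patch. */
static unsigned int fake_nid;

/* Take 2 behaviour: *nid is overwritten before anything is checked. */
static void assign_nid_take2(const char *fake_args, int *nid)
{
	*nid = (int)fake_nid;
	if (!fake_args)
		return;		/* no numa=fake= given */
	/* boundary parsing omitted in this model */
}

/* Fixed behaviour: only override *nid once a fake node exists. */
static void assign_nid_fixed(const char *fake_args, int *nid)
{
	if (fake_nid)
		*nid = (int)fake_nid;
	if (!fake_args)
		return;		/* no numa=fake= given */
	/* boundary parsing omitted in this model */
}
```

On a machine with real NUMA information and no `numa=fake=` option, the Take 2 variant forces every range onto node 0, while the fixed variant leaves the firmware-provided node id alone, which is the "no-op on real NUMA machines" behaviour discussed in this thread.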
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:44:58]:

> On Fri, 2008-01-18 at 16:34 +1100, Michael Ellerman wrote:
>> On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
>>> Changelog
>>>
>>> 1. Get rid of the constant 5 (based on comments from [EMAIL PROTECTED])
>>> 2. Implement suggestions from Olof Johansson
>>> 3. Check if cmdline is NULL in fake_numa_create_new_node()
>>>
>>> Tested with additional parameters from Olof
>>>
>>> numa=debug,fake=
>>> numa=foo,fake=bar
>>
>> I'm not sure why yet, but git bisect tells me it's this patch that's causing the for-2.6.25 tree to explode on boot on cell machines.
>
> This fixes it, although I'm a little worried about some of the removals/movings of node_set_online() in the patch.
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 1666e7d..dcedc26 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -49,7 +49,6 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
>  	static unsigned int fake_nid = 0;
>  	static unsigned long long curr_boundary = 0;
>
> -	*nid = fake_nid;
>  	if (!p)
>  		return 0;
>
> @@ -60,6 +59,7 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
>  	if (mem < curr_boundary)
>  		return 0;
>
> +	*nid = fake_nid;
>  	curr_boundary = mem;
>
>  	if ((end_pfn << PAGE_SHIFT) > mem) {

This patch makes sense; ideally fake_numa_create_new_node() should just be a no-op in the case of machines with real NUMA nodes.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:55:03]:

> On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
>> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option
>>
>> Comments are as always welcome!
>
> Here's some :)

Thanks!

>> diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
>> --- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy	2007-12-07 21:25:55.0 +0530
>> +++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c	2007-12-08 03:19:46.0 +0530
>> @@ -24,6 +24,8 @@
>>
>>  static int numa_enabled = 1;
>>
>> +static char *cmdline __initdata;
>
> Can you call this fake_numa_args or something, cmdline is a bit generic.

I could if it makes code easier to understand. Will put it in my TODO list.

>> @@ -39,6 +41,43 @@ static bootmem_data_t __initdata plat_no
>>  static int min_common_depth;
>>  static int n_mem_addr_cells, n_mem_size_cells;
>>
>> +static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
>> +						unsigned int *nid)
>> +{
>> +	unsigned long long mem;
>> +	char *p = cmdline;
>> +	static unsigned int fake_nid = 0;
>> +	static unsigned long long curr_boundary = 0;
>> +
>> +	*nid = fake_nid;
>
> As I mentioned in my other email I think this is broken, you unconditionally overwrite *nid, even if no fake numa was specified?

Aah.. OK.. looks like a BUG. I'll also respond to your other email.

>> +	if (!p)
>> +		return 0;
>> +
>> +	mem = memparse(p, &p);
>> +	if (!mem)
>> +		return 0;
>> +
>> +	if (mem < curr_boundary)
>> +		return 0;
>> +
>> +	curr_boundary = mem;
>> +
>> +	if ((end_pfn << PAGE_SHIFT) > mem) {
>> +		/*
>> +		 * Skip commas and spaces
>> +		 */
>> +		while (*p == ',' || *p == ' ' || *p == '\t')
>> +			p++;
>> +
>> +		cmdline = p;
>> +		fake_nid++;
>> +		*nid = fake_nid;
>> +		dbg("created new fake_node with id %d\n", fake_nid);
>> +		return 1;
>> +	}
>> +	return 0;
>> +}
>> +
>>  static void __cpuinit map_cpu_to_node(int cpu, int node)
>>  {
>>  	numa_cpu_lookup_table[cpu] = node;
>> @@ -344,12 +383,14 @@ static void __init parse_drconf_memory(s
>>  			if (nid == 0x || nid >= MAX_NUMNODES)
>>  				nid = default_nid;
>>  		}
>> -		node_set_online(nid);
>>
>>  		size = numa_enforce_memory_limit(start, lmb_size);
>>  		if (!size)
>>  			continue;
>>
>> +		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
>> +		node_set_online(nid);
>
> I can't convince myself that this is 100% ok, the moving of node_set_online(). At the very least it's a change in behaviour, previously we would online the node regardless of the memory limit.

Hmm.. this can be reverted, but do we gain anything by enabling nodes, even though we are over the memory limit?

>>  		add_active_range(nid, start >> PAGE_SHIFT,
>>  				 (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
>>  	}
>> @@ -429,7 +470,6 @@ new_range:
>>  		nid = of_node_to_nid_single(memory);
>>  		if (nid < 0)
>>  			nid = default_nid;
>> -		node_set_online(nid);
>>
>>  		if (!(size = numa_enforce_memory_limit(start, size))) {
>>  			if (--ranges)
>> @@ -438,6 +478,9 @@ new_range:
>>  				continue;
>>  		}
>>
>> +		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
>> +		node_set_online(nid);
>
> Ditto previous comment.

Yes, point noted. Thanks for your review and problem report.

> cheers

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
* Michael Ellerman [EMAIL PROTECTED] [2008-01-18 16:34:53]:

> On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
>> Changelog
>>
>> 1. Get rid of the constant 5 (based on comments from [EMAIL PROTECTED])
>> 2. Implement suggestions from Olof Johansson
>> 3. Check if cmdline is NULL in fake_numa_create_new_node()
>>
>> Tested with additional parameters from Olof
>>
>> numa=debug,fake=
>> numa=foo,fake=bar
>
> I'm not sure why yet, but git bisect tells me it's this patch that's causing the for-2.6.25 tree to explode on boot on cell machines.

Hi,

Do you boot with numa=options on your machine? Could I have your machine configuration? Any OOPS/log would be helpful.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
> Changelog
>
> 1. Get rid of the constant 5 (based on comments from [EMAIL PROTECTED])
> 2. Implement suggestions from Olof Johansson
> 3. Check if cmdline is NULL in fake_numa_create_new_node()
>
> Tested with additional parameters from Olof
>
> numa=debug,fake=
> numa=foo,fake=bar

I'm not sure why yet, but git bisect tells me it's this patch that's causing the for-2.6.25 tree to explode on boot on cell machines.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
On Fri, 2008-01-18 at 16:34 +1100, Michael Ellerman wrote:
> On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
>> Changelog
>>
>> 1. Get rid of the constant 5 (based on comments from [EMAIL PROTECTED])
>> 2. Implement suggestions from Olof Johansson
>> 3. Check if cmdline is NULL in fake_numa_create_new_node()
>>
>> Tested with additional parameters from Olof
>>
>> numa=debug,fake=
>> numa=foo,fake=bar
>
> I'm not sure why yet, but git bisect tells me it's this patch that's causing the for-2.6.25 tree to explode on boot on cell machines.

This fixes it, although I'm a little worried about some of the removals/movings of node_set_online() in the patch.

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 1666e7d..dcedc26 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -49,7 +49,6 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
 	static unsigned int fake_nid = 0;
 	static unsigned long long curr_boundary = 0;
 
-	*nid = fake_nid;
 	if (!p)
 		return 0;
 
@@ -60,6 +59,7 @@ static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
 	if (mem < curr_boundary)
 		return 0;
 
+	*nid = fake_nid;
 	curr_boundary = mem;
 
 	if ((end_pfn << PAGE_SHIFT) > mem) {
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
On Sat, 2007-12-08 at 04:07 +0530, Balbir Singh wrote:
> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option
>
> Comments are as always welcome!

Here's some :)

> diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
> --- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy	2007-12-07 21:25:55.0 +0530
> +++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c	2007-12-08 03:19:46.0 +0530
> @@ -24,6 +24,8 @@
>
>  static int numa_enabled = 1;
>
> +static char *cmdline __initdata;

Can you call this fake_numa_args or something, cmdline is a bit generic.

> @@ -39,6 +41,43 @@ static bootmem_data_t __initdata plat_no
>  static int min_common_depth;
>  static int n_mem_addr_cells, n_mem_size_cells;
>
> +static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
> +						unsigned int *nid)
> +{
> +	unsigned long long mem;
> +	char *p = cmdline;
> +	static unsigned int fake_nid = 0;
> +	static unsigned long long curr_boundary = 0;
> +
> +	*nid = fake_nid;

As I mentioned in my other email I think this is broken, you unconditionally overwrite *nid, even if no fake numa was specified?

> +	if (!p)
> +		return 0;
> +
> +	mem = memparse(p, &p);
> +	if (!mem)
> +		return 0;
> +
> +	if (mem < curr_boundary)
> +		return 0;
> +
> +	curr_boundary = mem;
> +
> +	if ((end_pfn << PAGE_SHIFT) > mem) {
> +		/*
> +		 * Skip commas and spaces
> +		 */
> +		while (*p == ',' || *p == ' ' || *p == '\t')
> +			p++;
> +
> +		cmdline = p;
> +		fake_nid++;
> +		*nid = fake_nid;
> +		dbg("created new fake_node with id %d\n", fake_nid);
> +		return 1;
> +	}
> +	return 0;
> +}
> +
>  static void __cpuinit map_cpu_to_node(int cpu, int node)
>  {
>  	numa_cpu_lookup_table[cpu] = node;
> @@ -344,12 +383,14 @@ static void __init parse_drconf_memory(s
>  			if (nid == 0x || nid >= MAX_NUMNODES)
>  				nid = default_nid;
>  		}
> -		node_set_online(nid);
>
>  		size = numa_enforce_memory_limit(start, lmb_size);
>  		if (!size)
>  			continue;
>
> +		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
> +		node_set_online(nid);

I can't convince myself that this is 100% ok, the moving of node_set_online(). At the very least it's a change in behaviour, previously we would online the node regardless of the memory limit.

>  		add_active_range(nid, start >> PAGE_SHIFT,
>  				 (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
>  	}
> @@ -429,7 +470,6 @@ new_range:
>  		nid = of_node_to_nid_single(memory);
>  		if (nid < 0)
>  			nid = default_nid;
> -		node_set_online(nid);
>
>  		if (!(size = numa_enforce_memory_limit(start, size))) {
>  			if (--ranges)
> @@ -438,6 +478,9 @@ new_range:
>  				continue;
>  		}
>
> +		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
> +		node_set_online(nid);

Ditto previous comment.

cheers

--
Michael Ellerman
OzLabs, IBM Australia Development Lab
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
Balbir Singh wrote:

Changelog

1. Get rid of the constant 5 (based on comments from [EMAIL PROTECTED])
2. Implement suggestions from Olof Johansson
3. Check if cmdline is NULL in fake_numa_create_new_node()

Tested with additional parameters from Olof

numa=debug,fake=
numa=foo,fake=bar

Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option

numa=fake=<node range>

<node range> is of the format range1,range2,...rangeN

Each of the rangeX parameters is parsed using memparse(). I find the patch useful for fake NUMA emulation on my simple PowerPC machine. I've tested it on a non-NUMA box with the following arguments

numa=fake=1G
numa=fake=1G,2G
numa=fake=1G,512M,2G
numa=fake=1500M,2800M mem=3500M
numa=fake=1G mem=512M
numa=fake=1G mem=1G

This patch applies on top of 2.6.24-rc4.

Although I've tried my best to handle some of the architecture-specific details of PowerPC, I might have overlooked something obvious, like the usage of an API or some architecture tweaks.

The patch depends on CONFIG_NUMA and I decided against creating a separate config option for fake NUMA to keep the code simple.

Comments are as always welcome!

Signed-off-by: Balbir Singh [EMAIL PROTECTED]
---

 arch/powerpc/mm/numa.c |   59 -
 1 file changed, 54 insertions(+), 5 deletions(-)

diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
--- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy	2007-12-07 21:25:55.0 +0530
+++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c	2007-12-08 03:19:46.0 +0530
@@ -24,6 +24,8 @@
 
 static int numa_enabled = 1;
 
+static char *cmdline __initdata;
+
 static int numa_debug;
 #define dbg(args...) if (numa_debug) { printk(KERN_INFO args); }
 
@@ -39,6 +41,43 @@ static bootmem_data_t __initdata plat_no
 static int min_common_depth;
 static int n_mem_addr_cells, n_mem_size_cells;
 
+static int __cpuinit fake_numa_create_new_node(unsigned long end_pfn,
+						unsigned int *nid)
+{
+	unsigned long long mem;
+	char *p = cmdline;
+	static unsigned int fake_nid = 0;
+	static unsigned long long curr_boundary = 0;
+
+	*nid = fake_nid;
+	if (!p)
+		return 0;
+
+	mem = memparse(p, &p);
+	if (!mem)
+		return 0;
+
+	if (mem < curr_boundary)
+		return 0;
+
+	curr_boundary = mem;
+
+	if ((end_pfn << PAGE_SHIFT) > mem) {
+		/*
+		 * Skip commas and spaces
+		 */
+		while (*p == ',' || *p == ' ' || *p == '\t')
+			p++;
+
+		cmdline = p;
+		fake_nid++;
+		*nid = fake_nid;
+		dbg("created new fake_node with id %d\n", fake_nid);
+		return 1;
+	}
+	return 0;
+}
+
 static void __cpuinit map_cpu_to_node(int cpu, int node)
 {
 	numa_cpu_lookup_table[cpu] = node;
@@ -344,12 +383,14 @@ static void __init parse_drconf_memory(s
 			if (nid == 0x || nid >= MAX_NUMNODES)
 				nid = default_nid;
 		}
-		node_set_online(nid);
 
 		size = numa_enforce_memory_limit(start, lmb_size);
 		if (!size)
 			continue;
 
+		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
+		node_set_online(nid);
+
 		add_active_range(nid, start >> PAGE_SHIFT,
 				 (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
 	}
@@ -429,7 +470,6 @@ new_range:
 		nid = of_node_to_nid_single(memory);
 		if (nid < 0)
 			nid = default_nid;
-		node_set_online(nid);
 
 		if (!(size = numa_enforce_memory_limit(start, size))) {
 			if (--ranges)
@@ -438,6 +478,9 @@ new_range:
 				continue;
 		}
 
+		fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid);
+		node_set_online(nid);
+
 		add_active_range(nid, start >> PAGE_SHIFT,
 				 (start >> PAGE_SHIFT) + (size >> PAGE_SHIFT));
 
@@ -461,7 +504,7 @@ static void __init setup_nonnuma(void)
 	unsigned long top_of_ram = lmb_end_of_DRAM();
 	unsigned long total_ram = lmb_phys_mem_size();
 	unsigned long start_pfn, end_pfn;
-	unsigned int i;
+	unsigned int i, nid = 0;
 
 	printk(KERN_DEBUG "Top of RAM: 0x%lx, Total RAM: 0x%lx\n",
 	       top_of_ram, total_ram);
@@ -471,9 +514,11 @@ static void __init setup_nonnuma(void)
 	for (i = 0; i < lmb.memory.cnt; ++i) {
 		start_pfn = lmb.memory.region[i].base >> PAGE_SHIFT;
 		end_pfn = start_pfn +
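The way `fake_numa_create_new_node()` walks the boundary list can be reproduced in userspace. This is a sketch under stated assumptions: `parse_size()` is a hypothetical stand-in for the kernel's `memparse()` (a number plus an optional K, M or G suffix), `fake_numa_step()` takes an end address in bytes rather than an end PFN, and it already includes the later `if (fake_nid)` fix from this thread.

```c
#include <stdlib.h>

static const char *fake_args;	/* e.g. "1G,2G"; NULL when numa=fake= is absent */
static unsigned int fake_nid;
static unsigned long long curr_boundary;

/* Hypothetical stand-in for memparse(): a number plus optional K/M/G. */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Userspace model of fake_numa_create_new_node() with the later fix:
 * 'end' is the end address in bytes of the range being added.
 * Returns 1 when the range crosses the next boundary and a new fake
 * node id is handed out through *nid.
 */
static int fake_numa_step(unsigned long long end, int *nid)
{
	char *p;
	unsigned long long mem;

	if (fake_nid)
		*nid = (int)fake_nid;
	if (!fake_args)
		return 0;

	mem = parse_size(fake_args, &p);
	if (!mem || mem < curr_boundary)
		return 0;
	curr_boundary = mem;

	if (end > mem) {
		while (*p == ',' || *p == ' ' || *p == '\t')
			p++;	/* skip separators */
		fake_args = p;
		fake_nid++;
		*nid = (int)fake_nid;
		return 1;
	}
	return 0;
}
```

With `1G,2G`, ranges ending at or below 1G stay on node 0, the first range ending past 1G moves the walk to node 1, and the first range ending past 2G moves it to node 2, which then absorbs the rest of memory. Because the decision is made per range, a range that straddles a boundary goes to the new node as a whole.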
Re: [PATCH] Fake NUMA emulation for PowerPC (Take 2)
On Sat, Dec 08, 2007 at 04:07:14AM +0530, Balbir Singh wrote:
> Signed-off-by: Balbir Singh [EMAIL PROTECTED]

Looks good to me. Sure, it could be fleshed out to something more generic and in common code, but this is small and simple and doesn't bloat the kernel much as it stands, and it has value for debugging.

Acked-by: Olof Johansson [EMAIL PROTECTED]
Re: [PATCH] Fake NUMA emulation for PowerPC
On Sat 2007-12-08 09:52:06, Balbir Singh wrote:
> David Rientjes wrote:
>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>
>>> To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature, the code I have here is the simplest I could come up with for PowerPC.
>>
>> Magnus Damm had patches from over a year ago that, I believe, made much of the x86_64 fake NUMA code generic so that it could be extended for architectures such as i386. Perhaps he could resurrect those patches if there is wider interest in such a tool.
>
> That would be a very interesting patch, but what I have here is the simplest patch and we could build on it incrementally. The interface is non-standard but it does amazing things for 59 lines of code change.

Well, maybe it is amazing, but having a non-standard interface is also wrong...

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] Fake NUMA emulation for PowerPC
Pavel Machek wrote:
> On Sat 2007-12-08 09:52:06, Balbir Singh wrote:
>> David Rientjes wrote:
>>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>>
>>>> To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature, the code I have here is the simplest I could come up with for PowerPC.
>>>
>>> Magnus Damm had patches from over a year ago that, I believe, made much of the x86_64 fake NUMA code generic so that it could be extended for architectures such as i386. Perhaps he could resurrect those patches if there is wider interest in such a tool.
>>
>> That would be a very interesting patch, but what I have here is the simplest patch and we could build on it incrementally. The interface is non-standard but it does amazing things for 59 lines of code change.
>
> Well, maybe it is amazing, but having a non-standard interface is also wrong...

I tend to agree with you, but in this case it's mostly debug infrastructure that is architecture-specific.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
On Sat, 8 Dec 2007, Balbir Singh wrote:

>> You're going to want to distribute the cpus based on how they match up physically with the actual platform that you're running on. x86_64 does
>
> Could you explain this better, how does it match up CPUs with fake NUMA memory? Is there some smartness there? I'll try and look at the code and also see what I can do for PowerPC

numa_cpumask_lookup_table[] would return the correct cpumask for the fake node index. Then all the code that uses node_to_cpumask() in generic kernel code, like the scheduler and VM, still preserves the true NUMA affinity that matches the underlying hardware. I tried to make x86_64 fake NUMA as close to the real thing as possible.

You also probably want to make all your changes dependent on CONFIG_NUMA_EMU like the x86_64 case. That'll probably be helpful as you extend this tool more and more.

David
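One way to read David's suggestion: each fake node remembers which physical node its memory came from, and the cpumask lookup for a fake node returns the CPUs of that physical node. The sketch below is a hypothetical userspace illustration of that idea using 64-bit masks; `phys_node_cpus[]`, `fake_to_phys[]` and `fake_node_to_cpumask()` are made-up names, not the x86_64 implementation.

```c
#include <stdint.h>

#define MAX_PHYS_NODES	4	/* hypothetical sizes for the sketch */
#define MAX_FAKE_NODES	8

/* CPUs attached to each physical node, one bit per CPU. */
static uint64_t phys_node_cpus[MAX_PHYS_NODES];

/* Physical node whose memory backs each fake node. */
static int fake_to_phys[MAX_FAKE_NODES];

/*
 * node_to_cpumask()-style lookup for a fake node: report the CPUs of the
 * backing physical node, so callers see the hardware's real affinity.
 */
static uint64_t fake_node_to_cpumask(int fake_nid)
{
	return phys_node_cpus[fake_to_phys[fake_nid]];
}
```

Two fake nodes carved out of physical node 0 would both report that node's CPUs. By contrast, the patch as posted puts every CPU on node 0, as Balbir confirms elsewhere in the thread.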
Re: [PATCH] Fake NUMA emulation for PowerPC
David Rientjes wrote:
> On Sat, 8 Dec 2007, Balbir Singh wrote:
>
>> Yes, they all appear on node 0. We could have tweaks to distribute CPUs as well.
>
> You're going to want to distribute the cpus based on how they match up physically with the actual platform that you're running on. x86_64 does

Could you explain this better, how does it match up CPUs with fake NUMA memory? Is there some smartness there? I'll try and look at the code and also see what I can do for PowerPC

> this already and it makes fake NUMA more useful because it matches the real-life case more often.

Yes, I agree, but I don't want that to be the first step for fake NUMA nodes on PowerPC. I think we can incrementally add features.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
David Rientjes wrote:
> On Sat, 8 Dec 2007, Balbir Singh wrote:
>
>> To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature, the code I have here is the simplest I could come up with for PowerPC.
>
> Magnus Damm had patches from over a year ago that, I believe, made much of the x86_64 fake NUMA code generic so that it could be extended for architectures such as i386. Perhaps he could resurrect those patches if there is wider interest in such a tool.

That would be a very interesting patch, but what I have here is the simplest patch and we could build on it incrementally. The interface is non-standard but it does amazing things for 59 lines of code change.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
Nathan Lynch wrote:
> Hi Balbir-
>
> Balbir Singh wrote:
>> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option
>>
>> numa=fake=<node range>
>>
>> <node range> is of the format range1,range2,...rangeN
>>
>> Each of the rangeX parameters is parsed using memparse(). I find the patch useful for fake NUMA emulation on my simple PowerPC machine. I've tested it on a non-NUMA box with the following arguments
>>
>> numa=fake=1G
>> numa=fake=1G,2G
>> numa=fake=1G,512M,2G
>> numa=fake=1500M,2800M mem=3500M
>> numa=fake=1G mem=512M
>> numa=fake=1G mem=1G
>
> So this doesn't appear to allow one to assign cpus to fake nodes? Do all cpus just get assigned to node 0 with numa=fake?

Yes, they all appear on node 0. We could have tweaks to distribute CPUs as well.

> A different approach that occurs to me is to use kexec with a doctored device tree (i.e. with the ibm,associativity properties modified to reflect your desired topology). Perhaps a little bit obscure, but it seems more flexible.

That would be interesting, but it always means that we need to run kexec, which might involve two boots.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
On Dec 7, 2007, at 4:12 PM, Balbir Singh wrote:

> Kumar Gala wrote:
>> On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
>>
>>> Olof Johansson wrote:
>>>> Hi,
>>>>
>>>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>>>> Comments are as always welcome!
>>>>
>>>> Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is).
>>>
>>> In my case, I use it to test parts of my memory controller patches on an emulated NUMA machine. I plan to use it to test out page migration across nodes.
>>
>> Can you explain that further. I'm still not clear on why this is useful.
>>
>> - k
>
> Sure. In my case I need to emulate NUMA nodes to do some NUMA specific testing. The memory controller I've written has some interesting data structures like per node, per zone LRU lists. To be able to test those features on a non-numa box is a problem, since we get just the default node.

Maybe I'm missing something, what do you mean by memory controller you've written? (I'm used to the term 'memory controller' meaning the actual RAM control).

> To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature, the code I have here is the simplest I could come up with for PowerPC.
>
> I just thought of another very interesting use case, it can be used to split up the zone's lru lock which is highly contended.

- k
Re: [PATCH] Fake NUMA emulation for PowerPC
Arnd Bergmann wrote:
> On Friday 07 December 2007, Balbir Singh wrote:
>> Balbir Singh wrote:
>>> Geert Uytterhoeven wrote:
>>>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>>>> +	if (strstr(p, "fake="))
>>>>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
>>>>
>>>> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even without -O.
>>>
>>> Thanks for pointing that out, but I am surprised that a compiler would interpret library routines like strlen.
>>
>> I just tested it and it turns out that you are right. I'll go hunt to see where gcc gets its magic powers from.
>
> Even if it wasn't: Why the heck would you want to optimize this? The function is run _once_ at boot time and the object code gets thrown away afterwards!
>
> Arnd

Because I see no downside to doing it. The strlen of "fake=" is fixed. But having said that, I am not a purist about the approach; I just want cmdline to point after "fake=".

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
On Friday 07 December 2007, Balbir Singh wrote:
> Balbir Singh wrote:
>> Geert Uytterhoeven wrote:
>>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>>> +	if (strstr(p, "fake="))
>>>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
>>>
>>> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even without -O.
>>
>> Thanks for pointing that out, but I am surprised that a compiler would interpret library routines like strlen.
>
> I just tested it and it turns out that you are right. I'll go hunt to see where gcc gets its magic powers from.

Even if it wasn't: Why the heck would you want to optimize this? The function is run _once_ at boot time and the object code gets thrown away afterwards!

Arnd
Re: [PATCH] Fake NUMA emulation for PowerPC
On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
> Olof Johansson wrote:
>> Hi,
>>
>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>> Comments are as always welcome!
>> Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is.)
> In my case, I use it to test parts of my memory controller patches on an emulated NUMA machine. I plan to use it to test out page migration across nodes.

Can you explain that further? I'm still not clear on why this is useful.

- k
Re: [PATCH] Fake NUMA emulation for PowerPC
Balbir Singh wrote:
> Geert Uytterhoeven wrote:
>> On Sat, 8 Dec 2007, Balbir Singh wrote:
>>> +	if (strstr(p, "fake="))
>>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
>> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even without -O.
> Thanks for pointing that out, but I am surprised that a compiler would interpret library routines like strlen.

I just tested it and it turns out that you are right. I'll go hunt to see where gcc gets its magic powers from.
Re: [PATCH] Fake NUMA emulation for PowerPC
Hi,

On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
> Comments are as always welcome!

Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is.)

> diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
> --- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy 2007-12-07 21:25:55.0 +0530
> +++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c 2007-12-08 02:36:02.0 +0530
> @@ -24,6 +24,8 @@ static int numa_enabled = 1;
>
> +char *cmdline __initdata;
> +

Looks like this should be static.

> @@ -702,6 +744,9 @@ static int __init early_numa(char *p)
> 	if (strstr(p, "debug"))
> 		numa_debug = 1;
>
> +	if (strstr(p, "fake="))
> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */

This doesn't look right. You check if it contains "fake=", not if it starts with it. So if someone did numa=foo,fake=bar, or even numa=debug,fake=, things wouldn't work right.

-Olof
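Olof's point can be illustrated with a small userspace sketch that accepts "fake=" only at the start of a comma-separated option, rather than anywhere in the string. This is not code from the patch: the helper name find_fake_arg() is hypothetical, and the returned pointer deliberately runs to the end of the string because the range list after "fake=" is itself comma-separated.

```c
#include <stddef.h>
#include <string.h>

#define FAKE_PREFIX "fake="

/*
 * Hypothetical helper (not from the patch): scan the comma-separated
 * numa= options and return a pointer just past "fake=" when an option
 * starts with that prefix, or NULL when no option does.
 */
static const char *find_fake_arg(const char *opts)
{
	const char *p = opts;
	size_t len = strlen(FAKE_PREFIX);

	while (p && *p) {
		if (strncmp(p, FAKE_PREFIX, len) == 0)
			return p + len;
		/* Skip to the option after the next comma, if any. */
		p = strchr(p, ',');
		if (p)
			p++;
	}
	return NULL;
}
```

With this, numa=debug,fake=1G yields "1G", and an option such as nofake=1G is not mistaken for fake=.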
Re: [PATCH] Fake NUMA emulation for PowerPC
On Sat, 8 Dec 2007, Balbir Singh wrote:
> Yes, they all appear on node 0. We could have tweaks to distribute CPUs as well.

You're going to want to distribute the CPUs based on how they match up physically with the actual platform that you're running on. x86_64 does this already, and it makes fake NUMA more useful because it matches the real-life case more often.
Re: [PATCH] Fake NUMA emulation for PowerPC
Arnd Bergmann wrote:
> On Friday 07 December 2007, Balbir Singh wrote:
>> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option:
>>
>> numa=fake=<node range>
>>
>> <node range> is of the format range1,range2,...rangeN
> Excellent idea! I'd love to have this in RHEL5u1, because that would make that distro boot on certain machines that have more memory than is supported without an IOMMU driver. The problem we have is that when you simply say mem=1G but all of the first gigabyte is on the first node, you end up with a memoryless node, which is not supported.
>
> Unfortunately, it comes too late for me now, as all new distros already boot on Cell machines that need an IOMMU.

Very interesting use case! I am sure there are other cases where fake NUMA nodes can be applied. I just listed one other in another email, apart from using it for playing around with NUMA-like machines.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
Kumar Gala wrote:
> On Dec 7, 2007, at 4:12 PM, Balbir Singh wrote:
>> Kumar Gala wrote:
>>> On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
>>>> Olof Johansson wrote:
>>>>> Hi,
>>>>>
>>>>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>>>>> Comments are as always welcome!
>>>>> Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is.)
>>>> In my case, I use it to test parts of my memory controller patches on an emulated NUMA machine. I plan to use it to test out page migration across nodes.
>>> Can you explain that further? I'm still not clear on why this is useful.
>>>
>>> - k
>> Sure. In my case I need to emulate NUMA nodes to do some NUMA-specific testing. The memory controller I've written has some interesting data structures, like per-node, per-zone LRU lists. Testing those features on a non-NUMA box is a problem, since we get just the default node.
>
> Maybe I'm missing something: what do you mean by the memory controller you've written? (I'm used to the term 'memory controller' meaning the actual RAM controller.)

Ah! That explains the disconnect. If you look at the latest -mm tree, we have a memory controller under control groups; we use it to control how much memory a group of processes can access at a time.

>> To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature; the code I have here is the simplest I could come up with for PowerPC.
>>
>> I just thought of another very interesting use case: it can be used to split up the zone's LRU lock, which is highly contended.
>
> - k

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
Geert Uytterhoeven wrote:
> On Sat, 8 Dec 2007, Balbir Singh wrote:
>> +	if (strstr(p, "fake="))
>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
> Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even without -O.

Thanks for pointing that out, but I am surprised that a compiler would interpret library routines like strlen.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Re: [PATCH] Fake NUMA emulation for PowerPC
Olof Johansson wrote:
> Hi,
>
> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>> Comments are as always welcome!
> Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is.)

In my case, I use it to test parts of my memory controller patches on an emulated NUMA machine. I plan to use it to test out page migration across nodes.

>> diff -puN arch/powerpc/mm/numa.c~ppc-fake-numa-easy arch/powerpc/mm/numa.c
>> --- linux-2.6.24-rc4-mm1/arch/powerpc/mm/numa.c~ppc-fake-numa-easy 2007-12-07 21:25:55.0 +0530
>> +++ linux-2.6.24-rc4-mm1-balbir/arch/powerpc/mm/numa.c 2007-12-08 02:36:02.0 +0530
>> @@ -24,6 +24,8 @@ static int numa_enabled = 1;
>>
>> +char *cmdline __initdata;
>> +
> Looks like this should be static.

Yes, good catch!

>> @@ -702,6 +744,9 @@ static int __init early_numa(char *p)
>> 	if (strstr(p, "debug"))
>> 		numa_debug = 1;
>>
>> +	if (strstr(p, "fake="))
>> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */
> This doesn't look right. You check if it contains "fake=", not if it starts with it. So if someone did numa=foo,fake=bar, or even numa=debug,fake=, things wouldn't work right.

Yes, you are right. I merely followed the strstr convention already present, which, as you rightly point out, is wrong. I suspect I need to do something like

	p = strstr(p, "fake=");
	if (p)
		cmdline = p + 5;

This would still allow things like numa=debug,fake=1G. With numa=foo,fake=bar, the memparse() utility would fail at

	fake=bar
	     ^^^

I suspect that this should be good enough for a command line option.

> -Olof

--
Thanks,
Balbir Singh
Linux Technology Center
IBM, ISTL
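For comparison, the strstr()-based fix proposed in this reply can be exercised in userspace. The wrapper name fake_arg_strstr() is hypothetical, and strlen("fake=") stands in for the literal 5, which is equivalent here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace rendering of the proposed fix (function name hypothetical):
 * point just past the first occurrence of "fake=" anywhere in the
 * option string, or return NULL if it does not occur.
 */
static const char *fake_arg_strstr(const char *p)
{
	p = strstr(p, "fake=");
	if (p)
		p += strlen("fake=");
	return p;
}
```

This fixes the offset problem for numa=debug,fake=1G. Because strstr() searches the whole string, it would still accept an option such as nofake=1G; whether that matters for a boot-time option is the judgment call made in the reply.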
Re: [PATCH] Fake NUMA emulation for PowerPC
On Sat, 8 Dec 2007, Balbir Singh wrote:
> To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature; the code I have here is the simplest I could come up with for PowerPC.

Magnus Damm had patches from over a year ago that, I believe, made much of the x86_64 fake NUMA code generic so that it could be extended to architectures such as i386. Perhaps he could resurrect those patches if there is wider interest in such a tool.
Re: [PATCH] Fake NUMA emulation for PowerPC
Kumar Gala wrote:
> On Dec 7, 2007, at 3:35 PM, Balbir Singh wrote:
>> Olof Johansson wrote:
>>> Hi,
>>>
>>> On Sat, Dec 08, 2007 at 02:44:25AM +0530, Balbir Singh wrote:
>>>> Comments are as always welcome!
>>> Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is.)
>> In my case, I use it to test parts of my memory controller patches on an emulated NUMA machine. I plan to use it to test out page migration across nodes.
> Can you explain that further? I'm still not clear on why this is useful.
>
> - k

Sure. In my case I need to emulate NUMA nodes to do some NUMA-specific testing. The memory controller I've written has some interesting data structures, like per-node, per-zone LRU lists. Testing those features on a non-NUMA box is a problem, since we get just the default node. To be able to test the memory controller under NUMA, I use fake NUMA nodes. x86-64 has a similar feature; the code I have here is the simplest I could come up with for PowerPC.

I just thought of another very interesting use case: it can be used to split up the zone's LRU lock, which is highly contended.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
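To make the testing argument concrete, here is a hypothetical userspace sketch of per-node, per-zone LRU bookkeeping. The struct and function names and the MAX_NODES/MAX_ZONES limits are assumptions for illustration, not the memory controller's actual data structures. On a box that reports only the default node, every page is accounted in row 0, so the paths for other node ids are never exercised.

```c
#include <stddef.h>

/* Hypothetical limits for the sketch. */
#define MAX_NODES 4
#define MAX_ZONES 3

/* One LRU list per (node, zone) pair; only a page count is tracked. */
struct lru_sketch {
	size_t nr_pages[MAX_NODES][MAX_ZONES];
};

/* Account a page on the LRU list for node nid, zone zid. */
static void lru_sketch_add(struct lru_sketch *l, int nid, int zid)
{
	l->nr_pages[nid][zid]++;
}

/* Return how many nodes have at least one page on any of their lists. */
static int lru_sketch_nodes_used(const struct lru_sketch *l)
{
	int used = 0;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		for (int zid = 0; zid < MAX_ZONES; zid++) {
			if (l->nr_pages[nid][zid]) {
				used++;
				break;
			}
		}
	}
	return used;
}
```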
Re: [PATCH] Fake NUMA emulation for PowerPC
On Friday 07 December 2007, Balbir Singh wrote:
> Here's a dumb simple implementation of fake NUMA nodes for PowerPC. Fake NUMA nodes can be specified using the following command line option:
>
> numa=fake=<node range>
>
> <node range> is of the format range1,range2,...rangeN

Excellent idea! I'd love to have this in RHEL5u1, because that would make that distro boot on certain machines that have more memory than is supported without an IOMMU driver. The problem we have is that when you simply say mem=1G but all of the first gigabyte is on the first node, you end up with a memoryless node, which is not supported.

Unfortunately, it comes too late for me now, as all new distros already boot on Cell machines that need an IOMMU.

Arnd
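The range1,range2,...rangeN format quoted above can be sketched as a small parser. This is a userspace illustration under stated assumptions: each range is taken to be a size with an optional K/M/G suffix, roughly in the style of the kernel's memparse(), and the helper names parse_size() and parse_fake_ranges() are hypothetical. How the parsed values map onto node boundaries is decided by the patch itself and is not modelled here.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for the kernel's memparse(): parse a number
 * with an optional K/M/G suffix and leave *end just past it.
 */
static unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Parse "range1,range2,...,rangeN" into out[] (at most max entries)
 * and return how many values were read.  Stops at the first value
 * that does not start with a number.
 */
static int parse_fake_ranges(const char *arg, unsigned long long *out, int max)
{
	const char *p = arg;
	int n = 0;

	while (*p && n < max) {
		char *end;
		unsigned long long v = parse_size(p, &end);

		if (end == p)		/* no digits, e.g. "bar" */
			break;
		out[n++] = v;
		p = end;
		if (*p == ',')
			p++;
		else
			break;
	}
	return n;
}
```

For example, parse_fake_ranges("1G,2G", v, 4) stores 1 GiB and 2 GiB and returns 2, while a non-numeric value such as "bar" stops parsing with a count of 0.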
Re: [PATCH] Fake NUMA emulation for PowerPC
On Fri, 7 Dec 2007, Olof Johansson wrote:
>> Comments are as always welcome!
> Care to explain what this is useful for? (Not saying it's a stupid idea, just wondering what the reason for doing it is.)

Fake NUMA has always been useful for testing NUMA code without having to have a wide range of hardware available to you. It's a clever tool on x86_64, intended for kernel developers, that simply makes it easier to test code and adds an increased level of robustness to the kernel. I think it's a valuable addition.
Re: [PATCH] Fake NUMA emulation for PowerPC
On Sat, 8 Dec 2007, Balbir Singh wrote:
> +	if (strstr(p, "fake="))
> +		cmdline = p + 5;	/* 5 is faster than strlen("fake=") */

Really? My gcc is smart enough to replace the `strlen("fake=")' by 5, even without -O.

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Network and Software Technology Center Europe
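Geert's point can be checked with a short sketch. A hard-coded 5 is not needed: GCC commonly folds strlen() of a string literal to a constant, and sizeof("fake=") - 1 is a compile-time constant by definition in C. The LITERAL_LEN macro and fake_prefix_len() function are hypothetical names for illustration, and the block assumes a C11 compiler for _Static_assert.

```c
#include <stddef.h>
#include <string.h>

/* sizeof on a string literal counts the terminating NUL, hence the - 1. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Checked at compile time, so it has no runtime cost at all. */
_Static_assert(LITERAL_LEN("fake=") == 5, "\"fake=\" is five characters");

/* Runtime form; GCC typically replaces this call with the constant 5. */
static size_t fake_prefix_len(void)
{
	return strlen("fake=");
}
```

Either form keeps the offset tied to the prefix string itself, so the two cannot drift apart if the prefix ever changes.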