Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
On Tue, Feb 28, 2012 at 12:11:51PM -0800, Nishanth Aravamudan wrote: On 28.02.2012 [14:53:26 +0100], Johannes Weiner wrote: On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c0c03940] pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c0c03bc0 msr: 80021032 current = 0xc0b0cce0 paca= 0xc1d8 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294 [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460 [c0c03f90] c0009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit goal + size limit); and after some debugging, it seems that goal = 0x700 limit = 0x800 and sparse_early_usemaps_alloc_node - sparse_early_usemaps_alloc_pgdat_section - alloc_bootmem_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could fall-back to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to disable the limit check if the size of the allocation in alloc_bootmem_secition exceeds the section size. It makes sense to allow the usemaps to spill over to subsequent sections instead of panicking, so FWIW: Acked-by: Johannes Weiner han...@cmpxchg.org That being said, it would be good if check_usemap_section_nr() printed the cross-dependencies between pgdats and sections when the usemaps of a node spilled over to other sections than the ones holding the pgdat. How about this? --- From: Johannes Weiner han...@cmpxchg.org Subject: sparsemem/bootmem: catch greater than section size allocations fix If alloc_bootmem_section() no longer guarantees section-locality, we need check_usemap_section_nr() to print possible cross-dependencies between node descriptors and the usemaps allocated through it. Signed-off-by: Johannes Weiner han...@cmpxchg.org --- diff --git a/mm/sparse.c b/mm/sparse.c index 61d7cde..9e032dc 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map, continue; usemap_map[pnum] = usemap; usemap += size; + check_usemap_section_nr(nodeid, usemap_map[pnum]); } return; } This makes sense to me -- ok if I fold it into the re-worked patch (based upon Mel's comments)? Sure thing! Furthermore, I wonder if we can remove the sparse-specific stuff from bootmem.c as well, as now even more so than before, calculating the desired area is really none of bootmem's business. Would something like this be okay? --- From: Johannes Weiner han...@cmpxchg.org Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator alloc_bootmem_section() derives allocation area constraints from the specified sparsemem section. This is a bit specific for a generic memory allocator like bootmem, though, so move it over to sparsemem. Since __alloc_bootmem_node() already retries failed allocations with relaxed area constraints, the fallback code in sparsemem.c can be removed and the code becomes a bit more compact overall. Signed-off-by: Johannes Weiner han...@cmpxchg.org I've not tested it, but the intention seems sensible. I think it should remain a separate change. Yes, I agree. I'll resend it in a bit as stand-alone patch. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c0c03940] pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c0c03bc0 msr: 80021032 current = 0xc0b0cce0 paca= 0xc1d8 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294 [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460 [c0c03f90] c0009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit goal + size limit); and after some debugging, it seems that goal = 0x700 limit = 0x800 and sparse_early_usemaps_alloc_node - sparse_early_usemaps_alloc_pgdat_section - alloc_bootmem_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could fall-back to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to disable the limit check if the size of the allocation in alloc_bootmem_secition exceeds the section size. It makes sense to allow the usemaps to spill over to subsequent sections instead of panicking, so FWIW: Acked-by: Johannes Weiner han...@cmpxchg.org That being said, it would be good if check_usemap_section_nr() printed the cross-dependencies between pgdats and sections when the usemaps of a node spilled over to other sections than the ones holding the pgdat. How about this? --- From: Johannes Weiner han...@cmpxchg.org Subject: sparsemem/bootmem: catch greater than section size allocations fix If alloc_bootmem_section() no longer guarantees section-locality, we need check_usemap_section_nr() to print possible cross-dependencies between node descriptors and the usemaps allocated through it. Signed-off-by: Johannes Weiner han...@cmpxchg.org --- diff --git a/mm/sparse.c b/mm/sparse.c index 61d7cde..9e032dc 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map, continue; usemap_map[pnum] = usemap; usemap += size; + check_usemap_section_nr(nodeid, usemap_map[pnum]); } return; } --- Furthermore, I wonder if we can remove the sparse-specific stuff from bootmem.c as well, as now even more so than before, calculating the desired area is really none of bootmem's business. Would something like this be okay? --- From: Johannes Weiner han...@cmpxchg.org Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator alloc_bootmem_section() derives allocation area constraints from the specified sparsemem section. This is a bit specific for a generic memory allocator like bootmem, though, so move it over to sparsemem. Since __alloc_bootmem_node() already retries failed allocations with relaxed area constraints, the fallback code in sparsemem.c can be removed and the code becomes a bit more compact overall. Signed-off-by: Johannes Weiner han...@cmpxchg.org --- include/linux/bootmem.h |3 --- mm/bootmem.c| 26 -- mm/sparse.c | 29 + 3 files changed, 9 insertions(+), 49 deletions(-) diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index ab344a5..001c248 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -135,9 +135,6 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat, extern int reserve_bootmem_generic(unsigned long addr, unsigned long size, int flags); -extern void *alloc_bootmem_section(unsigned long size, - unsigned long section_nr); - #ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP extern void *alloc_remap(int nid, unsigned long size); #else diff --git a/mm/bootmem.c b/mm/bootmem.c index 7bc0557..d34026c 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -756,32 +756,6 @@ void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size, } -#ifdef CONFIG_SPARSEMEM -/** - * alloc_bootmem_section - allocate boot memory from a specific section - * @size: size of the request in bytes - *
Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c0c03940] pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c0c03bc0 msr: 80021032 current = 0xc0b0cce0 paca= 0xc1d8 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294 [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460 [c0c03f90] c0009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit goal + size limit); and after some debugging, it seems that goal = 0x700 limit = 0x800 and sparse_early_usemaps_alloc_node - sparse_early_usemaps_alloc_pgdat_section - alloc_bootmem_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could fall-back to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to disable the limit check if the size of the allocation in alloc_bootmem_secition exceeds the section size. Signed-off-by: Nishanth Aravamudan n...@us.ibm.com Cc: Dave Hansen haveb...@us.ibm.com Cc: Anton Blanchard an...@au1.ibm.com Cc: Paul Mackerras pau...@samba.org Cc: Ben Herrenschmidt b...@kernel.crashing.org Cc: Robert Jennings r...@linux.vnet.ibm.com Cc: linux...@kvack.org Cc: linuxppc-dev@lists.ozlabs.org --- include/linux/mmzone.h |2 ++ mm/bootmem.c |5 - 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 650ba2f..4176834 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn) * PA_SECTION_SHIFT physical address to/from section number * PFN_SECTION_SHIFT pfn to/from section number */ +#define BYTES_PER_SECTION(1UL SECTION_SIZE_BITS) + #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) #define PA_SECTION_SHIFT (SECTION_SIZE_BITS) diff --git a/mm/bootmem.c b/mm/bootmem.c index 668e94d..5cbbc76 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size, pfn = section_nr_to_pfn(section_nr); goal = pfn PAGE_SHIFT; - limit = section_nr_to_pfn(section_nr + 1) PAGE_SHIFT; + if (size BYTES_PER_SECTION) + limit = 0; + else + limit = section_nr_to_pfn(section_nr + 1) PAGE_SHIFT; As it's ok to spill the allocation over to an adjacent section, why not just make limit==0 unconditionally. That would avoid defining BYTES_PER_SECTION. -- Mel Gorman SUSE Labs ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] sparsemem/bootmem: catch greater than section size allocations
On 28.02.2012 [14:53:26 +0100], Johannes Weiner wrote: On Fri, Feb 24, 2012 at 11:33:58AM -0800, Nishanth Aravamudan wrote: While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c0c03940] pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c0c03bc0 msr: 80021032 current = 0xc0b0cce0 paca= 0xc1d8 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294 [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460 [c0c03f90] c0009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit goal + size limit); and after some debugging, it seems that goal = 0x700 limit = 0x800 and sparse_early_usemaps_alloc_node - sparse_early_usemaps_alloc_pgdat_section - alloc_bootmem_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could fall-back to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to disable the limit check if the size of the allocation in alloc_bootmem_secition exceeds the section size. It makes sense to allow the usemaps to spill over to subsequent sections instead of panicking, so FWIW: Acked-by: Johannes Weiner han...@cmpxchg.org That being said, it would be good if check_usemap_section_nr() printed the cross-dependencies between pgdats and sections when the usemaps of a node spilled over to other sections than the ones holding the pgdat. How about this? --- From: Johannes Weiner han...@cmpxchg.org Subject: sparsemem/bootmem: catch greater than section size allocations fix If alloc_bootmem_section() no longer guarantees section-locality, we need check_usemap_section_nr() to print possible cross-dependencies between node descriptors and the usemaps allocated through it. Signed-off-by: Johannes Weiner han...@cmpxchg.org --- diff --git a/mm/sparse.c b/mm/sparse.c index 61d7cde..9e032dc 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -359,6 +359,7 @@ static void __init sparse_early_usemaps_alloc_node(unsigned long**usemap_map, continue; usemap_map[pnum] = usemap; usemap += size; + check_usemap_section_nr(nodeid, usemap_map[pnum]); } return; } This makes sense to me -- ok if I fold it into the re-worked patch (based upon Mel's comments)? --- Furthermore, I wonder if we can remove the sparse-specific stuff from bootmem.c as well, as now even more so than before, calculating the desired area is really none of bootmem's business. Would something like this be okay? --- From: Johannes Weiner han...@cmpxchg.org Subject: [patch] mm: remove sparsemem allocation details from the bootmem allocator alloc_bootmem_section() derives allocation area constraints from the specified sparsemem section. This is a bit specific for a generic memory allocator like bootmem, though, so move it over to sparsemem. Since __alloc_bootmem_node() already retries failed allocations with relaxed area constraints, the fallback code in sparsemem.c can be removed and the code becomes a bit more compact overall. Signed-off-by: Johannes Weiner han...@cmpxchg.org I've not tested it, but the intention seems sensible. I think it should remain a separate change. Thanks, Nish -- Nishanth Aravamudan n...@us.ibm.com IBM Linux Technology Center ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
[PATCH] sparsemem/bootmem: catch greater than section size allocations
While testing AMS (Active Memory Sharing) / CMO (Cooperative Memory Overcommit) on powerpc, we tripped the following: kernel BUG at mm/bootmem.c:483! cpu 0x0: Vector: 700 (Program Check) at [c0c03940] pc: c0a62bd8: .alloc_bootmem_core+0x90/0x39c lr: c0a64bcc: .sparse_early_usemaps_alloc_node+0x84/0x29c sp: c0c03bc0 msr: 80021032 current = 0xc0b0cce0 paca= 0xc1d8 pid = 0, comm = swapper kernel BUG at mm/bootmem.c:483! enter ? for help [c0c03c80] c0a64bcc .sparse_early_usemaps_alloc_node+0x84/0x29c [c0c03d50] c0a64f10 .sparse_init+0x12c/0x28c [c0c03e20] c0a474f4 .setup_arch+0x20c/0x294 [c0c03ee0] c0a4079c .start_kernel+0xb4/0x460 [c0c03f90] c0009670 .start_here_common+0x1c/0x2c This is BUG_ON(limit goal + size limit); and after some debugging, it seems that goal = 0x700 limit = 0x800 and sparse_early_usemaps_alloc_node - sparse_early_usemaps_alloc_pgdat_section - alloc_bootmem_section calls return alloc_bootmem_section(usemap_size() * count, section_nr); This is on a system with 8TB available via the AMS pool, and as a quirk of AMS in firmware, all of that memory shows up in node 0. So, we end up with an allocation that will fail the goal/limit constraints. In theory, we could fall-back to alloc_bootmem_node() in sparse_early_usemaps_alloc_node(), but since we actually have HOTREMOVE defined, we'll BUG_ON() instead. A simple solution appears to be to disable the limit check if the size of the allocation in alloc_bootmem_secition exceeds the section size. Signed-off-by: Nishanth Aravamudan n...@us.ibm.com Cc: Dave Hansen haveb...@us.ibm.com Cc: Anton Blanchard an...@au1.ibm.com Cc: Paul Mackerras pau...@samba.org Cc: Ben Herrenschmidt b...@kernel.crashing.org Cc: Robert Jennings r...@linux.vnet.ibm.com Cc: linux...@kvack.org Cc: linuxppc-dev@lists.ozlabs.org --- include/linux/mmzone.h |2 ++ mm/bootmem.c |5 - 2 files changed, 6 insertions(+), 1 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 650ba2f..4176834 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -967,6 +967,8 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn) * PA_SECTION_SHIFTphysical address to/from section number * PFN_SECTION_SHIFT pfn to/from section number */ +#define BYTES_PER_SECTION (1UL SECTION_SIZE_BITS) + #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) #define PA_SECTION_SHIFT (SECTION_SIZE_BITS) diff --git a/mm/bootmem.c b/mm/bootmem.c index 668e94d..5cbbc76 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -770,7 +770,10 @@ void * __init alloc_bootmem_section(unsigned long size, pfn = section_nr_to_pfn(section_nr); goal = pfn PAGE_SHIFT; - limit = section_nr_to_pfn(section_nr + 1) PAGE_SHIFT; + if (size BYTES_PER_SECTION) + limit = 0; + else + limit = section_nr_to_pfn(section_nr + 1) PAGE_SHIFT; bdata = bootmem_node_data[early_pfn_to_nid(pfn)]; return alloc_bootmem_core(bdata, size, SMP_CACHE_BYTES, goal, limit); -- 1.7.5.4 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev