Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, Jul 25, 2013 at 6:42 AM, Robin Holt wrote:
> On Thu, Jul 25, 2013 at 05:50:57AM -0700, Yinghai Lu wrote:
>> On Wed, Jul 24, 2013 at 7:25 PM, Robin Holt wrote:
>>>> How about holes that are not in memblock.reserved?
>>>>
>>>> before this patch:
>>>> free_area_init_node/free_area_init_core/memmap_init_zone
>>>> will mark all pages in the node range as Reserved in struct page, at first.
>>>>
>>>> but those holes are not mapped via the kernel low mapping,
>>>> so it should be ok not to touch "struct page" for them.
>>>>
>>>> Now you only mark reserved for memblock.reserved at first, and later
>>>> mark {memblock.memory} - {memblock.reserved} to be available.
>>>> And that is ok.
>>>>
>>>> but you should split that change into another patch and add some comment
>>>> and changelog for the change,
>>>> in case there is some user like UEFI etc. doing weird things.
>>>
>>> Nate and I talked this over today.  Sorry for the delay, but it was the
>>> first time we were both free.  Neither of us quite understands what you
>>> are asking for here.  My interpretation is that you would like us to
>>> change the use of the PageReserved flag such that during boot, we do not
>>> set the flag at all from memmap_init_zone, and then only set it on pages
>>> which, at the time of free_all_bootmem, have been allocated/reserved in
>>> the memblock allocator.  Is that correct?  I will start to work that up
>>> on the assumption that is what you are asking for.
>>
>> Not exactly.
>>
>> Your change should be right, but there is a subtle change in how
>> holes are handled.
>>
>> Before, memory holes between memory ranges in memblock.memory got a
>> struct page, initialized to Reserved in memmap_init_zone.
>> Those holes are not in memblock.reserved; with your patches, those holes'
>> struct pages will still be all 0.
>>
>> Please separate the change that sets pages to reserved according to
>> memblock.reserved into another patch.
>
> Just want to make sure this is where you want me to go.  Here is my
> currently untested patch.  Is that what you were expecting to have done?
> One thing I don't like about this patch is it seems to slow down boot in
> my simulator environment.  I think I am going to look at restructuring
> things a bit to see if I can eliminate that performance penalty.
> Otherwise, I think I am following your directions.
>
> From bdd2fefa59af18f283af6f066bc644ddfa5c7da8 Mon Sep 17 00:00:00 2001
> From: Robin Holt
> Date: Thu, 25 Jul 2013 04:25:15 -0500
> Subject: [RFC -v2-pre2 4/5] ZZZ Only SetPageReserved() on memblock reserved
>  pages.

yes.

> ---
>  include/linux/mm.h |  2 ++
>  mm/nobootmem.c     |  3 +++
>  mm/page_alloc.c    | 18 +++++++++++-------
>  3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e0c8528..b264a26 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1322,6 +1322,8 @@ static inline void adjust_managed_page_count(struct page *page, long count)
>  	totalram_pages += count;
>  }
>
> +extern void reserve_bootmem_region(unsigned long start, unsigned long end);
> +
>  /* Free the reserved page into the buddy system, so it gets managed. */
>  static inline void __free_reserved_page(struct page *page)
>  {
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index 2159e68..0840af2 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -117,6 +117,9 @@ static unsigned long __init free_low_memory_core_early(void)
>  	phys_addr_t start, end, size;
>  	u64 i;
>
> +	for_each_reserved_mem_region(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
>  	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>  		count += __free_memory_core(start, end);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 048e166..3aa30b7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -698,7 +698,7 @@ static void free_one_page(struct zone *zone, struct page *page, int order,
>  }
>
>  static void __init_single_page(unsigned long pfn, unsigned long zone,
> -			       int nid, int reserved)
> +			       int nid, int page_count)
>  {
>  	struct page *page = pfn_to_page(pfn);
>  	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
> @@ -708,12 +708,9 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
>  	init_page_count(page);
>  	page_mapcount_reset(page);
>  	page_nid_reset_last(page);
> -	if (reserved) {
> -		SetPageReserved(page);
> -	} else {
> -		ClearPageReserved(page);
> -		set_page_count(page, 0);
> -	}
> +	ClearPageReserved(page);
> +	set_page_count(page, page_count);
> +
>  	/*
>  	 * Mark the block movable so that blocks are reserved for
>  	 * movable at startup. This will force kernel allocations
> @@ -741,6 +738,13 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
>  #endif
>  }
>
> +void reserve_bootmem_region(unsigned long start, unsigned long end)
> +{
> +	for (; start < end; start++)
> +		if (pfn_valid(start))
> +			SetPageReserved(pfn_to_page(start));
> +}
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, Jul 25, 2013 at 05:50:57AM -0700, Yinghai Lu wrote:
> On Wed, Jul 24, 2013 at 7:25 PM, Robin Holt wrote:
> >> How about holes that are not in memblock.reserved?
> >>
> >> before this patch:
> >> free_area_init_node/free_area_init_core/memmap_init_zone
> >> will mark all pages in the node range as Reserved in struct page, at first.
> >>
> >> but those holes are not mapped via the kernel low mapping,
> >> so it should be ok not to touch "struct page" for them.
> >>
> >> Now you only mark reserved for memblock.reserved at first, and later
> >> mark {memblock.memory} - {memblock.reserved} to be available.
> >> And that is ok.
> >>
> >> but you should split that change into another patch and add some comment
> >> and changelog for the change,
> >> in case there is some user like UEFI etc. doing weird things.
> >
> > Nate and I talked this over today.  Sorry for the delay, but it was the
> > first time we were both free.  Neither of us quite understands what you
> > are asking for here.  My interpretation is that you would like us to
> > change the use of the PageReserved flag such that during boot, we do not
> > set the flag at all from memmap_init_zone, and then only set it on pages
> > which, at the time of free_all_bootmem, have been allocated/reserved in
> > the memblock allocator.  Is that correct?  I will start to work that up
> > on the assumption that is what you are asking for.
>
> Not exactly.
>
> Your change should be right, but there is a subtle change in how
> holes are handled.
>
> Before, memory holes between memory ranges in memblock.memory got a
> struct page, initialized to Reserved in memmap_init_zone.
> Those holes are not in memblock.reserved; with your patches, those holes'
> struct pages will still be all 0.
>
> Please separate the change that sets pages to reserved according to
> memblock.reserved into another patch.

Just want to make sure this is where you want me to go.  Here is my
currently untested patch.  Is that what you were expecting to have done?
One thing I don't like about this patch is it seems to slow down boot in
my simulator environment.  I think I am going to look at restructuring
things a bit to see if I can eliminate that performance penalty.
Otherwise, I think I am following your directions.

Thanks,
Robin Holt

From bdd2fefa59af18f283af6f066bc644ddfa5c7da8 Mon Sep 17 00:00:00 2001
From: Robin Holt
Date: Thu, 25 Jul 2013 04:25:15 -0500
Subject: [RFC -v2-pre2 4/5] ZZZ Only SetPageReserved() on memblock reserved
 pages.

---
 include/linux/mm.h |  2 ++
 mm/nobootmem.c     |  3 +++
 mm/page_alloc.c    | 18 +++++++++++-------
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..b264a26 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1322,6 +1322,8 @@ static inline void adjust_managed_page_count(struct page *page, long count)
 	totalram_pages += count;
 }
 
+extern void reserve_bootmem_region(unsigned long start, unsigned long end);
+
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
 {
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 2159e68..0840af2 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -117,6 +117,9 @@ static unsigned long __init free_low_memory_core_early(void)
 	phys_addr_t start, end, size;
 	u64 i;
 
+	for_each_reserved_mem_region(i, &start, &end)
+		reserve_bootmem_region(start, end);
+
 	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 048e166..3aa30b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -698,7 +698,7 @@ static void free_one_page(struct zone *zone, struct page *page, int order,
 }
 
 static void __init_single_page(unsigned long pfn, unsigned long zone,
-			       int nid, int reserved)
+			       int nid, int page_count)
 {
 	struct page *page = pfn_to_page(pfn);
 	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
@@ -708,12 +708,9 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_nid_reset_last(page);
-	if (reserved) {
-		SetPageReserved(page);
-	} else {
-		ClearPageReserved(page);
-		set_page_count(page, 0);
-	}
+	ClearPageReserved(page);
+	set_page_count(page, page_count);
+
 	/*
 	 * Mark the block movable so that blocks are reserved for
 	 * movable at startup. This will force kernel allocations
@@ -741,6 +738,13 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
 #endif
 }
 
+void reserve_bootmem_region(unsigned long start, unsigned long end)
+{
+	for (; start < end; start++)
+		if (pfn_valid(start))
+			SetPageReserved(pfn_to_page(start));
+}
Re: [RFC 4/4] Sparse initialization of struct page array.
On Wed, Jul 24, 2013 at 7:25 PM, Robin Holt wrote:
>> How about holes that are not in memblock.reserved?
>>
>> before this patch:
>> free_area_init_node/free_area_init_core/memmap_init_zone
>> will mark all pages in the node range as Reserved in struct page, at first.
>>
>> but those holes are not mapped via the kernel low mapping,
>> so it should be ok not to touch "struct page" for them.
>>
>> Now you only mark reserved for memblock.reserved at first, and later
>> mark {memblock.memory} - {memblock.reserved} to be available.
>> And that is ok.
>>
>> but you should split that change into another patch and add some comment
>> and changelog for the change,
>> in case there is some user like UEFI etc. doing weird things.
>
> Nate and I talked this over today.  Sorry for the delay, but it was the
> first time we were both free.  Neither of us quite understands what you
> are asking for here.  My interpretation is that you would like us to
> change the use of the PageReserved flag such that during boot, we do not
> set the flag at all from memmap_init_zone, and then only set it on pages
> which, at the time of free_all_bootmem, have been allocated/reserved in
> the memblock allocator.  Is that correct?  I will start to work that up
> on the assumption that is what you are asking for.

Not exactly.

Your change should be right, but there is a subtle change in how
holes are handled.

Before, memory holes between memory ranges in memblock.memory got a
struct page, initialized to Reserved in memmap_init_zone.
Those holes are not in memblock.reserved; with your patches, those holes'
struct pages will still be all 0.

Please separate the change that sets pages to reserved according to
memblock.reserved into another patch.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote:
> On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote:
> > During boot of large memory machines, a significant portion of boot
> > is spent initializing the struct page array.  The vast majority of
> > those pages are not referenced during boot.
> >
> > Change this over to only initializing the pages when they are
> > actually allocated.
> >
> > Besides the advantage of boot speed, this allows us the chance to
> > use normal performance monitoring tools to determine where the bulk
> > of time is spent during page initialization.
> >
> > Signed-off-by: Robin Holt
> > Signed-off-by: Nate Zimmer
> > To: "H. Peter Anvin"
> > To: Ingo Molnar
> > Cc: Linux Kernel
> > Cc: Linux MM
> > Cc: Rob Landley
> > Cc: Mike Travis
> > Cc: Daniel J Blueman
> > Cc: Andrew Morton
> > Cc: Greg KH
> > Cc: Yinghai Lu
> > Cc: Mel Gorman
> > ---
> >  include/linux/mm.h         |  11 +
> >  include/linux/page-flags.h |   5 +-
> >  mm/nobootmem.c             |   5 ++
> >  mm/page_alloc.c            | 117 ++++++++++++++++++++++++++++++++++++++---
> >  4 files changed, 132 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e0c8528..3de08b5 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page *page)
> >  	__free_page(page);
> >  }
> >
> > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
> > +
> > +static inline void __reserve_bootmem_page(struct page *page)
> > +{
> > +	phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT;
> > +	phys_addr_t end = start + PAGE_SIZE;
> > +
> > +	__reserve_bootmem_region(start, end);
> > +}
> > +
> >  static inline void free_reserved_page(struct page *page)
> >  {
> > +	__reserve_bootmem_page(page);
> >  	__free_reserved_page(page);
> >  	adjust_managed_page_count(page, 1);
> >  }
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 6d53675..79e8eb7 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -83,6 +83,7 @@ enum pageflags {
> >  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use */
> >  	PG_arch_1,
> >  	PG_reserved,
> > +	PG_uninitialized2mib,	/* Is this the right spot? ntz - Yes - rmh */
> >  	PG_private,		/* If pagecache, has fs-private data */
> >  	PG_private_2,		/* If pagecache, has fs aux data */
> >  	PG_writeback,		/* Page is under writeback */
> > @@ -211,6 +212,8 @@ PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
> >
> >  __PAGEFLAG(SlobFree, slob_free)
> >
> > +PAGEFLAG(Uninitialized2Mib, uninitialized2mib)
> > +
> >  /*
> >   * Private page markings that may be used by the filesystem that owns the page
> >   * for its own purposes.
> > @@ -499,7 +502,7 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
> >  #define PAGE_FLAGS_CHECK_AT_FREE \
> >  	(1 << PG_lru	 | 1 << PG_locked    | \
> >  	 1 << PG_private | 1 << PG_private_2 | \
> > -	 1 << PG_writeback | 1 << PG_reserved | \
> > +	 1 << PG_writeback | 1 << PG_reserved | 1 << PG_uninitialized2mib | \
> >  	 1 << PG_slab	 | 1 << PG_swapcache | 1 << PG_active | \
> >  	 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
> >  	 __PG_COMPOUND_LOCK)
> > diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> > index 3b512ca..e3a386d 100644
> > --- a/mm/nobootmem.c
> > +++ b/mm/nobootmem.c
> > @@ -126,6 +126,9 @@ static unsigned long __init free_low_memory_core_early(void)
> >  	phys_addr_t start, end, size;
> >  	u64 i;
> >
> > +	for_each_reserved_mem_region(i, &start, &end)
> > +		__reserve_bootmem_region(start, end);
> > +
>
> How about holes that are not in memblock.reserved?
>
> before this patch:
> free_area_init_node/free_area_init_core/memmap_init_zone
> will mark all pages in the node range as Reserved in struct page, at first.
>
> but those holes are not mapped via the kernel low mapping,
> so it should be ok not to touch "struct page" for them.
>
> Now you only mark reserved for memblock.reserved at first, and later
> mark {memblock.memory} - {memblock.reserved} to be available.
> And that is ok.
>
> but you should split that change into another patch and add some comment
> and changelog for the change,
> in case there is some user like UEFI etc. doing weird things.

Nate and I talked this over today.  Sorry for the delay, but it was the
first time we were both free.  Neither of us quite understands what you
are asking for here.  My interpretation is that you would like us to
change the use of the PageReserved flag such that during boot, we do not
set the flag at all from memmap_init_zone, and then only set it on pages
which, at the time of free_all_bootmem, have been allocated/reserved in
the memblock allocator.
Re: [RFC 4/4] Sparse initialization of struct page array.
On Tue, Jul 23, 2013 at 06:41:50AM -0500, Robin Holt wrote: > On Tue, Jul 23, 2013 at 06:15:49AM -0500, Robin Holt wrote: > > I think the other critical path which is affected is in expand(). > > There, we just call ensure_page_is_initialized() blindly which does > > the check against the other page. The below is a nearly zero addition. > > Sorry for the confusion. My morning coffee has not kicked in yet. > > I don't have access to the 16TiB system until Thursday unless the other > testing on it fails early. I did boot a 2TiB system with a change > which set the Uninitialized_2m flag on all pages in that 2MiB range > during memmap_init_zone. That makes the expand check test against > the referenced page instead of having to go back to the 2MiB page. > It appears to have added less than a second to the 2TiB boot so I hope > it has equally little impact to the 16TiB boot. I was wrong. One of the two logs I looked at was the wrong one. Setting that Uninitialized2m flag on all pages added 30 seconds to the 2TiB boot's memmap_init_zone(). Please disregard. That brings me back to the belief we need a better solution for the expand() path. Robin > > I will clean up this patch some more and resend the currently untested > set later today. > > Thanks, > Robin > > > > > Robin > > > > On Tue, Jul 23, 2013 at 06:09:47AM -0500, Robin Holt wrote: > > > On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > > > > > > > * H. Peter Anvin wrote: > > > > > > > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > > > > > > > Is there a fairly cheap way to determine definitively that the > > > > > > struct > > > > > > page is not initialized? > > > > > > > > > > By definition I would assume no. The only way I can think of would > > > > > be > > > > > to unmap the memory associated with the struct page in the TLB and > > > > > initialize the struct pages at trap time. > > > > But ... 
the only fastpath impact I can see of delayed initialization > > > > right > > > > now is this piece of logic in prep_new_page(): > > > > > > > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int > > > > order, gfp_t gfp_flags) > > > > > > > > for (i = 0; i < (1 << order); i++) { > > > > struct page *p = page + i; > > > > + > > > > + if (PageUninitialized2Mib(p)) > > > > + expand_page_initialization(page); > > > > + > > > > if (unlikely(check_new_page(p))) > > > > return 1; > > > > > > > > That is where I think it can be made zero overhead in the > > > > already-initialized case, because page-flags are already used in > > > > check_new_page(): > > > The problem I see here is that the page flags we need to check for the > > > uninitialized flag are in the "other" page for the page aligned at the > > > 2MiB virtual address, not the page currently being referenced. > > > > > > Let me try a version of the patch where we set the PG_uninitialized_2m > > > flag on all pages, including the aligned pages and see what that does > > > to performance. > > > > > > Robin > > > > > > > > > > > static inline int check_new_page(struct page *page) > > > > { > > > > if (unlikely(page_mapcount(page) | > > > > (page->mapping != NULL) | > > > > (atomic_read(&page->_count) != 0) | > > > > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > > > > (mem_cgroup_bad_page_check(page)))) { > > > > bad_page(page); > > > > return 1; > > > > > > > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for > > > > every > > > > struct page on allocation. > > > > > > > > We can micro-optimize that low overhead to zero-overhead, by > > > > integrating > > > > the PageUninitialized2Mib() check into check_new_page(). 
This can be > > > > done > > > > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > > > > > > > > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > > > > if (PageUninitialized2Mib(p)) > > > > expand_page_initialization(page); > > > > ... > > > > } > > > > > > > > if (unlikely(page_mapcount(page) | > > > > (page->mapping != NULL) | > > > > (atomic_read(&page->_count) != 0) | > > > > (mem_cgroup_bad_page_check(page)))) { > > > > bad_page(page); > > > > > > > > return 1; > > > > > > > > this will result in making it essentially zero-overhead, the > > > > expand_page_initialization() logic is now in a slowpath. > > > > > > > > Am I missing anything here? > > > > > > > > Thanks, > > > > > > > > Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: [RFC 4/4] Sparse initialization of struct page array.
On Tue, Jul 23, 2013 at 06:15:49AM -0500, Robin Holt wrote: > I think the other critical path which is affected is in expand(). > There, we just call ensure_page_is_initialized() blindly which does > the check against the other page. The below is a nearly zero addition. > Sorry for the confusion. My morning coffee has not kicked in yet. I don't have access to the 16TiB system until Thursday unless the other testing on it fails early. I did boot a 2TiB system with a change which set the Uninitialized_2m flag on all pages in that 2MiB range during memmap_init_zone. That makes the expand check test against the referenced page instead of having to go back to the 2MiB page. It appears to have added less than a second to the 2TiB boot so I hope it has equally little impact to the 16TiB boot. I will clean up this patch some more and resend the currently untested set later today. Thanks, Robin > > Robin > > On Tue, Jul 23, 2013 at 06:09:47AM -0500, Robin Holt wrote: > > On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > > > > > * H. Peter Anvin wrote: > > > > > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > > > > > Is there a fairly cheap way to determine definitively that the struct > > > > > page is not initialized? > > > > > > > > By definition I would assume no. The only way I can think of would be > > > > to unmap the memory associated with the struct page in the TLB and > > > > initialize the struct pages at trap time. > > > > But ... 
the only fastpath impact I can see of delayed initialization > > > right > > > now is this piece of logic in prep_new_page(): > > > > > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int > > > order, gfp_t gfp_flags) > > > > > > for (i = 0; i < (1 << order); i++) { > > > struct page *p = page + i; > > > + > > > + if (PageUninitialized2Mib(p)) > > > + expand_page_initialization(page); > > > + > > > if (unlikely(check_new_page(p))) > > > return 1; > > > > > > That is where I think it can be made zero overhead in the > > > already-initialized case, because page-flags are already used in > > > check_new_page(): > > > > The problem I see here is that the page flags we need to check for the > > uninitialized flag are in the "other" page for the page aligned at the > > 2MiB virtual address, not the page currently being referenced. > > > > Let me try a version of the patch where we set the PG_uninitialized_2m > > flag on all pages, including the aligned pages and see what that does > > to performance. > > > > Robin > > > > > > > > static inline int check_new_page(struct page *page) > > > { > > > if (unlikely(page_mapcount(page) | > > > (page->mapping != NULL) | > > > (atomic_read(&page->_count) != 0) | > > > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > > > (mem_cgroup_bad_page_check(page)))) { > > > bad_page(page); > > > return 1; > > > > > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for > > > every > > > struct page on allocation. > > > > > > We can micro-optimize that low overhead to zero-overhead, by integrating > > > the PageUninitialized2Mib() check into check_new_page(). This can be done > > > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > > > > > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > > > if (PageUninitialized2Mib(p)) > > > expand_page_initialization(page); > > > ... 
> > > } > > > > > > if (unlikely(page_mapcount(page) | > > > (page->mapping != NULL) | > > > (atomic_read(&page->_count) != 0) | > > > (mem_cgroup_bad_page_check(page)))) { > > > bad_page(page); > > > > > > return 1; > > > > > > this will result in making it essentially zero-overhead, the > > > expand_page_initialization() logic is now in a slowpath. > > > > > > Am I missing anything here? > > > > > > Thanks, > > > > > > Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
I think the other critical path which is affected is in expand(). There, we just call ensure_page_is_initialized() blindly which does the check against the other page. The below is a nearly zero addition. Sorry for the confusion. My morning coffee has not kicked in yet. Robin On Tue, Jul 23, 2013 at 06:09:47AM -0500, Robin Holt wrote: > On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > > > * H. Peter Anvin wrote: > > > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > > > Is there a fairly cheap way to determine definitively that the struct > > > > page is not initialized? > > > > > > By definition I would assume no. The only way I can think of would be > > > to unmap the memory associated with the struct page in the TLB and > > > initialize the struct pages at trap time. > > > > But ... the only fastpath impact I can see of delayed initialization right > > now is this piece of logic in prep_new_page(): > > > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int order, > > gfp_t gfp_flags) > > > > for (i = 0; i < (1 << order); i++) { > > struct page *p = page + i; > > + > > + if (PageUninitialized2Mib(p)) > > + expand_page_initialization(page); > > + > > if (unlikely(check_new_page(p))) > > return 1; > > > > That is where I think it can be made zero overhead in the > > already-initialized case, because page-flags are already used in > > check_new_page(): > > The problem I see here is that the page flags we need to check for the > uninitialized flag are in the "other" page for the page aligned at the > 2MiB virtual address, not the page currently being referenced. > > Let me try a version of the patch where we set the PG_uninitialized_2m > flag on all pages, including the aligned pages and see what that does > to performance. 
> > Robin > > > > > static inline int check_new_page(struct page *page) > > { > > if (unlikely(page_mapcount(page) | > > (page->mapping != NULL) | > > (atomic_read(&page->_count) != 0) | > > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > > (mem_cgroup_bad_page_check(page)))) { > > bad_page(page); > > return 1; > > > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for every > > struct page on allocation. > > > > We can micro-optimize that low overhead to zero-overhead, by integrating > > the PageUninitialized2Mib() check into check_new_page(). This can be done > > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > > if (PageUninitialized2Mib(p)) > > expand_page_initialization(page); > > ... > > } > > > > if (unlikely(page_mapcount(page) | > > (page->mapping != NULL) | > > (atomic_read(&page->_count) != 0) | > > (mem_cgroup_bad_page_check(page)))) { > > bad_page(page); > > > > return 1; > > > > this will result in making it essentially zero-overhead, the > > expand_page_initialization() logic is now in a slowpath. > > > > Am I missing anything here? > > > > Thanks, > > > > Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > * H. Peter Anvin wrote: > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > Is there a fairly cheap way to determine definitively that the struct > > > page is not initialized? > > > > By definition I would assume no. The only way I can think of would be > > to unmap the memory associated with the struct page in the TLB and > > initialize the struct pages at trap time. > > But ... the only fastpath impact I can see of delayed initialization right > now is this piece of logic in prep_new_page(): > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int order, > gfp_t gfp_flags) > > for (i = 0; i < (1 << order); i++) { > struct page *p = page + i; > + > + if (PageUninitialized2Mib(p)) > + expand_page_initialization(page); > + > if (unlikely(check_new_page(p))) > return 1; > > That is where I think it can be made zero overhead in the > already-initialized case, because page-flags are already used in > check_new_page(): The problem I see here is that the page flags we need to check for the uninitialized flag are in the "other" page for the page aligned at the 2MiB virtual address, not the page currently being referenced. Let me try a version of the patch where we set the PG_uninitialized_2m flag on all pages, including the aligned pages and see what that does to performance. Robin > > static inline int check_new_page(struct page *page) > { > if (unlikely(page_mapcount(page) | > (page->mapping != NULL) | > (atomic_read(&page->_count) != 0) | > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > (mem_cgroup_bad_page_check(page)))) { > bad_page(page); > return 1; > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for every > struct page on allocation. > > We can micro-optimize that low overhead to zero-overhead, by integrating > the PageUninitialized2Mib() check into check_new_page(). 
This can be done > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > if (PageUninitialized2Mib(p)) > expand_page_initialization(page); > ... > } > > if (unlikely(page_mapcount(page) | > (page->mapping != NULL) | > (atomic_read(&page->_count) != 0) | > (mem_cgroup_bad_page_check(page)))) { > bad_page(page); > > return 1; > > this will result in making it essentially zero-overhead, the > expand_page_initialization() logic is now in a slowpath. > > Am I missing anything here? > > Thanks, > > Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
* H. Peter Anvin wrote: > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > Is there a fairly cheap way to determine definitively that the struct > > page is not initialized? > > By definition I would assume no. The only way I can think of would be > to unmap the memory associated with the struct page in the TLB and > initialize the struct pages at trap time. But ... the only fastpath impact I can see of delayed initialization right now is this piece of logic in prep_new_page(): @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags) for (i = 0; i < (1 << order); i++) { struct page *p = page + i; + + if (PageUninitialized2Mib(p)) + expand_page_initialization(page); + if (unlikely(check_new_page(p))) return 1; That is where I think it can be made zero overhead in the already-initialized case, because page-flags are already used in check_new_page(): static inline int check_new_page(struct page *page) { if (unlikely(page_mapcount(page) | (page->mapping != NULL) | (atomic_read(&page->_count) != 0) | (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); return 1; see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for every struct page on allocation. We can micro-optimize that low overhead to zero-overhead, by integrating the PageUninitialized2Mib() check into check_new_page(). This can be done by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { if (PageUninitialized2Mib(p)) expand_page_initialization(page); ... } if (unlikely(page_mapcount(page) | (page->mapping != NULL) | (atomic_read(&page->_count) != 0) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); return 1; this will result in making it essentially zero-overhead, the expand_page_initialization() logic is now in a slowpath. Am I missing anything here? 
Thanks, Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
On Mon, Jul 15, 2013 at 02:30:37PM -0700, Andrew Morton wrote: > On Thu, 11 Jul 2013 21:03:55 -0500 Robin Holt wrote: > > > During boot of large memory machines, a significant portion of boot > > is spent initializing the struct page array. The vast majority of > > those pages are not referenced during boot. > > > > Change this over to only initializing the pages when they are > > actually allocated. > > > > Besides the advantage of boot speed, this allows us the chance to > > use normal performance monitoring tools to determine where the bulk > > of time is spent during page initialization. > > > > ... > > > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > > *page) > > __free_page(page); > > } > > > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > > + > > +static inline void __reserve_bootmem_page(struct page *page) > > +{ > > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > > + phys_addr_t end = start + PAGE_SIZE; > > + > > + __reserve_bootmem_region(start, end); > > +} > > It isn't obvious that this needed to be inlined? It is being declared in a header file. All the other functions I came across in that header file are declared as inline (or __always_inline). It feels to me like this is right. Can I leave it as-is? > > > static inline void free_reserved_page(struct page *page) > > { > > + __reserve_bootmem_page(page); > > __free_reserved_page(page); > > adjust_managed_page_count(page, 1); > > } > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > > index 6d53675..79e8eb7 100644 > > --- a/include/linux/page-flags.h > > +++ b/include/linux/page-flags.h > > @@ -83,6 +83,7 @@ enum pageflags { > > PG_owner_priv_1,/* Owner use. If pagecache, fs may use*/ > > PG_arch_1, > > PG_reserved, > > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ > > "mib" creeps me out too. 
And it makes me think of SNMP, which I'd > prefer not to think about. > > We've traditionally had fears of running out of page flags, but I've > lost track of how close we are to that happening. IIRC the answer > depends on whether you believe there is such a thing as a 32-bit NUMA > system. > > Can this be avoided anyway? I suspect there's some idiotic combination > of flags we could use to indicate the state. PG_reserved|PG_lru or > something. > > "2MB" sounds terribly arch-specific. Shouldn't we make it more generic > for when the hexagon64 port wants to use 4MB? > > That conversational code comment was already commented on, but it's > still there? I am going to work on making it non-2m based over the course of this week, so expect the _2m (current name based on Yinghai's comments) to go away entirely. > > > > ... > > > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, > > unsigned long zone, int nid, i > > #endif > > } > > > > +static void expand_page_initialization(struct page *basepage) > > +{ > > + unsigned long pfn = page_to_pfn(basepage); > > + unsigned long end_pfn = pfn + PTRS_PER_PMD; > > + unsigned long zone = page_zonenum(basepage); > > + int reserved = PageReserved(basepage); > > + int nid = page_to_nid(basepage); > > + > > + ClearPageUninitialized2Mib(basepage); > > + > > + for( pfn++; pfn < end_pfn; pfn++ ) > > + __init_single_page(pfn_to_page(pfn), zone, nid, reserved); > > +} > > + > > +void ensure_pages_are_initialized(unsigned long start_pfn, > > + unsigned long end_pfn) > > I think this can be made static. I hope so, as it's a somewhat > odd-sounding identifier for a global. Done. 
> > +{ > > + unsigned long aligned_start_pfn = start_pfn & ~(PTRS_PER_PMD - 1); > > + unsigned long aligned_end_pfn; > > + struct page *page; > > + > > + aligned_end_pfn = end_pfn & ~(PTRS_PER_PMD - 1); > > + aligned_end_pfn += PTRS_PER_PMD; > > + while (aligned_start_pfn < aligned_end_pfn) { > > + if (pfn_valid(aligned_start_pfn)) { > > + page = pfn_to_page(aligned_start_pfn); > > + > > + if(PageUninitialized2Mib(page)) > checkpatch them, please. Will certainly do. > > + expand_page_initialization(page); > > + } > > + > > + aligned_start_pfn += PTRS_PER_PMD; > > + } > > +} > > Some nice code comments for the above two functions would be helpful. Will do. > > > > ... > > > > +int __meminit pfn_range_init_avail(unsigned long pfn, unsigned long > > end_pfn, > > + unsigned long size, int nid) > > +{ > > + unsigned long validate_end_pfn = pfn + size; > > + > > + if (pfn & (size - 1)) > > + return 1; > > + > > + if (pfn + size >= end_pfn) > > + return 1; > > + > > + while (pfn < validate_end_pfn) > > + { > > + if (!early_pfn_valid(pfn)) > > + return 1; > > + if (!early_pfn_in_nid(pfn, nid)) > > + return 1; > > + pfn++; > > + } > > + > > + return size; > > +} > > Document it, please. The return value semantics look odd, so don't forget to explain all that as well.
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote: > On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote: > > During boot of large memory machines, a significant portion of boot > > is spent initializing the struct page array. The vast majority of > > those pages are not referenced during boot. > > > > Change this over to only initializing the pages when they are > > actually allocated. > > > > Besides the advantage of boot speed, this allows us the chance to > > use normal performance monitoring tools to determine where the bulk > > of time is spent during page initialization. > > > > Signed-off-by: Robin Holt > > Signed-off-by: Nate Zimmer > > To: "H. Peter Anvin" > > To: Ingo Molnar > > Cc: Linux Kernel > > Cc: Linux MM > > Cc: Rob Landley > > Cc: Mike Travis > > Cc: Daniel J Blueman > > Cc: Andrew Morton > > Cc: Greg KH > > Cc: Yinghai Lu > > Cc: Mel Gorman > > --- > > include/linux/mm.h | 11 + > > include/linux/page-flags.h | 5 +- > > mm/nobootmem.c | 5 ++ > > mm/page_alloc.c| 117 > > +++-- > > 4 files changed, 132 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index e0c8528..3de08b5 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > > *page) > > __free_page(page); > > } > > > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > > + > > +static inline void __reserve_bootmem_page(struct page *page) > > +{ > > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > > + phys_addr_t end = start + PAGE_SIZE; > > + > > + __reserve_bootmem_region(start, end); > > +} > > + > > static inline void free_reserved_page(struct page *page) > > { > > + __reserve_bootmem_page(page); > > __free_reserved_page(page); > > adjust_managed_page_count(page, 1); > > } > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > > index 6d53675..79e8eb7 100644 > > --- a/include/linux/page-flags.h > 
> +++ b/include/linux/page-flags.h > > @@ -83,6 +83,7 @@ enum pageflags { > > PG_owner_priv_1,/* Owner use. If pagecache, fs may use*/ > > PG_arch_1, > > PG_reserved, > > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh > > */ > > PG_private, /* If pagecache, has fs-private data */ > > PG_private_2, /* If pagecache, has fs aux data */ > > PG_writeback, /* Page is under writeback */ > > @@ -211,6 +212,8 @@ PAGEFLAG(SwapBacked, swapbacked) > > __CLEARPAGEFLAG(SwapBacked, swapbacked) > > > > __PAGEFLAG(SlobFree, slob_free) > > > > +PAGEFLAG(Uninitialized2Mib, uninitialized2mib) > > + > > /* > > * Private page markings that may be used by the filesystem that owns the > > page > > * for its own purposes. > > @@ -499,7 +502,7 @@ static inline void ClearPageSlabPfmemalloc(struct page > > *page) > > #define PAGE_FLAGS_CHECK_AT_FREE \ > > (1 << PG_lru | 1 << PG_locked| \ > > 1 << PG_private | 1 << PG_private_2 | \ > > -1 << PG_writeback | 1 << PG_reserved | \ > > +1 << PG_writeback | 1 << PG_reserved | 1 << PG_uninitialized2mib | > > \ > > 1 << PG_slab| 1 << PG_swapcache | 1 << PG_active | \ > > 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \ > > __PG_COMPOUND_LOCK) > > diff --git a/mm/nobootmem.c b/mm/nobootmem.c > > index 3b512ca..e3a386d 100644 > > --- a/mm/nobootmem.c > > +++ b/mm/nobootmem.c > > @@ -126,6 +126,9 @@ static unsigned long __init > > free_low_memory_core_early(void) > > phys_addr_t start, end, size; > > u64 i; > > > > + for_each_reserved_mem_region(i, &start, &end) > > + __reserve_bootmem_region(start, end); > > + > > How about holes that is not in memblock.reserved? > > before this patch: > free_area_init_node/free_area_init_core/memmap_init_zone > will mark all page in node range to Reserved in struct page, at first. > > but those holes is not mapped via kernel low mapping. > so it should be ok not touch "struct page" for them. 
> > Now you only mark reserved for memblock.reserved at first, and later > mark {memblock.memory} - { memblock.reserved} to be available. > And that is ok. > > but should split that change to another patch and add some comment > and change log for the change. > in case there is some user like UEFI etc do weird thing. I will split out a separate patch for this. > > for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) > > count += __free_memory_core(start, end); > > > > @@ -162,6 +165,8 @@ unsigned long __init free_all_bootmem(void) > > { > > struct pglist_data *pgdat; > > > > + memblock_dump_all(); > > + > > Not needed. Left over debug in the rush to ask our question.
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, 11 Jul 2013 21:03:55 -0500 Robin Holt wrote: > During boot of large memory machines, a significant portion of boot > is spent initializing the struct page array. The vast majority of > those pages are not referenced during boot. > > Change this over to only initializing the pages when they are > actually allocated. > > Besides the advantage of boot speed, this allows us the chance to > use normal performance monitoring tools to determine where the bulk > of time is spent during page initialization. > > ... > > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > *page) > __free_page(page); > } > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > + > +static inline void __reserve_bootmem_page(struct page *page) > +{ > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > + phys_addr_t end = start + PAGE_SIZE; > + > + __reserve_bootmem_region(start, end); > +} It isn't obvious that this needed to be inlined? > static inline void free_reserved_page(struct page *page) > { > + __reserve_bootmem_page(page); > __free_reserved_page(page); > adjust_managed_page_count(page, 1); > } > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 6d53675..79e8eb7 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -83,6 +83,7 @@ enum pageflags { > PG_owner_priv_1,/* Owner use. If pagecache, fs may use*/ > PG_arch_1, > PG_reserved, > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ "mib" creeps me out too. And it makes me think of SNMP, which I'd prefer not to think about. We've traditionally had fears of running out of page flags, but I've lost track of how close we are to that happening. IIRC the answer depends on whether you believe there is such a thing as a 32-bit NUMA system. Can this be avoided anyway? I suspect there's some idiotic combination of flags we could use to indicate the state. 
PG_reserved|PG_lru or something. "2MB" sounds terribly arch-specific. Shouldn't we make it more generic for when the hexagon64 port wants to use 4MB? That conversational code comment was already commented on, but it's still there? > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, > unsigned long zone, int nid, i > #endif > } > > +static void expand_page_initialization(struct page *basepage) > +{ > + unsigned long pfn = page_to_pfn(basepage); > + unsigned long end_pfn = pfn + PTRS_PER_PMD; > + unsigned long zone = page_zonenum(basepage); > + int reserved = PageReserved(basepage); > + int nid = page_to_nid(basepage); > + > + ClearPageUninitialized2Mib(basepage); > + > + for( pfn++; pfn < end_pfn; pfn++ ) > + __init_single_page(pfn_to_page(pfn), zone, nid, reserved); > +} > + > +void ensure_pages_are_initialized(unsigned long start_pfn, > + unsigned long end_pfn) I think this can be made static. I hope so, as it's a somewhat odd-sounding identifier for a global. > +{ > + unsigned long aligned_start_pfn = start_pfn & ~(PTRS_PER_PMD - 1); > + unsigned long aligned_end_pfn; > + struct page *page; > + > + aligned_end_pfn = end_pfn & ~(PTRS_PER_PMD - 1); > + aligned_end_pfn += PTRS_PER_PMD; > + while (aligned_start_pfn < aligned_end_pfn) { > + if (pfn_valid(aligned_start_pfn)) { > + page = pfn_to_page(aligned_start_pfn); > + > + if(PageUninitialized2Mib(page)) checkpatch them, please. > + expand_page_initialization(page); > + } > + > + aligned_start_pfn += PTRS_PER_PMD; > + } > +} Some nice code comments for the above two functions would be helpful. > > ... 
> > +int __meminit pfn_range_init_avail(unsigned long pfn, unsigned long end_pfn, > +unsigned long size, int nid) > +{ > + unsigned long validate_end_pfn = pfn + size; > + > + if (pfn & (size - 1)) > + return 1; > + > + if (pfn + size >= end_pfn) > + return 1; > + > + while (pfn < validate_end_pfn) > + { > + if (!early_pfn_valid(pfn)) > + return 1; > + if (!early_pfn_in_nid(pfn, nid)) > + return 1; > + pfn++; > + } > + > + return size; > +} Document it, please. The return value semantics look odd, so don't forget to explain all that as well. > > ... > > @@ -6196,6 +6302,7 @@ static const struct trace_print_flags pageflag_names[] > = { > {1UL << PG_owner_priv_1,"owner_priv_1" }, > {1UL << PG_arch_1, "arch_1"}, > {1UL << PG_reserved,"reserved" }, > + {1UL << PG_uninitialized2mib, "Uninit_2MiB" }, It would be better if the name which is visible in procfs matches the name in the kernel source code.
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 11:26 AM, Robin Holt wrote: > Is there a fairly cheap way to determine definitively that the struct > page is not initialized? By definition I would assume no. The only way I can think of would be to unmap the memory associated with the struct page in the TLB and initialize the struct pages at trap time. > I think this patch set can change fairly drastically if we have that. > I think I will start working up those changes and code a heavy-handed > check until I hear of an alternative way to cheaply check. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Mon, Jul 15, 2013 at 10:54:38AM -0700, H. Peter Anvin wrote: > On 07/15/2013 10:45 AM, Nathan Zimmer wrote: > > > > I hadn't actually been very happy with having a PG_uninitialized2mib flag. > > It implies if we want to jump to 1Gb pages we would need a second flag, > > PG_uninitialized1gb, for that. I was thinking of changing it to > > PG_uninitialized and setting page->private to the correct order. > > Thoughts? > > > > Seems straightforward. The bigger issue is the amount of overhead we > cause by having to check upstack for the initialization status of the > superpages. > > I'm concerned, obviously, about lingering overhead that is "forever". > That being said, in the absolutely worst case we could have a counter to > the number of uninitialized pages which when it hits zero we do a static > switch and switch out the initialization code (would have to be undone > on memory hotplug, of course.) Is there a fairly cheap way to determine definitively that the struct page is not initialized? I think this patch set can change fairly drastically if we have that. I think I will start working up those changes and code a heavy-handed check until I hear of an alternative way to cheaply check. Thanks, Robin
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 10:45 AM, Nathan Zimmer wrote: > > I hadn't actually been very happy with having a PG_uninitialized2mib flag. > It implies if we want to jump to 1Gb pages we would need a second flag, > PG_uninitialized1gb, for that. I was thinking of changing it to > PG_uninitialized and setting page->private to the correct order. > Thoughts? > Seems straightforward. The bigger issue is the amount of overhead we cause by having to check upstack for the initialization status of the superpages. I'm concerned, obviously, about lingering overhead that is "forever". That being said, in the absolutely worst case we could have a counter to the number of uninitialized pages which when it hits zero we do a static switch and switch out the initialization code (would have to be undone on memory hotplug, of course.) -hpa
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote: > On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote: > > + > > page = pfn_to_page(pfn); > > __init_single_page(page, zone, nid, 1); > > + > > + if (pfns > 1) > > + SetPageUninitialized2Mib(page); > > + > > + pfn += pfns; > > } > > } > > > > @@ -6196,6 +6302,7 @@ static const struct trace_print_flags > > pageflag_names[] = { > > {1UL << PG_owner_priv_1,"owner_priv_1" }, > > {1UL << PG_arch_1, "arch_1"}, > > {1UL << PG_reserved,"reserved" }, > > + {1UL << PG_uninitialized2mib, "Uninit_2MiB" }, > > PG_uninitialized_2m ? > > > {1UL << PG_private, "private" }, > > {1UL << PG_private_2, "private_2" }, > > {1UL << PG_writeback, "writeback" }, > > Yinghai I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page->private to the correct order. Thoughts? Nate
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/13/2013 12:31 AM, Yinghai Lu wrote: On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin wrote: On 07/12/2013 09:19 PM, Yinghai Lu wrote: PG_reserved, + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ PG_private, /* If pagecache, has fs-private data */ The comment here is WTF... ntz: Nate Zimmer? rmh: Robin Holt? Yea that comment was supposed to be removed. Sorry about that. Nate
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/13/2013 12:31 AM, Yinghai Lu wrote: On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin h...@zytor.com wrote: On 07/12/2013 09:19 PM, Yinghai Lu wrote: PG_reserved, + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ PG_private, /* If pagecache, has fs-private data */ The comment here is WTF... ntz: Nate Zimmer? rmh: Robin Holt? Yea that comment was supposed to be removed. Sorry about that. Nate -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote: On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt h...@sgi.com wrote: + page = pfn_to_page(pfn); __init_single_page(page, zone, nid, 1); + + if (pfns 1) + SetPageUninitialized2Mib(page); + + pfn += pfns; } } @@ -6196,6 +6302,7 @@ static const struct trace_print_flags pageflag_names[] = { {1UL PG_owner_priv_1,owner_priv_1 }, {1UL PG_arch_1, arch_1}, {1UL PG_reserved,reserved }, + {1UL PG_uninitialized2mib, Uninit_2MiB }, PG_uninitialized_2m ? {1UL PG_private, private }, {1UL PG_private_2, private_2 }, {1UL PG_writeback, writeback }, Yinghai I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page-private to the correct order. Thoughts? Nate -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 10:45 AM, Nathan Zimmer wrote: I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page-private to the correct order. Thoughts? Seems straightforward. The bigger issue is the amount of overhead we cause by having to check upstack for the initialization status of the superpages. I'm concerned, obviously, about lingering overhead that is forever. That being said, in the absolutely worst case we could have a counter to the number of uninitialized pages which when it hits zero we do a static switch and switch out the initialization code (would have to be undone on memory hotplug, of course.) -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Mon, Jul 15, 2013 at 10:54:38AM -0700, H. Peter Anvin wrote: On 07/15/2013 10:45 AM, Nathan Zimmer wrote: I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page-private to the correct order. Thoughts? Seems straightforward. The bigger issue is the amount of overhead we cause by having to check upstack for the initialization status of the superpages. I'm concerned, obviously, about lingering overhead that is forever. That being said, in the absolutely worst case we could have a counter to the number of uninitialized pages which when it hits zero we do a static switch and switch out the initialization code (would have to be undone on memory hotplug, of course.) Is there a fairly cheap way to determine definitively that the struct page is not initialized? I think this patch set can change fairly drastically if we have that. I think I will start working up those changes and code a heavy-handed check until I hear of an alternative way to cheaply check. Thanks, Robin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 11:26 AM, Robin Holt wrote: Is there a fairly cheap way to determine definitively that the struct page is not initialized? By definition I would assume no. The only way I can think of would be to unmap the memory associated with the struct page in the TLB and initialize the struct pages at trap time. I think this patch set can change fairly drastically if we have that. I think I will start working up those changes and code a heavy-handed check until I hear of an alternative way to cheaply check. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, 11 Jul 2013 21:03:55 -0500 Robin Holt <h...@sgi.com> wrote:

> During boot of large memory machines, a significant portion of boot
> is spent initializing the struct page array. The vast majority of
> those pages are not referenced during boot.
>
> Change this over to only initializing the pages when they are
> actually allocated.
>
> Besides the advantage of boot speed, this allows us the chance to
> use normal performance monitoring tools to determine where the bulk
> of time is spent during page initialization.
>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page *page)
>  	__free_page(page);
>  }
>
> +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
> +
> +static inline void __reserve_bootmem_page(struct page *page)
> +{
> +	phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT;
> +	phys_addr_t end = start + PAGE_SIZE;
> +
> +	__reserve_bootmem_region(start, end);
> +}

It isn't obvious that this needed to be inlined?

> static inline void free_reserved_page(struct page *page)
> {
> +	__reserve_bootmem_page(page);
>  	__free_reserved_page(page);
>  	adjust_managed_page_count(page, 1);
> }
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 6d53675..79e8eb7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -83,6 +83,7 @@ enum pageflags {
>  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use */
>  	PG_arch_1,
>  	PG_reserved,
> +	PG_uninitialized2mib,	/* Is this the right spot? ntz - Yes - rmh */

"mib" creeps me out too. And it makes me think of SNMP, which I'd prefer not to think about.

We've traditionally had fears of running out of page flags, but I've lost track of how close we are to that happening. IIRC the answer depends on whether you believe there is such a thing as a 32-bit NUMA system.

Can this be avoided anyway? I suspect there's some idiotic combination of flags we could use to indicate the state. PG_reserved|PG_lru or something.

2MB sounds terribly arch-specific. Shouldn't we make it more generic for when the hexagon64 port wants to use 4MB?

That conversational code comment was already commented on, but it's still there?

> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, unsigned long zone, int nid, i
>  #endif
>  }
>
> +static void expand_page_initialization(struct page *basepage)
> +{
> +	unsigned long pfn = page_to_pfn(basepage);
> +	unsigned long end_pfn = pfn + PTRS_PER_PMD;
> +	unsigned long zone = page_zonenum(basepage);
> +	int reserved = PageReserved(basepage);
> +	int nid = page_to_nid(basepage);
> +
> +	ClearPageUninitialized2Mib(basepage);
> +
> +	for( pfn++; pfn < end_pfn; pfn++ )
> +		__init_single_page(pfn_to_page(pfn), zone, nid, reserved);
> +}
> +
> +void ensure_pages_are_initialized(unsigned long start_pfn,
> +				  unsigned long end_pfn)

I think this can be made static. I hope so, as it's a somewhat odd-sounding identifier for a global.

> +{
> +	unsigned long aligned_start_pfn = start_pfn & ~(PTRS_PER_PMD - 1);
> +	unsigned long aligned_end_pfn;
> +	struct page *page;
> +
> +	aligned_end_pfn = end_pfn & ~(PTRS_PER_PMD - 1);
> +	aligned_end_pfn += PTRS_PER_PMD;
> +	while (aligned_start_pfn < aligned_end_pfn) {
> +		if (pfn_valid(aligned_start_pfn)) {
> +			page = pfn_to_page(aligned_start_pfn);
> +
> +			if(PageUninitialized2Mib(page))

checkpatch them, please.

> +				expand_page_initialization(page);
> +		}
> +
> +		aligned_start_pfn += PTRS_PER_PMD;
> +	}
> +}

Some nice code comments for the above two functions would be helpful.

> ...
>
> +int __meminit pfn_range_init_avail(unsigned long pfn, unsigned long end_pfn,
> +				   unsigned long size, int nid)
> +{
> +	unsigned long validate_end_pfn = pfn + size;
> +
> +	if (pfn & (size - 1))
> +		return 1;
> +
> +	if (pfn + size >= end_pfn)
> +		return 1;
> +
> +	while (pfn < validate_end_pfn)
> +	{
> +		if (!early_pfn_valid(pfn))
> +			return 1;
> +		if (!early_pfn_in_nid(pfn, nid))
> +			return 1;
> +		pfn++;
> +	}
> +
> +	return size;
> +}

Document it, please. The return value semantics look odd, so don't forget to explain all that as well.

> ...
>
> @@ -6196,6 +6302,7 @@ static const struct trace_print_flags pageflag_names[] = {
>  	{1UL << PG_owner_priv_1,	"owner_priv_1"	},
>  	{1UL << PG_arch_1,		"arch_1"	},
>  	{1UL << PG_reserved,		"reserved"	},
> +	{1UL << PG_uninitialized2mib,	"Uninit_2MiB"	},

It would be better if the name which is visible in procfs matches the name in the kernel source code.

> 	{1UL << PG_private,
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/12/2013 10:31 PM, Yinghai Lu wrote: > On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin wrote: >> On 07/12/2013 09:19 PM, Yinghai Lu wrote: PG_reserved, + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ PG_private, /* If pagecache, has fs-private data */ >> >> The comment here is WTF... > > ntz: Nate Zimmer? > rmh: Robin Holt? > This kind of conversation doesn't really belong in a code comment, though. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin wrote: > On 07/12/2013 09:19 PM, Yinghai Lu wrote: >>> PG_reserved, >>> + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh >>> */ >>> PG_private, /* If pagecache, has fs-private data */ > > The comment here is WTF... ntz: Nate Zimmer? rmh: Robin Holt?
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/12/2013 09:19 PM, Yinghai Lu wrote: >> PG_reserved, >> + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ >> PG_private, /* If pagecache, has fs-private data */ The comment here is WTF... -hpa
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote: > During boot of large memory machines, a significant portion of boot > is spent initializing the struct page array. The vast majority of > those pages are not referenced during boot. > > Change this over to only initializing the pages when they are > actually allocated. > > Besides the advantage of boot speed, this allows us the chance to > use normal performance monitoring tools to determine where the bulk > of time is spent during page initialization. > > Signed-off-by: Robin Holt > Signed-off-by: Nate Zimmer > To: "H. Peter Anvin" > To: Ingo Molnar > Cc: Linux Kernel > Cc: Linux MM > Cc: Rob Landley > Cc: Mike Travis > Cc: Daniel J Blueman > Cc: Andrew Morton > Cc: Greg KH > Cc: Yinghai Lu > Cc: Mel Gorman > --- > include/linux/mm.h | 11 + > include/linux/page-flags.h | 5 +- > mm/nobootmem.c | 5 ++ > mm/page_alloc.c| 117 > +++-- > 4 files changed, 132 insertions(+), 6 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index e0c8528..3de08b5 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > *page) > __free_page(page); > } > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > + > +static inline void __reserve_bootmem_page(struct page *page) > +{ > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > + phys_addr_t end = start + PAGE_SIZE; > + > + __reserve_bootmem_region(start, end); > +} > + > static inline void free_reserved_page(struct page *page) > { > + __reserve_bootmem_page(page); > __free_reserved_page(page); > adjust_managed_page_count(page, 1); > } > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 6d53675..79e8eb7 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -83,6 +83,7 @@ enum pageflags { > PG_owner_priv_1,/* Owner use. 
If pagecache, fs may use*/ > PG_arch_1, > PG_reserved, > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ > PG_private, /* If pagecache, has fs-private data */ > PG_private_2, /* If pagecache, has fs aux data */ > PG_writeback, /* Page is under writeback */ > @@ -211,6 +212,8 @@ PAGEFLAG(SwapBacked, swapbacked) > __CLEARPAGEFLAG(SwapBacked, swapbacked) > > __PAGEFLAG(SlobFree, slob_free) > > +PAGEFLAG(Uninitialized2Mib, uninitialized2mib) > + > /* > * Private page markings that may be used by the filesystem that owns the > page > * for its own purposes. > @@ -499,7 +502,7 @@ static inline void ClearPageSlabPfmemalloc(struct page > *page) > #define PAGE_FLAGS_CHECK_AT_FREE \ > (1 << PG_lru | 1 << PG_locked| \ > 1 << PG_private | 1 << PG_private_2 | \ > -1 << PG_writeback | 1 << PG_reserved | \ > +1 << PG_writeback | 1 << PG_reserved | 1 << PG_uninitialized2mib | \ > 1 << PG_slab| 1 << PG_swapcache | 1 << PG_active | \ > 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \ > __PG_COMPOUND_LOCK) > diff --git a/mm/nobootmem.c b/mm/nobootmem.c > index 3b512ca..e3a386d 100644 > --- a/mm/nobootmem.c > +++ b/mm/nobootmem.c > @@ -126,6 +126,9 @@ static unsigned long __init > free_low_memory_core_early(void) > phys_addr_t start, end, size; > u64 i; > > + for_each_reserved_mem_region(i, , ) > + __reserve_bootmem_region(start, end); > + How about holes that is not in memblock.reserved? before this patch: free_area_init_node/free_area_init_core/memmap_init_zone will mark all page in node range to Reserved in struct page, at first. but those holes is not mapped via kernel low mapping. so it should be ok not touch "struct page" for them. Now you only mark reserved for memblock.reserved at first, and later mark {memblock.memory} - { memblock.reserved} to be available. And that is ok. but should split that change to another patch and add some comment and change log for the change. in case there is some user like UEFI etc do weird thing. 
> for_each_free_mem_range(i, MAX_NUMNODES, , , NULL) > count += __free_memory_core(start, end); > > @@ -162,6 +165,8 @@ unsigned long __init free_all_bootmem(void) > { > struct pglist_data *pgdat; > > + memblock_dump_all(); > + Not needed. > for_each_online_pgdat(pgdat) > reset_node_lowmem_managed_pages(pgdat); > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 635b131..fe51eb9 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, > unsigned long zone, int nid, i > #endif > } > > +static void expand_page_initialization(struct page *basepage) > +{ > + unsigned long