Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, Jul 25, 2013 at 6:42 AM, Robin Holt wrote:
> On Thu, Jul 25, 2013 at 05:50:57AM -0700, Yinghai Lu wrote:
>> On Wed, Jul 24, 2013 at 7:25 PM, Robin Holt wrote:
>>>> How about holes that are not in memblock.reserved?
>>>>
>>>> before this patch:
>>>> free_area_init_node/free_area_init_core/memmap_init_zone
>>>> will mark all pages in the node range as Reserved in struct page, at first.
>>>>
>>>> but those holes are not mapped via the kernel low mapping,
>>>> so it should be ok not to touch "struct page" for them.
>>>>
>>>> Now you only mark reserved for memblock.reserved at first, and later
>>>> mark {memblock.memory} - {memblock.reserved} to be available.
>>>> And that is ok.
>>>>
>>>> but you should split that change into another patch and add some comment
>>>> and changelog for the change,
>>>> in case there is some user like UEFI etc. doing weird things.
>>>
>>> Nate and I talked this over today.  Sorry for the delay, but it was the
>>> first time we were both free.  Neither of us quite understands what you
>>> are asking for here.  My interpretation is that you would like us to
>>> change the use of the PageReserved flag such that during boot, we do not
>>> set the flag at all from memmap_init_zone, and then only set it on pages
>>> which, at the time of free_all_bootmem, have been allocated/reserved in
>>> the memblock allocator.  Is that correct?  I will start to work that up
>>> on the assumption that is what you are asking for.
>>
>> Not exactly.
>>
>> Your change should be right, but there is a subtle change in how
>> holes are handled.
>>
>> Before, memory holes between memory ranges in memblock.memory got a
>> struct page, initialized to Reserved in memmap_init_zone.
>> Those holes are not in memblock.reserved; with your patches, those holes'
>> struct pages will still be all 0.
>>
>> Please separate the change that sets pages to reserved according to
>> memblock.reserved into another patch.
>
> Just want to make sure this is where you want me to go.  Here is my
> currently untested patch.  Is that what you were expecting to have done?
> One thing I don't like about this patch is it seems to slow down boot in
> my simulator environment.  I think I am going to look at restructuring
> things a bit to see if I can eliminate that performance penalty.
> Otherwise, I think I am following your directions.
>
> From bdd2fefa59af18f283af6f066bc644ddfa5c7da8 Mon Sep 17 00:00:00 2001
> From: Robin Holt
> Date: Thu, 25 Jul 2013 04:25:15 -0500
> Subject: [RFC -v2-pre2 4/5] ZZZ Only SetPageReserved() on memblock reserved
>  pages.

yes.

> ---
>  include/linux/mm.h |  2 ++
>  mm/nobootmem.c     |  3 +++
>  mm/page_alloc.c    | 18 +++++++++++-------
>  3 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e0c8528..b264a26 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1322,6 +1322,8 @@ static inline void adjust_managed_page_count(struct page *page, long count)
>  	totalram_pages += count;
>  }
>
> +extern void reserve_bootmem_region(unsigned long start, unsigned long end);
> +
>  /* Free the reserved page into the buddy system, so it gets managed. */
>  static inline void __free_reserved_page(struct page *page)
>  {
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index 2159e68..0840af2 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -117,6 +117,9 @@ static unsigned long __init free_low_memory_core_early(void)
>  	phys_addr_t start, end, size;
>  	u64 i;
>
> +	for_each_reserved_mem_region(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
>  	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
>  		count += __free_memory_core(start, end);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 048e166..3aa30b7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -698,7 +698,7 @@ static void free_one_page(struct zone *zone, struct page *page, int order,
>  }
>
>  static void __init_single_page(unsigned long pfn, unsigned long zone,
> -			       int nid, int reserved)
> +			       int nid, int page_count)
>  {
>  	struct page *page = pfn_to_page(pfn);
>  	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
> @@ -708,12 +708,9 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
>  	init_page_count(page);
>  	page_mapcount_reset(page);
>  	page_nid_reset_last(page);
> -	if (reserved) {
> -		SetPageReserved(page);
> -	} else {
> -		ClearPageReserved(page);
> -		set_page_count(page, 0);
> -	}
> +	ClearPageReserved(page);
> +	set_page_count(page, page_count);
> +
>  	/*
>  	 * Mark the block movable so that blocks are reserved for
>  	 * movable at startup. This will force kernel allocations
> @@ -741,6 +738,13 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
>  #endif
>  }
>
> +void reserve_bootmem_region(unsigned long start, unsigned long end)
> +{
> +	for (; start < end; start++)
> +		if (pfn_valid(start))
> +			SetPageReserved(pfn_to_page(start));
> +}
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, Jul 25, 2013 at 05:50:57AM -0700, Yinghai Lu wrote:
> On Wed, Jul 24, 2013 at 7:25 PM, Robin Holt wrote:
> >> How about holes that are not in memblock.reserved?
> >>
> >> before this patch:
> >> free_area_init_node/free_area_init_core/memmap_init_zone
> >> will mark all pages in the node range as Reserved in struct page, at first.
> >>
> >> but those holes are not mapped via the kernel low mapping,
> >> so it should be ok not to touch "struct page" for them.
> >>
> >> Now you only mark reserved for memblock.reserved at first, and later
> >> mark {memblock.memory} - {memblock.reserved} to be available.
> >> And that is ok.
> >>
> >> but you should split that change into another patch and add some comment
> >> and changelog for the change,
> >> in case there is some user like UEFI etc. doing weird things.
> >
> > Nate and I talked this over today.  Sorry for the delay, but it was the
> > first time we were both free.  Neither of us quite understands what you
> > are asking for here.  My interpretation is that you would like us to
> > change the use of the PageReserved flag such that during boot, we do not
> > set the flag at all from memmap_init_zone, and then only set it on pages
> > which, at the time of free_all_bootmem, have been allocated/reserved in
> > the memblock allocator.  Is that correct?  I will start to work that up
> > on the assumption that is what you are asking for.
>
> Not exactly.
>
> Your change should be right, but there is a subtle change in how
> holes are handled.
>
> Before, memory holes between memory ranges in memblock.memory got a
> struct page, initialized to Reserved in memmap_init_zone.
> Those holes are not in memblock.reserved; with your patches, those holes'
> struct pages will still be all 0.
>
> Please separate the change that sets pages to reserved according to
> memblock.reserved into another patch.

Just want to make sure this is where you want me to go.  Here is my
currently untested patch.  Is that what you were expecting to have done?
One thing I don't like about this patch is it seems to slow down boot in
my simulator environment.  I think I am going to look at restructuring
things a bit to see if I can eliminate that performance penalty.
Otherwise, I think I am following your directions.

Thanks,
Robin Holt

From bdd2fefa59af18f283af6f066bc644ddfa5c7da8 Mon Sep 17 00:00:00 2001
From: Robin Holt
Date: Thu, 25 Jul 2013 04:25:15 -0500
Subject: [RFC -v2-pre2 4/5] ZZZ Only SetPageReserved() on memblock reserved
 pages.

---
 include/linux/mm.h |  2 ++
 mm/nobootmem.c     |  3 +++
 mm/page_alloc.c    | 18 +++++++++++-------
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..b264a26 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1322,6 +1322,8 @@ static inline void adjust_managed_page_count(struct page *page, long count)
 	totalram_pages += count;
 }
 
+extern void reserve_bootmem_region(unsigned long start, unsigned long end);
+
 /* Free the reserved page into the buddy system, so it gets managed. */
 static inline void __free_reserved_page(struct page *page)
 {
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 2159e68..0840af2 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -117,6 +117,9 @@ static unsigned long __init free_low_memory_core_early(void)
 	phys_addr_t start, end, size;
 	u64 i;
 
+	for_each_reserved_mem_region(i, &start, &end)
+		reserve_bootmem_region(start, end);
+
 	for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 048e166..3aa30b7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -698,7 +698,7 @@ static void free_one_page(struct zone *zone, struct page *page, int order,
 }
 
 static void __init_single_page(unsigned long pfn, unsigned long zone,
-			       int nid, int reserved)
+			       int nid, int page_count)
 {
 	struct page *page = pfn_to_page(pfn);
 	struct zone *z = &NODE_DATA(nid)->node_zones[zone];
@@ -708,12 +708,9 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
 	init_page_count(page);
 	page_mapcount_reset(page);
 	page_nid_reset_last(page);
-	if (reserved) {
-		SetPageReserved(page);
-	} else {
-		ClearPageReserved(page);
-		set_page_count(page, 0);
-	}
+	ClearPageReserved(page);
+	set_page_count(page, page_count);
+
 	/*
 	 * Mark the block movable so that blocks are reserved for
 	 * movable at startup. This will force kernel allocations
@@ -741,6 +738,13 @@ static void __init_single_page(unsigned long pfn, unsigned long zone,
 #endif
 }
 
+void reserve_bootmem_region(unsigned long start, unsigned long end)
+{
+	for (; start < end; start++)
+		if (pfn_valid(start))
+			SetPageReserved(pfn_to_page(start));
+}
Re: [RFC 4/4] Sparse initialization of struct page array.
On Wed, Jul 24, 2013 at 7:25 PM, Robin Holt wrote:
>> How about holes that are not in memblock.reserved?
>>
>> before this patch:
>> free_area_init_node/free_area_init_core/memmap_init_zone
>> will mark all pages in the node range as Reserved in struct page, at first.
>>
>> but those holes are not mapped via the kernel low mapping,
>> so it should be ok not to touch "struct page" for them.
>>
>> Now you only mark reserved for memblock.reserved at first, and later
>> mark {memblock.memory} - {memblock.reserved} to be available.
>> And that is ok.
>>
>> but you should split that change into another patch and add some comment
>> and changelog for the change,
>> in case there is some user like UEFI etc. doing weird things.
>
> Nate and I talked this over today.  Sorry for the delay, but it was the
> first time we were both free.  Neither of us quite understands what you
> are asking for here.  My interpretation is that you would like us to
> change the use of the PageReserved flag such that during boot, we do not
> set the flag at all from memmap_init_zone, and then only set it on pages
> which, at the time of free_all_bootmem, have been allocated/reserved in
> the memblock allocator.  Is that correct?  I will start to work that up
> on the assumption that is what you are asking for.

Not exactly.

Your change should be right, but there is a subtle change in how
holes are handled.

Before, memory holes between memory ranges in memblock.memory got a
struct page, initialized to Reserved in memmap_init_zone.
Those holes are not in memblock.reserved; with your patches, those holes'
struct pages will still be all 0.

Please separate the change that sets pages to reserved according to
memblock.reserved into another patch.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote:
> On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote:
> > During boot of large memory machines, a significant portion of boot
> > is spent initializing the struct page array.  The vast majority of
> > those pages are not referenced during boot.
> >
> > Change this over to only initializing the pages when they are
> > actually allocated.
> >
> > Besides the advantage of boot speed, this allows us the chance to
> > use normal performance monitoring tools to determine where the bulk
> > of time is spent during page initialization.
> >
> > Signed-off-by: Robin Holt
> > Signed-off-by: Nate Zimmer
> > To: "H. Peter Anvin"
> > To: Ingo Molnar
> > Cc: Linux Kernel
> > Cc: Linux MM
> > Cc: Rob Landley
> > Cc: Mike Travis
> > Cc: Daniel J Blueman
> > Cc: Andrew Morton
> > Cc: Greg KH
> > Cc: Yinghai Lu
> > Cc: Mel Gorman
> > ---
> >  include/linux/mm.h         |  11 +
> >  include/linux/page-flags.h |   5 +-
> >  mm/nobootmem.c             |   5 ++
> >  mm/page_alloc.c            | 117 ++++++++++++++++++++++++++++++++++++++---
> >  4 files changed, 132 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index e0c8528..3de08b5 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page *page)
> >  	__free_page(page);
> >  }
> >
> > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
> > +
> > +static inline void __reserve_bootmem_page(struct page *page)
> > +{
> > +	phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT;
> > +	phys_addr_t end = start + PAGE_SIZE;
> > +
> > +	__reserve_bootmem_region(start, end);
> > +}
> > +
> >  static inline void free_reserved_page(struct page *page)
> >  {
> > +	__reserve_bootmem_page(page);
> >  	__free_reserved_page(page);
> >  	adjust_managed_page_count(page, 1);
> >  }
> > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> > index 6d53675..79e8eb7 100644
> > --- a/include/linux/page-flags.h
> > +++ b/include/linux/page-flags.h
> > @@ -83,6 +83,7 @@ enum pageflags {
> >  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use */
> >  	PG_arch_1,
> >  	PG_reserved,
> > +	PG_uninitialized2mib,	/* Is this the right spot? ntz - Yes - rmh */
> >  	PG_private,		/* If pagecache, has fs-private data */
> >  	PG_private_2,		/* If pagecache, has fs aux data */
> >  	PG_writeback,		/* Page is under writeback */
> > @@ -211,6 +212,8 @@ PAGEFLAG(SwapBacked, swapbacked) __CLEARPAGEFLAG(SwapBacked, swapbacked)
> >
> >  __PAGEFLAG(SlobFree, slob_free)
> >
> > +PAGEFLAG(Uninitialized2Mib, uninitialized2mib)
> > +
> >  /*
> >   * Private page markings that may be used by the filesystem that owns the page
> >   * for its own purposes.
> > @@ -499,7 +502,7 @@ static inline void ClearPageSlabPfmemalloc(struct page *page)
> >  #define PAGE_FLAGS_CHECK_AT_FREE \
> >  	(1 << PG_lru	 | 1 << PG_locked    | \
> >  	 1 << PG_private | 1 << PG_private_2 | \
> > -	 1 << PG_writeback | 1 << PG_reserved | \
> > +	 1 << PG_writeback | 1 << PG_reserved | 1 << PG_uninitialized2mib | \
> >  	 1 << PG_slab	 | 1 << PG_swapcache | 1 << PG_active | \
> >  	 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
> >  	 __PG_COMPOUND_LOCK)
> > diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> > index 3b512ca..e3a386d 100644
> > --- a/mm/nobootmem.c
> > +++ b/mm/nobootmem.c
> > @@ -126,6 +126,9 @@ static unsigned long __init free_low_memory_core_early(void)
> >  	phys_addr_t start, end, size;
> >  	u64 i;
> >
> > +	for_each_reserved_mem_region(i, &start, &end)
> > +		__reserve_bootmem_region(start, end);
> > +
>
> How about holes that are not in memblock.reserved?
>
> before this patch:
> free_area_init_node/free_area_init_core/memmap_init_zone
> will mark all pages in the node range as Reserved in struct page, at first.
>
> but those holes are not mapped via the kernel low mapping,
> so it should be ok not to touch "struct page" for them.
>
> Now you only mark reserved for memblock.reserved at first, and later
> mark {memblock.memory} - {memblock.reserved} to be available.
> And that is ok.
>
> but you should split that change into another patch and add some comment
> and changelog for the change,
> in case there is some user like UEFI etc. doing weird things.

Nate and I talked this over today.  Sorry for the delay, but it was the
first time we were both free.  Neither of us quite understands what you
are asking for here.  My interpretation is that you would like us to
change the use of the PageReserved flag such that during boot, we do not
set the flag at all from memmap_init_zone, and then only set it on pages
which, at the time of free_all_bootmem, have been allocated/reserved in
the memblock allocator.
Re: [RFC 4/4] Sparse initialization of struct page array.
On Tue, Jul 23, 2013 at 06:41:50AM -0500, Robin Holt wrote: > On Tue, Jul 23, 2013 at 06:15:49AM -0500, Robin Holt wrote: > > I think the other critical path which is affected is in expand(). > > There, we just call ensure_page_is_initialized() blindly which does > > the check against the other page. The below is a nearly zero addition. > > Sorry for the confusion. My morning coffee has not kicked in yet. > > I don't have access to the 16TiB system until Thursday unless the other > testing on it fails early. I did boot a 2TiB system with a change > which set the Uninitialized_2m flag on all pages in that 2MiB range > during memmap_init_zone. That makes the expand check test against > the referenced page instead of having to go back to the 2MiB page. > It appears to have added less than a second to the 2TiB boot so I hope > it has equally little impact to the 16TiB boot. I was wrong. One of the two logs I looked at was the wrong one. Setting that Uninitialized2m flag on all pages added 30 seconds to the 2TiB boot's memmap_init_zone(). Please disregard. That brings me back to the belief we need a better solution for the expand() path. Robin > > I will clean up this patch some more and resend the currently untested > set later today. > > Thanks, > Robin > > > > > Robin > > > > On Tue, Jul 23, 2013 at 06:09:47AM -0500, Robin Holt wrote: > > > On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > > > > > > > * H. Peter Anvin wrote: > > > > > > > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > > > > > > > Is there a fairly cheap way to determine definitively that the > > > > > > struct > > > > > > page is not initialized? > > > > > > > > > > By definition I would assume no. The only way I can think of would > > > > > be > > > > > to unmap the memory associated with the struct page in the TLB and > > > > > initialize the struct pages at trap time. > > > > But ... 
the only fastpath impact I can see of delayed initialization > > > > right > > > > now is this piece of logic in prep_new_page(): > > > > > > > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int > > > > order, gfp_t gfp_flags) > > > > > > > > for (i = 0; i < (1 << order); i++) { > > > > struct page *p = page + i; > > > > + > > > > + if (PageUninitialized2Mib(p)) > > > > + expand_page_initialization(page); > > > > + > > > > if (unlikely(check_new_page(p))) > > > > return 1; > > > > > > > > That is where I think it can be made zero overhead in the > > > > already-initialized case, because page-flags are already used in > > > > check_new_page(): > > > The problem I see here is that the page flags we need to check for the > > > uninitialized flag are in the "other" page for the page aligned at the > > > 2MiB virtual address, not the page currently being referenced. > > > > > > Let me try a version of the patch where we set the PG_uninitialized_2m > > > flag on all pages, including the aligned pages and see what that does > > > to performance. > > > > > > Robin > > > > > > > > > > > static inline int check_new_page(struct page *page) > > > > { > > > > if (unlikely(page_mapcount(page) | > > > > (page->mapping != NULL) | > > > > (atomic_read(&page->_count) != 0) | > > > > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > > > > (mem_cgroup_bad_page_check(page)))) { > > > > bad_page(page); > > > > return 1; > > > > > > > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for > > > > every > > > > struct page on allocation. > > > > > > > > We can micro-optimize that low overhead to zero-overhead, by > > > > integrating > > > > the PageUninitialized2Mib() check into check_new_page(). 
This can be > > > > done > > > > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > > > > > > > > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > > > > if (PageUninitialized2Mib(p)) > > > > expand_page_initialization(page); > > > > ... > > > > } > > > > > > > > if (unlikely(page_mapcount(page) | > > > > (page->mapping != NULL) | > > > > (atomic_read(&page->_count) != 0) | > > > > (mem_cgroup_bad_page_check(page)))) { > > > > bad_page(page); > > > > > > > > return 1; > > > > > > > > this will result in making it essentially zero-overhead, the > > > > expand_page_initialization() logic is now in a slowpath. > > > > > > > > Am I missing anything here? > > > > > > > > Thanks, > > > > > > > > Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at
Re: [RFC 4/4] Sparse initialization of struct page array.
On Tue, Jul 23, 2013 at 06:15:49AM -0500, Robin Holt wrote: > I think the other critical path which is affected is in expand(). > There, we just call ensure_page_is_initialized() blindly which does > the check against the other page. The below is a nearly zero addition. > Sorry for the confusion. My morning coffee has not kicked in yet. I don't have access to the 16TiB system until Thursday unless the other testing on it fails early. I did boot a 2TiB system with a change which set the Uninitialized_2m flag on all pages in that 2MiB range during memmap_init_zone. That makes the expand check test against the referenced page instead of having to go back to the 2MiB page. It appears to have added less than a second to the 2TiB boot so I hope it has equally little impact to the 16TiB boot. I will clean up this patch some more and resend the currently untested set later today. Thanks, Robin > > Robin > > On Tue, Jul 23, 2013 at 06:09:47AM -0500, Robin Holt wrote: > > On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > > > > > * H. Peter Anvin wrote: > > > > > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > > > > > Is there a fairly cheap way to determine definitively that the struct > > > > > page is not initialized? > > > > > > > > By definition I would assume no. The only way I can think of would be > > > > to unmap the memory associated with the struct page in the TLB and > > > > initialize the struct pages at trap time. > > > > But ... 
the only fastpath impact I can see of delayed initialization > > > right > > > now is this piece of logic in prep_new_page(): > > > > > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int > > > order, gfp_t gfp_flags) > > > > > > for (i = 0; i < (1 << order); i++) { > > > struct page *p = page + i; > > > + > > > + if (PageUninitialized2Mib(p)) > > > + expand_page_initialization(page); > > > + > > > if (unlikely(check_new_page(p))) > > > return 1; > > > > > > That is where I think it can be made zero overhead in the > > > already-initialized case, because page-flags are already used in > > > check_new_page(): > > > > The problem I see here is that the page flags we need to check for the > > uninitialized flag are in the "other" page for the page aligned at the > > 2MiB virtual address, not the page currently being referenced. > > > > Let me try a version of the patch where we set the PG_uninitialized_2m > > flag on all pages, including the aligned pages and see what that does > > to performance. > > > > Robin > > > > > > > > static inline int check_new_page(struct page *page) > > > { > > > if (unlikely(page_mapcount(page) | > > > (page->mapping != NULL) | > > > (atomic_read(&page->_count) != 0) | > > > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > > > (mem_cgroup_bad_page_check(page)))) { > > > bad_page(page); > > > return 1; > > > > > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for > > > every > > > struct page on allocation. > > > > > > We can micro-optimize that low overhead to zero-overhead, by integrating > > > the PageUninitialized2Mib() check into check_new_page(). This can be done > > > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > > > > > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > > > if (PageUninitialized2Mib(p)) > > > expand_page_initialization(page); > > > ... 
> > > } > > > > > > if (unlikely(page_mapcount(page) | > > > (page->mapping != NULL) | > > > (atomic_read(&page->_count) != 0) | > > > (mem_cgroup_bad_page_check(page)))) { > > > bad_page(page); > > > > > > return 1; > > > > > > this will result in making it essentially zero-overhead, the > > > expand_page_initialization() logic is now in a slowpath. > > > > > > Am I missing anything here? > > > > > > Thanks, > > > > > > Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
I think the other critical path which is affected is in expand(). There, we just call ensure_page_is_initialized() blindly which does the check against the other page. The below is a nearly zero addition. Sorry for the confusion. My morning coffee has not kicked in yet. Robin On Tue, Jul 23, 2013 at 06:09:47AM -0500, Robin Holt wrote: > On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > > > * H. Peter Anvin wrote: > > > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > > > Is there a fairly cheap way to determine definitively that the struct > > > > page is not initialized? > > > > > > By definition I would assume no. The only way I can think of would be > > > to unmap the memory associated with the struct page in the TLB and > > > initialize the struct pages at trap time. > > > > But ... the only fastpath impact I can see of delayed initialization right > > now is this piece of logic in prep_new_page(): > > > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int order, > > gfp_t gfp_flags) > > > > for (i = 0; i < (1 << order); i++) { > > struct page *p = page + i; > > + > > + if (PageUninitialized2Mib(p)) > > + expand_page_initialization(page); > > + > > if (unlikely(check_new_page(p))) > > return 1; > > > > That is where I think it can be made zero overhead in the > > already-initialized case, because page-flags are already used in > > check_new_page(): > > The problem I see here is that the page flags we need to check for the > uninitialized flag are in the "other" page for the page aligned at the > 2MiB virtual address, not the page currently being referenced. > > Let me try a version of the patch where we set the PG_uninitialized_2m > flag on all pages, including the aligned pages and see what that does > to performance. 
> > Robin > > > > > static inline int check_new_page(struct page *page) > > { > > if (unlikely(page_mapcount(page) | > > (page->mapping != NULL) | > > (atomic_read(&page->_count) != 0) | > > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > > (mem_cgroup_bad_page_check(page)))) { > > bad_page(page); > > return 1; > > > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for every > > struct page on allocation. > > > > We can micro-optimize that low overhead to zero-overhead, by integrating > > the PageUninitialized2Mib() check into check_new_page(). This can be done > > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > > if (PageUninitialized2Mib(p)) > > expand_page_initialization(page); > > ... > > } > > > > if (unlikely(page_mapcount(page) | > > (page->mapping != NULL) | > > (atomic_read(&page->_count) != 0) | > > (mem_cgroup_bad_page_check(page)))) { > > bad_page(page); > > > > return 1; > > > > this will result in making it essentially zero-overhead, the > > expand_page_initialization() logic is now in a slowpath. > > > > Am I missing anything here? > > > > Thanks, > > > > Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
On Tue, Jul 23, 2013 at 10:32:11AM +0200, Ingo Molnar wrote: > > * H. Peter Anvin wrote: > > > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > > > Is there a fairly cheap way to determine definitively that the struct > > > page is not initialized? > > > > By definition I would assume no. The only way I can think of would be > > to unmap the memory associated with the struct page in the TLB and > > initialize the struct pages at trap time. > > But ... the only fastpath impact I can see of delayed initialization right > now is this piece of logic in prep_new_page(): > > @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int order, > gfp_t gfp_flags) > > for (i = 0; i < (1 << order); i++) { > struct page *p = page + i; > + > + if (PageUninitialized2Mib(p)) > + expand_page_initialization(page); > + > if (unlikely(check_new_page(p))) > return 1; > > That is where I think it can be made zero overhead in the > already-initialized case, because page-flags are already used in > check_new_page(): The problem I see here is that the page flags we need to check for the uninitialized flag are in the "other" page for the page aligned at the 2MiB virtual address, not the page currently being referenced. Let me try a version of the patch where we set the PG_uninitialized_2m flag on all pages, including the aligned pages and see what that does to performance. Robin > > static inline int check_new_page(struct page *page) > { > if (unlikely(page_mapcount(page) | > (page->mapping != NULL) | > (atomic_read(&page->_count) != 0) | > (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | > (mem_cgroup_bad_page_check(page)))) { > bad_page(page); > return 1; > > see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for every > struct page on allocation. > > We can micro-optimize that low overhead to zero-overhead, by integrating > the PageUninitialized2Mib() check into check_new_page(). 
This can be done > by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: > > > if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { > if (PageUninitialized2Mib(p)) > expand_page_initialization(page); > ... > } > > if (unlikely(page_mapcount(page) | > (page->mapping != NULL) | > (atomic_read(&page->_count) != 0) | > (mem_cgroup_bad_page_check(page)))) { > bad_page(page); > > return 1; > > this will result in making it essentially zero-overhead, the > expand_page_initialization() logic is now in a slowpath. > > Am I missing anything here? > > Thanks, > > Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
* H. Peter Anvin wrote: > On 07/15/2013 11:26 AM, Robin Holt wrote: > > > Is there a fairly cheap way to determine definitively that the struct > > page is not initialized? > > By definition I would assume no. The only way I can think of would be > to unmap the memory associated with the struct page in the TLB and > initialize the struct pages at trap time. But ... the only fastpath impact I can see of delayed initialization right now is this piece of logic in prep_new_page(): @@ -903,6 +964,10 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags) for (i = 0; i < (1 << order); i++) { struct page *p = page + i; + + if (PageUninitialized2Mib(p)) + expand_page_initialization(page); + if (unlikely(check_new_page(p))) return 1; That is where I think it can be made zero overhead in the already-initialized case, because page-flags are already used in check_new_page(): static inline int check_new_page(struct page *page) { if (unlikely(page_mapcount(page) | (page->mapping != NULL) | (atomic_read(&page->_count) != 0) | (page->flags & PAGE_FLAGS_CHECK_AT_PREP) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); return 1; see that PAGE_FLAGS_CHECK_AT_PREP flag? That always gets checked for every struct page on allocation. We can micro-optimize that low overhead to zero-overhead, by integrating the PageUninitialized2Mib() check into check_new_page(). This can be done by adding PG_uninitialized2mib to PAGE_FLAGS_CHECK_AT_PREP and doing: if (unlikely(page->flags & PAGE_FLAGS_CHECK_AT_PREP)) { if (PageUninitialized2Mib(p)) expand_page_initialization(page); ... } if (unlikely(page_mapcount(page) | (page->mapping != NULL) | (atomic_read(&page->_count) != 0) | (mem_cgroup_bad_page_check(page)))) { bad_page(page); return 1; this will result in making it essentially zero-overhead, the expand_page_initialization() logic is now in a slowpath. Am I missing anything here? 
Thanks, Ingo
Re: [RFC 4/4] Sparse initialization of struct page array.
On Mon, Jul 15, 2013 at 02:30:37PM -0700, Andrew Morton wrote: > On Thu, 11 Jul 2013 21:03:55 -0500 Robin Holt wrote: > > > During boot of large memory machines, a significant portion of boot > > is spent initializing the struct page array. The vast majority of > > those pages are not referenced during boot. > > > > Change this over to only initializing the pages when they are > > actually allocated. > > > > Besides the advantage of boot speed, this allows us the chance to > > use normal performance monitoring tools to determine where the bulk > > of time is spent during page initialization. > > > > ... > > > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > > *page) > > __free_page(page); > > } > > > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > > + > > +static inline void __reserve_bootmem_page(struct page *page) > > +{ > > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > > + phys_addr_t end = start + PAGE_SIZE; > > + > > + __reserve_bootmem_region(start, end); > > +} > > It isn't obvious that this needed to be inlined? It is being declared in a header file. All the other functions I came across in that header file are declared as inline (or __always_inline). It feels to me like this is right. Can I leave it as-is? > > > static inline void free_reserved_page(struct page *page) > > { > > + __reserve_bootmem_page(page); > > __free_reserved_page(page); > > adjust_managed_page_count(page, 1); > > } > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > > index 6d53675..79e8eb7 100644 > > --- a/include/linux/page-flags.h > > +++ b/include/linux/page-flags.h > > @@ -83,6 +83,7 @@ enum pageflags { > > PG_owner_priv_1,/* Owner use. If pagecache, fs may use*/ > > PG_arch_1, > > PG_reserved, > > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ > > "mib" creeps me out too. 
And it makes me think of SNMP, which I'd > prefer not to think about. > > We've traditionally had fears of running out of page flags, but I've > lost track of how close we are to that happening. IIRC the answer > depends on whether you believe there is such a thing as a 32-bit NUMA > system. > > Can this be avoided anyway? I suspect there's some idiotic combination > of flags we could use to indicate the state. PG_reserved|PG_lru or > something. > > "2MB" sounds terribly arch-specific. Shouldn't we make it more generic > for when the hexagon64 port wants to use 4MB? > > That conversational code comment was already commented on, but it's > still there? I am going to work on making it non-2m based over the course of this week, so expect the _2m (current name based on Yinghai's comments) to go away entirely. > > > > ... > > > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, > > unsigned long zone, int nid, i > > #endif > > } > > > > +static void expand_page_initialization(struct page *basepage) > > +{ > > + unsigned long pfn = page_to_pfn(basepage); > > + unsigned long end_pfn = pfn + PTRS_PER_PMD; > > + unsigned long zone = page_zonenum(basepage); > > + int reserved = PageReserved(basepage); > > + int nid = page_to_nid(basepage); > > + > > + ClearPageUninitialized2Mib(basepage); > > + > > + for( pfn++; pfn < end_pfn; pfn++ ) > > + __init_single_page(pfn_to_page(pfn), zone, nid, reserved); > > +} > > + > > +void ensure_pages_are_initialized(unsigned long start_pfn, > > + unsigned long end_pfn) > > I think this can be made static. I hope so, as it's a somewhat > odd-sounding identifier for a global. Done. 
> > +{ > > + unsigned long aligned_start_pfn = start_pfn & ~(PTRS_PER_PMD - 1); > > + unsigned long aligned_end_pfn; > > + struct page *page; > > + > > + aligned_end_pfn = end_pfn & ~(PTRS_PER_PMD - 1); > > + aligned_end_pfn += PTRS_PER_PMD; > > + while (aligned_start_pfn < aligned_end_pfn) { > > + if (pfn_valid(aligned_start_pfn)) { > > + page = pfn_to_page(aligned_start_pfn); > > + > > + if(PageUninitialized2Mib(page)) > checkpatch them, please. Will certainly do. > > + expand_page_initialization(page); > > + } > > + > > + aligned_start_pfn += PTRS_PER_PMD; > > + } > > +} > > Some nice code comments for the above two functions would be helpful. Will do. > > > > ... > > > > +int __meminit pfn_range_init_avail(unsigned long pfn, unsigned long > > end_pfn, > > + unsigned long size, int nid) > > +{ > > + unsigned long validate_end_pfn = pfn + size; > > + > > + if (pfn & (size - 1)) > > + return 1; > > + > > + if (pfn + size >= end_pfn) > > + return 1; > > + > > + while (pfn < validate_end_pfn) > > + { > > + if (!early_pfn_valid(pfn)) > > + return 1; > > + if (!early_pfn_in_nid(pfn, nid)) > > + return 1; > > + pfn++; > > + } > > + > > + return size; > > +} > > Document it, please. The return value semantics look odd, so don't forget to explain all that as well.
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote: > On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote: > > During boot of large memory machines, a significant portion of boot > > is spent initializing the struct page array. The vast majority of > > those pages are not referenced during boot. > > > > Change this over to only initializing the pages when they are > > actually allocated. > > > > Besides the advantage of boot speed, this allows us the chance to > > use normal performance monitoring tools to determine where the bulk > > of time is spent during page initialization. > > > > Signed-off-by: Robin Holt > > Signed-off-by: Nate Zimmer > > To: "H. Peter Anvin" > > To: Ingo Molnar > > Cc: Linux Kernel > > Cc: Linux MM > > Cc: Rob Landley > > Cc: Mike Travis > > Cc: Daniel J Blueman > > Cc: Andrew Morton > > Cc: Greg KH > > Cc: Yinghai Lu > > Cc: Mel Gorman > > --- > > include/linux/mm.h | 11 + > > include/linux/page-flags.h | 5 +- > > mm/nobootmem.c | 5 ++ > > mm/page_alloc.c| 117 > > +++-- > > 4 files changed, 132 insertions(+), 6 deletions(-) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index e0c8528..3de08b5 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > > *page) > > __free_page(page); > > } > > > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > > + > > +static inline void __reserve_bootmem_page(struct page *page) > > +{ > > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > > + phys_addr_t end = start + PAGE_SIZE; > > + > > + __reserve_bootmem_region(start, end); > > +} > > + > > static inline void free_reserved_page(struct page *page) > > { > > + __reserve_bootmem_page(page); > > __free_reserved_page(page); > > adjust_managed_page_count(page, 1); > > } > > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > > index 6d53675..79e8eb7 100644 > > --- a/include/linux/page-flags.h > 
> +++ b/include/linux/page-flags.h > > @@ -83,6 +83,7 @@ enum pageflags { > > PG_owner_priv_1,/* Owner use. If pagecache, fs may use*/ > > PG_arch_1, > > PG_reserved, > > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh > > */ > > PG_private, /* If pagecache, has fs-private data */ > > PG_private_2, /* If pagecache, has fs aux data */ > > PG_writeback, /* Page is under writeback */ > > @@ -211,6 +212,8 @@ PAGEFLAG(SwapBacked, swapbacked) > > __CLEARPAGEFLAG(SwapBacked, swapbacked) > > > > __PAGEFLAG(SlobFree, slob_free) > > > > +PAGEFLAG(Uninitialized2Mib, uninitialized2mib) > > + > > /* > > * Private page markings that may be used by the filesystem that owns the > > page > > * for its own purposes. > > @@ -499,7 +502,7 @@ static inline void ClearPageSlabPfmemalloc(struct page > > *page) > > #define PAGE_FLAGS_CHECK_AT_FREE \ > > (1 << PG_lru | 1 << PG_locked| \ > > 1 << PG_private | 1 << PG_private_2 | \ > > -1 << PG_writeback | 1 << PG_reserved | \ > > +1 << PG_writeback | 1 << PG_reserved | 1 << PG_uninitialized2mib | > > \ > > 1 << PG_slab| 1 << PG_swapcache | 1 << PG_active | \ > > 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \ > > __PG_COMPOUND_LOCK) > > diff --git a/mm/nobootmem.c b/mm/nobootmem.c > > index 3b512ca..e3a386d 100644 > > --- a/mm/nobootmem.c > > +++ b/mm/nobootmem.c > > @@ -126,6 +126,9 @@ static unsigned long __init > > free_low_memory_core_early(void) > > phys_addr_t start, end, size; > > u64 i; > > > > + for_each_reserved_mem_region(i, &start, &end) > > + __reserve_bootmem_region(start, end); > > + > > How about holes that is not in memblock.reserved? > > before this patch: > free_area_init_node/free_area_init_core/memmap_init_zone > will mark all page in node range to Reserved in struct page, at first. > > but those holes is not mapped via kernel low mapping. > so it should be ok not touch "struct page" for them. 
> > Now you only mark reserved for memblock.reserved at first, and later > mark {memblock.memory} - { memblock.reserved} to be available. > And that is ok. > > but should split that change to another patch and add some comment > and change log for the change. > in case there is some user like UEFI etc do weird thing. I will split out a separate patch for this. > > for_each_free_mem_range(i, MAX_NUMNODES, &start, &end, NULL) > > count += __free_memory_core(start, end); > > > > @@ -162,6 +165,8 @@ unsigned long __init free_all_bootmem(void) > > { > > struct pglist_data *pgdat; > > > > + memblock_dump_all(); > > + > > Not needed. Left over debug in the rush to ask our question.
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, 11 Jul 2013 21:03:55 -0500 Robin Holt wrote: > During boot of large memory machines, a significant portion of boot > is spent initializing the struct page array. The vast majority of > those pages are not referenced during boot. > > Change this over to only initializing the pages when they are > actually allocated. > > Besides the advantage of boot speed, this allows us the chance to > use normal performance monitoring tools to determine where the bulk > of time is spent during page initialization. > > ... > > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > *page) > __free_page(page); > } > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > + > +static inline void __reserve_bootmem_page(struct page *page) > +{ > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > + phys_addr_t end = start + PAGE_SIZE; > + > + __reserve_bootmem_region(start, end); > +} It isn't obvious that this needed to be inlined? > static inline void free_reserved_page(struct page *page) > { > + __reserve_bootmem_page(page); > __free_reserved_page(page); > adjust_managed_page_count(page, 1); > } > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 6d53675..79e8eb7 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -83,6 +83,7 @@ enum pageflags { > PG_owner_priv_1,/* Owner use. If pagecache, fs may use*/ > PG_arch_1, > PG_reserved, > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ "mib" creeps me out too. And it makes me think of SNMP, which I'd prefer not to think about. We've traditionally had fears of running out of page flags, but I've lost track of how close we are to that happening. IIRC the answer depends on whether you believe there is such a thing as a 32-bit NUMA system. Can this be avoided anyway? I suspect there's some idiotic combination of flags we could use to indicate the state. 
PG_reserved|PG_lru or something. "2MB" sounds terribly arch-specific. Shouldn't we make it more generic for when the hexagon64 port wants to use 4MB? That conversational code comment was already commented on, but it's still there? > > ... > > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, > unsigned long zone, int nid, i > #endif > } > > +static void expand_page_initialization(struct page *basepage) > +{ > + unsigned long pfn = page_to_pfn(basepage); > + unsigned long end_pfn = pfn + PTRS_PER_PMD; > + unsigned long zone = page_zonenum(basepage); > + int reserved = PageReserved(basepage); > + int nid = page_to_nid(basepage); > + > + ClearPageUninitialized2Mib(basepage); > + > + for( pfn++; pfn < end_pfn; pfn++ ) > + __init_single_page(pfn_to_page(pfn), zone, nid, reserved); > +} > + > +void ensure_pages_are_initialized(unsigned long start_pfn, > + unsigned long end_pfn) I think this can be made static. I hope so, as it's a somewhat odd-sounding identifier for a global. > +{ > + unsigned long aligned_start_pfn = start_pfn & ~(PTRS_PER_PMD - 1); > + unsigned long aligned_end_pfn; > + struct page *page; > + > + aligned_end_pfn = end_pfn & ~(PTRS_PER_PMD - 1); > + aligned_end_pfn += PTRS_PER_PMD; > + while (aligned_start_pfn < aligned_end_pfn) { > + if (pfn_valid(aligned_start_pfn)) { > + page = pfn_to_page(aligned_start_pfn); > + > + if(PageUninitialized2Mib(page)) checkpatch them, please. > + expand_page_initialization(page); > + } > + > + aligned_start_pfn += PTRS_PER_PMD; > + } > +} Some nice code comments for the above two functions would be helpful. > > ... 
> > +int __meminit pfn_range_init_avail(unsigned long pfn, unsigned long end_pfn, > +unsigned long size, int nid) > +{ > + unsigned long validate_end_pfn = pfn + size; > + > + if (pfn & (size - 1)) > + return 1; > + > + if (pfn + size >= end_pfn) > + return 1; > + > + while (pfn < validate_end_pfn) > + { > + if (!early_pfn_valid(pfn)) > + return 1; > + if (!early_pfn_in_nid(pfn, nid)) > + return 1; > + pfn++; > + } > + > + return size; > +} Document it, please. The return value semantics look odd, so don't forget to explain all that as well. > > ... > > @@ -6196,6 +6302,7 @@ static const struct trace_print_flags pageflag_names[] > = { > {1UL << PG_owner_priv_1,"owner_priv_1" }, > {1UL << PG_arch_1, "arch_1"}, > {1UL << PG_reserved,"reserved" }, > + {1UL << PG_uninitialized2mib, "Uninit_2MiB" }, It would be better if the name which is visible in procfs matches the name in the kernel source code.
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 11:26 AM, Robin Holt wrote: > Is there a fairly cheap way to determine definitively that the struct > page is not initialized? By definition I would assume no. The only way I can think of would be to unmap the memory associated with the struct page in the TLB and initialize the struct pages at trap time. > I think this patch set can change fairly drastically if we have that. > I think I will start working up those changes and code a heavy-handed > check until I hear of an alternative way to cheaply check. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Mon, Jul 15, 2013 at 10:54:38AM -0700, H. Peter Anvin wrote: > On 07/15/2013 10:45 AM, Nathan Zimmer wrote: > > > > I hadn't actually been very happy with having a PG_uninitialized2mib flag. > > It implies if we want to jump to 1Gb pages we would need a second flag, > > PG_uninitialized1gb, for that. I was thinking of changing it to > > PG_uninitialized and setting page->private to the correct order. > > Thoughts? > > > > Seems straightforward. The bigger issue is the amount of overhead we > cause by having to check upstack for the initialization status of the > superpages. > > I'm concerned, obviously, about lingering overhead that is "forever". > That being said, in the absolutely worst case we could have a counter to > the number of uninitialized pages which when it hits zero we do a static > switch and switch out the initialization code (would have to be undone > on memory hotplug, of course.) Is there a fairly cheap way to determine definitively that the struct page is not initialized? I think this patch set can change fairly drastically if we have that. I think I will start working up those changes and code a heavy-handed check until I hear of an alternative way to cheaply check. Thanks, Robin
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 10:45 AM, Nathan Zimmer wrote: > > I hadn't actually been very happy with having a PG_uninitialized2mib flag. > It implies if we want to jump to 1Gb pages we would need a second flag, > PG_uninitialized1gb, for that. I was thinking of changing it to > PG_uninitialized and setting page->private to the correct order. > Thoughts? > Seems straightforward. The bigger issue is the amount of overhead we cause by having to check upstack for the initialization status of the superpages. I'm concerned, obviously, about lingering overhead that is "forever". That being said, in the absolutely worst case we could have a counter to the number of uninitialized pages which when it hits zero we do a static switch and switch out the initialization code (would have to be undone on memory hotplug, of course.) -hpa
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote: > On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote: > > + > > page = pfn_to_page(pfn); > > __init_single_page(page, zone, nid, 1); > > + > > + if (pfns > 1) > > + SetPageUninitialized2Mib(page); > > + > > + pfn += pfns; > > } > > } > > > > @@ -6196,6 +6302,7 @@ static const struct trace_print_flags > > pageflag_names[] = { > > {1UL << PG_owner_priv_1,"owner_priv_1" }, > > {1UL << PG_arch_1, "arch_1"}, > > {1UL << PG_reserved,"reserved" }, > > + {1UL << PG_uninitialized2mib, "Uninit_2MiB" }, > > PG_uninitialized_2m ? > > > {1UL << PG_private, "private" }, > > {1UL << PG_private_2, "private_2" }, > > {1UL << PG_writeback, "writeback" }, > > Yinghai I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page->private to the correct order. Thoughts? Nate
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/13/2013 12:31 AM, Yinghai Lu wrote: On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin wrote: On 07/12/2013 09:19 PM, Yinghai Lu wrote: PG_reserved, + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ PG_private, /* If pagecache, has fs-private data */ The comment here is WTF... ntz: Nate Zimmer? rmh: Robin Holt? Yea that comment was supposed to be removed. Sorry about that. Nate
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/13/2013 12:31 AM, Yinghai Lu wrote: On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin h...@zytor.com wrote: On 07/12/2013 09:19 PM, Yinghai Lu wrote: PG_reserved, + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ PG_private, /* If pagecache, has fs-private data */ The comment here is WTF... ntz: Nate Zimmer? rmh: Robin Holt? Yea that comment was supposed to be removed. Sorry about that. Nate -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 09:19:12PM -0700, Yinghai Lu wrote: On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt h...@sgi.com wrote: + page = pfn_to_page(pfn); __init_single_page(page, zone, nid, 1); + + if (pfns 1) + SetPageUninitialized2Mib(page); + + pfn += pfns; } } @@ -6196,6 +6302,7 @@ static const struct trace_print_flags pageflag_names[] = { {1UL PG_owner_priv_1,owner_priv_1 }, {1UL PG_arch_1, arch_1}, {1UL PG_reserved,reserved }, + {1UL PG_uninitialized2mib, Uninit_2MiB }, PG_uninitialized_2m ? {1UL PG_private, private }, {1UL PG_private_2, private_2 }, {1UL PG_writeback, writeback }, Yinghai I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page-private to the correct order. Thoughts? Nate -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 10:45 AM, Nathan Zimmer wrote: I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page-private to the correct order. Thoughts? Seems straightforward. The bigger issue is the amount of overhead we cause by having to check upstack for the initialization status of the superpages. I'm concerned, obviously, about lingering overhead that is forever. That being said, in the absolutely worst case we could have a counter to the number of uninitialized pages which when it hits zero we do a static switch and switch out the initialization code (would have to be undone on memory hotplug, of course.) -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Mon, Jul 15, 2013 at 10:54:38AM -0700, H. Peter Anvin wrote: On 07/15/2013 10:45 AM, Nathan Zimmer wrote: I hadn't actually been very happy with having a PG_uninitialized2mib flag. It implies if we want to jump to 1Gb pages we would need a second flag, PG_uninitialized1gb, for that. I was thinking of changing it to PG_uninitialized and setting page-private to the correct order. Thoughts? Seems straightforward. The bigger issue is the amount of overhead we cause by having to check upstack for the initialization status of the superpages. I'm concerned, obviously, about lingering overhead that is forever. That being said, in the absolutely worst case we could have a counter to the number of uninitialized pages which when it hits zero we do a static switch and switch out the initialization code (would have to be undone on memory hotplug, of course.) Is there a fairly cheap way to determine definitively that the struct page is not initialized? I think this patch set can change fairly drastically if we have that. I think I will start working up those changes and code a heavy-handed check until I hear of an alternative way to cheaply check. Thanks, Robin -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/15/2013 11:26 AM, Robin Holt wrote: Is there a fairly cheap way to determine definitively that the struct page is not initialized? By definition I would assume no. The only way I can think of would be to unmap the memory associated with the struct page in the TLB and initialize the struct pages at trap time. I think this patch set can change fairly drastically if we have that. I think I will start working up those changes and code a heavy-handed check until I hear of an alternative way to cheaply check. -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, 11 Jul 2013 21:03:55 -0500 Robin Holt <h...@sgi.com> wrote:

> During boot of large memory machines, a significant portion of boot
> is spent initializing the struct page array. The vast majority of
> those pages are not referenced during boot.
>
> Change this over to only initializing the pages when they are
> actually allocated.
>
> Besides the advantage of boot speed, this allows us the chance to
> use normal performance monitoring tools to determine where the bulk
> of time is spent during page initialization.
>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page *page)
>  	__free_page(page);
>  }
>
> +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end);
> +
> +static inline void __reserve_bootmem_page(struct page *page)
> +{
> +	phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT;
> +	phys_addr_t end = start + PAGE_SIZE;
> +
> +	__reserve_bootmem_region(start, end);
> +}

It isn't obvious that this needed to be inlined?

> static inline void free_reserved_page(struct page *page)
> {
> +	__reserve_bootmem_page(page);
>  	__free_reserved_page(page);
>  	adjust_managed_page_count(page, 1);
> }
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 6d53675..79e8eb7 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -83,6 +83,7 @@ enum pageflags {
>  	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use */
>  	PG_arch_1,
>  	PG_reserved,
> +	PG_uninitialized2mib,	/* Is this the right spot? ntz - Yes - rmh */

"mib" creeps me out too. And it makes me think of SNMP, which I'd prefer not to think about.

We've traditionally had fears of running out of page flags, but I've lost track of how close we are to that happening. IIRC the answer depends on whether you believe there is such a thing as a 32-bit NUMA system.

Can this be avoided anyway? I suspect there's some idiotic combination of flags we could use to indicate the state. PG_reserved|PG_lru or something.

2MB sounds terribly arch-specific. Shouldn't we make it more generic for when the hexagon64 port wants to use 4MB?

That conversational code comment was already commented on, but it's still there?

> ...
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, unsigned long zone, int nid, i
>  #endif
>  }
>
> +static void expand_page_initialization(struct page *basepage)
> +{
> +	unsigned long pfn = page_to_pfn(basepage);
> +	unsigned long end_pfn = pfn + PTRS_PER_PMD;
> +	unsigned long zone = page_zonenum(basepage);
> +	int reserved = PageReserved(basepage);
> +	int nid = page_to_nid(basepage);
> +
> +	ClearPageUninitialized2Mib(basepage);
> +
> +	for( pfn++; pfn < end_pfn; pfn++ )
> +		__init_single_page(pfn_to_page(pfn), zone, nid, reserved);
> +}
> +
> +void ensure_pages_are_initialized(unsigned long start_pfn,
> +				  unsigned long end_pfn)

I think this can be made static. I hope so, as it's a somewhat odd-sounding identifier for a global.

> +{
> +	unsigned long aligned_start_pfn = start_pfn & ~(PTRS_PER_PMD - 1);
> +	unsigned long aligned_end_pfn;
> +	struct page *page;
> +
> +	aligned_end_pfn = end_pfn & ~(PTRS_PER_PMD - 1);
> +	aligned_end_pfn += PTRS_PER_PMD;
> +	while (aligned_start_pfn < aligned_end_pfn) {
> +		if (pfn_valid(aligned_start_pfn)) {
> +			page = pfn_to_page(aligned_start_pfn);
> +
> +			if(PageUninitialized2Mib(page))

checkpatch them, please.

> +				expand_page_initialization(page);
> +		}
> +
> +		aligned_start_pfn += PTRS_PER_PMD;
> +	}
> +}

Some nice code comments for the above two functions would be helpful.

> ...
>
> +int __meminit pfn_range_init_avail(unsigned long pfn, unsigned long end_pfn,
> +				   unsigned long size, int nid)
> +{
> +	unsigned long validate_end_pfn = pfn + size;
> +
> +	if (pfn & (size - 1))
> +		return 1;
> +
> +	if (pfn + size >= end_pfn)
> +		return 1;
> +
> +	while (pfn < validate_end_pfn)
> +	{
> +		if (!early_pfn_valid(pfn))
> +			return 1;
> +		if (!early_pfn_in_nid(pfn, nid))
> +			return 1;
> +		pfn++;
> +	}
> +
> +	return size;
> +}

Document it, please. The return value semantics look odd, so don't forget to explain all that as well.

> ...
>
> @@ -6196,6 +6302,7 @@ static const struct trace_print_flags pageflag_names[] = {
>  	{1UL << PG_owner_priv_1,	"owner_priv_1"	},
>  	{1UL << PG_arch_1,		"arch_1"	},
>  	{1UL << PG_reserved,		"reserved"	},
> +	{1UL << PG_uninitialized2mib,	"Uninit_2MiB"	},

It would be better if the name which is visible in procfs matches the name in the kernel source code.

> 	{1UL << PG_private,
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/12/2013 10:31 PM, Yinghai Lu wrote: > On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin wrote: >> On 07/12/2013 09:19 PM, Yinghai Lu wrote: PG_reserved, + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ PG_private, /* If pagecache, has fs-private data */ >> >> The comment here is WTF... > > ntz: Nate Zimmer? > rmh: Robin Holt? > This kind of conversation doesn't really belong in a code comment, though. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC 4/4] Sparse initialization of struct page array.
On Fri, Jul 12, 2013 at 9:39 PM, H. Peter Anvin wrote: > On 07/12/2013 09:19 PM, Yinghai Lu wrote: >>> PG_reserved, >>> + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh >>> */ >>> PG_private, /* If pagecache, has fs-private data */ > > The comment here is WTF... ntz: Nate Zimmer? rmh: Robin Holt?
Re: [RFC 4/4] Sparse initialization of struct page array.
On 07/12/2013 09:19 PM, Yinghai Lu wrote: >> PG_reserved, >> + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ >> PG_private, /* If pagecache, has fs-private data */ The comment here is WTF... -hpa
Re: [RFC 4/4] Sparse initialization of struct page array.
On Thu, Jul 11, 2013 at 7:03 PM, Robin Holt wrote: > During boot of large memory machines, a significant portion of boot > is spent initializing the struct page array. The vast majority of > those pages are not referenced during boot. > > Change this over to only initializing the pages when they are > actually allocated. > > Besides the advantage of boot speed, this allows us the chance to > use normal performance monitoring tools to determine where the bulk > of time is spent during page initialization. > > Signed-off-by: Robin Holt > Signed-off-by: Nate Zimmer > To: "H. Peter Anvin" > To: Ingo Molnar > Cc: Linux Kernel > Cc: Linux MM > Cc: Rob Landley > Cc: Mike Travis > Cc: Daniel J Blueman > Cc: Andrew Morton > Cc: Greg KH > Cc: Yinghai Lu > Cc: Mel Gorman > --- > include/linux/mm.h | 11 + > include/linux/page-flags.h | 5 +- > mm/nobootmem.c | 5 ++ > mm/page_alloc.c| 117 > +++-- > 4 files changed, 132 insertions(+), 6 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index e0c8528..3de08b5 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1330,8 +1330,19 @@ static inline void __free_reserved_page(struct page > *page) > __free_page(page); > } > > +extern void __reserve_bootmem_region(phys_addr_t start, phys_addr_t end); > + > +static inline void __reserve_bootmem_page(struct page *page) > +{ > + phys_addr_t start = page_to_pfn(page) << PAGE_SHIFT; > + phys_addr_t end = start + PAGE_SIZE; > + > + __reserve_bootmem_region(start, end); > +} > + > static inline void free_reserved_page(struct page *page) > { > + __reserve_bootmem_page(page); > __free_reserved_page(page); > adjust_managed_page_count(page, 1); > } > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 6d53675..79e8eb7 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -83,6 +83,7 @@ enum pageflags { > PG_owner_priv_1,/* Owner use. 
If pagecache, fs may use*/ > PG_arch_1, > PG_reserved, > + PG_uninitialized2mib, /* Is this the right spot? ntz - Yes - rmh */ > PG_private, /* If pagecache, has fs-private data */ > PG_private_2, /* If pagecache, has fs aux data */ > PG_writeback, /* Page is under writeback */ > @@ -211,6 +212,8 @@ PAGEFLAG(SwapBacked, swapbacked) > __CLEARPAGEFLAG(SwapBacked, swapbacked) > > __PAGEFLAG(SlobFree, slob_free) > > +PAGEFLAG(Uninitialized2Mib, uninitialized2mib) > + > /* > * Private page markings that may be used by the filesystem that owns the > page > * for its own purposes. > @@ -499,7 +502,7 @@ static inline void ClearPageSlabPfmemalloc(struct page > *page) > #define PAGE_FLAGS_CHECK_AT_FREE \ > (1 << PG_lru | 1 << PG_locked| \ > 1 << PG_private | 1 << PG_private_2 | \ > -1 << PG_writeback | 1 << PG_reserved | \ > +1 << PG_writeback | 1 << PG_reserved | 1 << PG_uninitialized2mib | \ > 1 << PG_slab| 1 << PG_swapcache | 1 << PG_active | \ > 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \ > __PG_COMPOUND_LOCK) > diff --git a/mm/nobootmem.c b/mm/nobootmem.c > index 3b512ca..e3a386d 100644 > --- a/mm/nobootmem.c > +++ b/mm/nobootmem.c > @@ -126,6 +126,9 @@ static unsigned long __init > free_low_memory_core_early(void) > phys_addr_t start, end, size; > u64 i; > > + for_each_reserved_mem_region(i, , ) > + __reserve_bootmem_region(start, end); > + How about holes that is not in memblock.reserved? before this patch: free_area_init_node/free_area_init_core/memmap_init_zone will mark all page in node range to Reserved in struct page, at first. but those holes is not mapped via kernel low mapping. so it should be ok not touch "struct page" for them. Now you only mark reserved for memblock.reserved at first, and later mark {memblock.memory} - { memblock.reserved} to be available. And that is ok. but should split that change to another patch and add some comment and change log for the change. in case there is some user like UEFI etc do weird thing. 
> for_each_free_mem_range(i, MAX_NUMNODES, , , NULL) > count += __free_memory_core(start, end); > > @@ -162,6 +165,8 @@ unsigned long __init free_all_bootmem(void) > { > struct pglist_data *pgdat; > > + memblock_dump_all(); > + Not needed. > for_each_online_pgdat(pgdat) > reset_node_lowmem_managed_pages(pgdat); > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 635b131..fe51eb9 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -740,6 +740,54 @@ static void __init_single_page(struct page *page, > unsigned long zone, int nid, i > #endif > } > > +static void expand_page_initialization(struct page *basepage) > +{ > + unsigned long