Re: [PATCH v3 1/1] mm: initialize pages on demand during boot

2018-02-14 Thread Pavel Tatashin
Hi Sergey,

Thank you for noticing this! I will send out an updated patch soon.

Pavel

On Wed, Feb 14, 2018 at 12:08 AM, Sergey Senozhatsky wrote:
> On (02/09/18 14:22), Pavel Tatashin wrote:
> [..]
>> +/*
>> + * If this zone has deferred pages, try to grow it by initializing enough
>> + * deferred pages to satisfy the allocation specified by order, rounded up to
>> + * the nearest PAGES_PER_SECTION boundary.  So we're adding memory in increments
>> + * of SECTION_SIZE bytes by initializing struct pages in increments of
>> + * PAGES_PER_SECTION * sizeof(struct page) bytes.
>> + *
>> + * Return true when zone was grown by at least number of pages specified by
>> + * order. Otherwise return false.
>> + *
>> + * Note: We use noinline because this function is needed only during boot, and
>> + * it is called from a __ref function _deferred_grow_zone. This way we are
>> + * making sure that it is not inlined into permanent text section.
>> + */
>> +static noinline bool __init
>> +deferred_grow_zone(struct zone *zone, unsigned int order)
>> +{
>> + int zid = zone_idx(zone);
>> + int nid = zone->node;
>
> ^
>
> Should be CONFIG_NUMA dependent
>
> struct zone {
> ...
> #ifdef CONFIG_NUMA
> int node;
> #endif
> ...
>
> -ss
>


Re: [PATCH v3 1/1] mm: initialize pages on demand during boot

2018-02-13 Thread Sergey Senozhatsky
On (02/09/18 14:22), Pavel Tatashin wrote:
[..]
> +/*
> + * If this zone has deferred pages, try to grow it by initializing enough
> + * deferred pages to satisfy the allocation specified by order, rounded up to
> + * the nearest PAGES_PER_SECTION boundary.  So we're adding memory in increments
> + * of SECTION_SIZE bytes by initializing struct pages in increments of
> + * PAGES_PER_SECTION * sizeof(struct page) bytes.
> + *
> + * Return true when zone was grown by at least number of pages specified by
> + * order. Otherwise return false.
> + *
> + * Note: We use noinline because this function is needed only during boot, and
> + * it is called from a __ref function _deferred_grow_zone. This way we are
> + * making sure that it is not inlined into permanent text section.
> + */
> +static noinline bool __init
> +deferred_grow_zone(struct zone *zone, unsigned int order)
> +{
> + int zid = zone_idx(zone);
> + int nid = zone->node;

^

Should be CONFIG_NUMA dependent

struct zone {
...
#ifdef CONFIG_NUMA
int node;
#endif
...
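
A minimal sketch of a CONFIG_NUMA-safe lookup (the helper name below is
hypothetical; an existing accessor such as zone_to_nid(), which already falls
back to node 0 on !NUMA builds, could serve the same purpose):

#include <linux/mmzone.h>

/* Hypothetical helper: zone->node is only defined when CONFIG_NUMA is set. */
static inline int deferred_zone_nid(struct zone *zone)
{
#ifdef CONFIG_NUMA
	return zone->node;
#else
	return 0;
#endif
}

/* deferred_grow_zone() would then use:  int nid = deferred_zone_nid(zone); */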

-ss


[PATCH v3 1/1] mm: initialize pages on demand during boot

2018-02-09 Thread Pavel Tatashin
Deferred page initialization allows the boot cpu to initialize a small
subset of the system's pages early in boot, with other cpus doing the rest
later on.

It is, however, problematic to know how many pages the kernel needs during
boot.  Different modules and kernel parameters may change the requirement,
so the boot cpu either initializes too many pages or runs out of memory.

To fix that, initialize early pages on demand.  This ensures the kernel
does the minimum amount of work to initialize pages during boot and leaves
the rest to be divided in the multithreaded initialization path
(deferred_init_memmap).

The on-demand code is permanently disabled using static branching once
deferred pages are initialized.  After the static branch is changed to
false, the overhead is up-to two branch-always instructions if the zone
watermark check fails or if rmqueue fails.
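
As a rough sketch of the shape of that on-demand path (the helper names below
are illustrative assumptions, not hunks from this patch): when a zone looks
exhausted during early boot, it is grown by whole sections via the __ref
wrapper _deferred_grow_zone() named in the patch, and the allocation is
retried.

#include <linux/mmzone.h>

/* The __ref wrapper added by the patch; grows the zone section by section. */
static bool _deferred_grow_zone(struct zone *zone, unsigned int order);

/* Hypothetical stand-in for the real free-list allocation (e.g. rmqueue()). */
static struct page *zone_try_alloc(struct zone *zone, unsigned int order);

static struct page *boot_zone_alloc(struct zone *zone, unsigned int order)
{
	struct page *page = zone_try_alloc(zone, order);

	/*
	 * During boot a failure here may only mean that the zone's deferred
	 * struct pages have not been initialized yet: initialize enough of
	 * them (rounded up to PAGES_PER_SECTION) and retry once.
	 */
	if (!page && _deferred_grow_zone(zone, order))
		page = zone_try_alloc(zone, order);

	return page;
}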

Signed-off-by: Pavel Tatashin 
Reviewed-by: Daniel Jordan 
Reviewed-by: Steven Sistare 
Tested-by: Masayoshi Mizuma 
---
 include/linux/memblock.h |  10 ---
 mm/memblock.c            |  23 ---
 mm/page_alloc.c          | 175 ---
 3 files changed, 136 insertions(+), 72 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 8be5077efb5f..6c305afd95ab 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -417,21 +417,11 @@ static inline void early_memtest(phys_addr_t start, phys_addr_t end)
 {
 }
 #endif
-
-extern unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
-   phys_addr_t end_addr);
 #else
 static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
 {
return 0;
 }
-
-static inline unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
-   phys_addr_t end_addr)
-{
-   return 0;
-}
-
 #endif /* CONFIG_HAVE_MEMBLOCK */
 
 #endif /* __KERNEL__ */
diff --git a/mm/memblock.c b/mm/memblock.c
index 5a9ca2a1751b..4120e9f536f7 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1778,29 +1778,6 @@ static void __init_memblock memblock_dump(struct memblock_type *type)
}
 }
 
-extern unsigned long __init_memblock
-memblock_reserved_memory_within(phys_addr_t start_addr, phys_addr_t end_addr)
-{
-   struct memblock_region *rgn;
-   unsigned long size = 0;
-   int idx;
-
-   for_each_memblock_type(idx, (&memblock.reserved), rgn) {
-   phys_addr_t start, end;
-
-   if (rgn->base + rgn->size < start_addr)
-   continue;
-   if (rgn->base > end_addr)
-   continue;
-
-   start = rgn->base;
-   end = start + rgn->size;
-   size += end - start;
-   }
-
-   return size;
-}
-
 void __init_memblock __memblock_dump_all(void)
 {
pr_info("MEMBLOCK configuration:\n");
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81e18ceef579..5938523eb309 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -291,40 +291,6 @@ EXPORT_SYMBOL(nr_online_nodes);
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-
-/*
- * Determine how many pages need to be initialized during early boot
- * (non-deferred initialization).
- * The value of first_deferred_pfn will be set later, once non-deferred pages
- * are initialized, but for now set it ULONG_MAX.
- */
-static inline void reset_deferred_meminit(pg_data_t *pgdat)
-{
-   phys_addr_t start_addr, end_addr;
-   unsigned long max_pgcnt;
-   unsigned long reserved;
-
-   /*
-* Initialise at least 2G of a node but also take into account that
-* two large system hashes that can take up 1GB for 0.25TB/node.
-*/
-   max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
-   (pgdat->node_spanned_pages >> 8));
-
-   /*
-* Compensate the all the memblock reservations (e.g. crash kernel)
-* from the initial estimation to make sure we will initialize enough
-* memory to boot.
-*/
-   start_addr = PFN_PHYS(pgdat->node_start_pfn);
-   end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
-   reserved = memblock_reserved_memory_within(start_addr, end_addr);
-   max_pgcnt += PHYS_PFN(reserved);
-
-   pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
-   pgdat->first_deferred_pfn = ULONG_MAX;
-}
-
 /* Returns true if the struct page for the pfn is uninitialised */
 static inline bool __meminit early_page_uninitialised(unsigned long pfn)
 {
@@ -357,10 +323,6 @@ static inline bool update_defer_init(pg_data_t *pgdat,
return true;
 }
 #else
-static inline void reset_deferred_meminit(pg_data_t *pgdat)
-{
-}
-
 static inline bool early_page_uninitialised(unsigned long pfn)
 {
return false;
@@ -1604,6 +1566,107 @@ static int __init deferred_init_memmap(void *data)
pgdat_init_report_one_done();
return 0;
 }
+
+/*
+ * This lock grantees that 

Re: [PATCH v3 1/1] mm: initialize pages on demand during boot

2018-02-09 Thread Andrew Morton
On Fri,  9 Feb 2018 14:22:16 -0500 Pavel Tatashin wrote:

> Deferred page initialization allows the boot cpu to initialize a small
> subset of the system's pages early in boot, with other cpus doing the rest
> later on.
> 
> It is, however, problematic to know how many pages the kernel needs during
> boot.  Different modules and kernel parameters may change the requirement,
> so the boot cpu either initializes too many pages or runs out of memory.
> 
> To fix that, initialize early pages on demand.  This ensures the kernel
> does the minimum amount of work to initialize pages during boot and leaves
> the rest to be divided in the multithreaded initialization path
> (deferred_init_memmap).
> 
> The on-demand code is permanently disabled using static branching once
> deferred pages are initialized.  After the static branch is changed to
> false, the overhead is up-to two branch-always instructions if the zone
> watermark check fails or if rmqueue fails.

lgtm, I'll toss it in for some testing.

A couple of tweaks:

From: Andrew Morton 
Subject: mm-initialize-pages-on-demand-during-boot-fix

fix typo in comment, make deferred_pages static

Cc: Pavel Tatashin 
Signed-off-by: Andrew Morton 
---

 mm/page_alloc.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff -puN include/linux/memblock.h~mm-initialize-pages-on-demand-during-boot-fix include/linux/memblock.h
diff -puN mm/memblock.c~mm-initialize-pages-on-demand-during-boot-fix mm/memblock.c
diff -puN mm/page_alloc.c~mm-initialize-pages-on-demand-during-boot-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-initialize-pages-on-demand-during-boot-fix
+++ a/mm/page_alloc.c
@@ -1568,14 +1568,14 @@ static int __init deferred_init_memmap(v
 }
 
 /*
- * This lock grantees that only one thread at a time is allowed to grow zones
+ * This lock guarantees that only one thread at a time is allowed to grow zones
  * (decrease number of deferred pages).
  * Protects first_deferred_pfn field in all zones during early boot before
  * deferred pages are initialized.  Deferred pages are initialized in
  * page_alloc_init_late() soon after smp_init() is complete.
  */
 static __initdata DEFINE_SPINLOCK(deferred_zone_grow_lock);
-DEFINE_STATIC_KEY_TRUE(deferred_pages);
+static DEFINE_STATIC_KEY_TRUE(deferred_pages);
 
 /*
  * If this zone has deferred pages, try to grow it by initializing enough
_
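
For readers less familiar with static keys, a minimal sketch of the lifecycle
the changelog describes (identifiers other than deferred_pages and
_deferred_grow_zone are illustrative assumptions): the key starts true, gates
the boot-only grow path, and is flipped off once after deferred init finishes.

#include <linux/jump_label.h>	/* DEFINE_STATIC_KEY_TRUE, static_branch_*() */
#include <linux/mmzone.h>

/* Starts out true: the on-demand grow path is live during early boot. */
static DEFINE_STATIC_KEY_TRUE(deferred_pages);

/* The __ref wrapper from the patch that does the actual growing. */
static bool _deferred_grow_zone(struct zone *zone, unsigned int order);

static inline bool maybe_grow_zone(struct zone *zone, unsigned int order)
{
	/*
	 * While the key is true this jumps into the boot-only slow path.
	 * After static_branch_disable() the check is patched out of the hot
	 * path, so post-boot overhead is at most a couple of patched branch
	 * instructions, as the changelog notes.
	 */
	if (static_branch_unlikely(&deferred_pages))
		return _deferred_grow_zone(zone, order);
	return false;
}

/* Run once from page_alloc_init_late(), after deferred init completes. */
static void __init all_deferred_pages_done(void)
{
	static_branch_disable(&deferred_pages);
}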


