Re: [mm PATCH v3 3/6] mm: Use memblock/zone specific iterator for handling deferred page init

2018-10-17 Thread Mike Rapoport
On Mon, Oct 15, 2018 at 01:27:09PM -0700, Alexander Duyck wrote:
> This patch introduces a new iterator for_each_free_mem_pfn_range_in_zone.
> 
> This iterator will take care of making sure a given memory range provided
> is in fact contained within a zone. It takes care of all the bounds checking
> we were doing in deferred_grow_zone and deferred_init_memmap. In addition
> it should help to speed up the search a bit by iterating until the end of a
> range is greater than the start of the zone pfn range, and will exit
> completely if the start is beyond the end of the zone.
> 
> This patch adds yet another iterator called
> for_each_free_mem_range_in_zone_from and then uses it to support
> initializing and freeing pages in groups no larger than MAX_ORDER_NR_PAGES.
> By doing this we can greatly improve the cache locality of the pages while
> we do several loops over them in the init and freeing process.
> 
> We are able to tighten the loops as a result since we only really need the
> checks for first_init_pfn in our first iteration and after that we can
> assume that all future values will be greater than this. So I have added a
> function called deferred_init_mem_pfn_range_in_zone that primes the
> iterators and if it fails we can just exit.
> 
> Signed-off-by: Alexander Duyck 
> ---
>  include/linux/memblock.h |   58 +++
>  mm/memblock.c            |   63
>  mm/page_alloc.c          |  176 --
>  3 files changed, 242 insertions(+), 55 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index aee299a6aa76..d62b95dba94e 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -178,6 +178,25 @@ void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
> 			      p_start, p_end, p_nid))
> 
>  /**
> + * for_each_mem_range - iterate through memblock areas from type_a and not

nit: for_each_mem_range_from

> + * included in type_b. Or just type_a if type_b is NULL.
> + * @i: u64 used as loop variable
> + * @type_a: ptr to memblock_type to iterate
> + * @type_b: ptr to memblock_type which excludes from the iteration
> + * @nid: node selector, %NUMA_NO_NODE for all nodes
> + * @flags: pick from blocks based on memory attributes
> + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
> + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> + * @p_nid: ptr to int for nid of the range, can be %NULL
> + */
> +#define for_each_mem_range_from(i, type_a, type_b, nid, flags,	\
> +				p_start, p_end, p_nid)			\
> +	for (i = 0, __next_mem_range(&i, nid, flags, type_a, type_b,	\
> +				     p_start, p_end, p_nid);		\
> +	     i != (u64)ULLONG_MAX;					\
> +	     __next_mem_range(&i, nid, flags, type_a, type_b,		\
> +			      p_start, p_end, p_nid))
> +/**
>   * for_each_mem_range_rev - reverse iterate through memblock areas from
>   * type_a and not included in type_b. Or just type_a if type_b is NULL.
>   * @i: u64 used as loop variable
> @@ -248,6 +267,45 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
> 	     i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
>  #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
> 
> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> +void __next_mem_pfn_range_in_zone(u64 *idx, struct zone *zone,
> +   unsigned long *out_spfn,
> +   unsigned long *out_epfn);
> +/**
> + * for_each_free_mem_range_in_zone - iterate through zone specific free
> + * memblock areas
> + * @i: u64 used as loop variable
> + * @zone: zone in which all of the memory blocks reside
> + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
> + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> + *
> + * Walks over free (memory && !reserved) areas of memblock in a specific
> + * zone. Available as soon as memblock is initialized.
> + */
> +#define for_each_free_mem_pfn_range_in_zone(i, zone, p_start, p_end) \
> +	for (i = 0,							\
> +	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end);	\
> +	     i != (u64)ULLONG_MAX;					\
> +	     __next_mem_pfn_range_in_zone(&i, zone, p_start, p_end))
> +
> +/**
> + * for_each_free_mem_range_in_zone_from - iterate through zone specific
> + * free memblock areas from a given point
> + * @i: u64 used as loop variable
> + * @zone: zone in which all of the memory blocks reside
> + * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
> + * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
> + *
> + * Walks over free (memory && !reserved) areas of memblock in a specific
> + * zone, continuing from the current position. Available as soon as
> + * memblock is initialized.
> + */
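
To make the intended usage concrete, here is a minimal sketch of how a
priming helper in the spirit of the deferred_init_mem_pfn_range_in_zone
function described in the commit message can seed the _from iterator. This
is not code from the patch; prime_pfn_range_sketch() is a hypothetical
stand-in name.

/*
 * Sketch, not code from the patch: position the iterator at the first
 * free range that ends past first_init_pfn, so the main loop can
 * continue with the _from variant and assume every later range starts
 * at or after first_init_pfn.
 */
static bool __init
prime_pfn_range_sketch(u64 *i, struct zone *zone,
		       unsigned long first_init_pfn,
		       unsigned long *spfn, unsigned long *epfn)
{
	for_each_free_mem_pfn_range_in_zone(*i, zone, spfn, epfn) {
		/* Skip ranges that end at or before the resume point. */
		if (*epfn <= first_init_pfn)
			continue;
		/* Clamp the start so earlier pfns are never revisited. */
		if (*spfn < first_init_pfn)
			*spfn = first_init_pfn;
		return true;
	}
	return false;	/* nothing left to initialize in this zone */
}

With the iterator primed this way, the first_init_pfn checks are needed
only once, which is the loop tightening the commit message refers to.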

Re: [mm PATCH v3 3/6] mm: Use memblock/zone specific iterator for handling deferred page init

2018-10-17 Thread Alexander Duyck

On 10/17/2018 2:11 AM, Michal Hocko wrote:

On Mon 15-10-18 13:27:09, Alexander Duyck wrote:

This patch introduces a new iterator for_each_free_mem_pfn_range_in_zone.

This iterator will take care of making sure a given memory range provided
is in fact contained within a zone. It takes care of all the bounds checking
we were doing in deferred_grow_zone and deferred_init_memmap. In addition
it should help to speed up the search a bit by iterating until the end of a
range is greater than the start of the zone pfn range, and will exit
completely if the start is beyond the end of the zone.

This patch adds yet another iterator called
for_each_free_mem_range_in_zone_from and then uses it to support
initializing and freeing pages in groups no larger than MAX_ORDER_NR_PAGES.
By doing this we can greatly improve the cache locality of the pages while
we do several loops over them in the init and freeing process.

We are able to tighten the loops as a result since we only really need the
checks for first_init_pfn in our first iteration and after that we can
assume that all future values will be greater than this. So I have added a
function called deferred_init_mem_pfn_range_in_zone that primes the
iterators and if it fails we can just exit.


Numbers please.

Besides that, this adds a lot of code and I am not convinced the result
is so much better to justify that.

If I recall, most of the gains are due to better cache locality. Instead
of running through all of memory once for init and once for freeing,
this patch has us doing it in MAX_ORDER_NR_PAGES sized chunks. So the
advantage is that we can keep most of the page structs in the L2 cache,
at least on x86 processors, to avoid having to go to memory as much.
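
As a rough illustration of that chunking (a sketch under assumptions:
deferred_init_pages() and deferred_free_pages() are stand-in names for
the two passes, not necessarily the patch's exact helpers):

static unsigned long __init
deferred_init_maxorder_sketch(struct zone *zone,
			      unsigned long spfn, unsigned long epfn)
{
	unsigned long nr_pages = 0;

	while (spfn < epfn) {
		/* End the chunk at the next MAX_ORDER boundary. */
		unsigned long t = min(epfn,
				      ALIGN(spfn + 1, MAX_ORDER_NR_PAGES));

		/* Pass 1: initialize the struct pages in this chunk. */
		nr_pages += deferred_init_pages(zone, spfn, t);
		/* Pass 2: free them while they are still cache-hot. */
		deferred_free_pages(spfn, t);

		spfn = t;
	}
	return nr_pages;
}

The point is that both passes touch the same MAX_ORDER_NR_PAGES worth of
struct pages back to back, instead of streaming through the whole zone
twice.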


I'll run performance numbers per patch today and try to make certain I 
have a line mentioning the delta for each patch in the v4 patch set.


- Alex


Re: [mm PATCH v3 3/6] mm: Use memblock/zone specific iterator for handling deferred page init

2018-10-17 Thread Michal Hocko
On Mon 15-10-18 13:27:09, Alexander Duyck wrote:
> This patch introduces a new iterator for_each_free_mem_pfn_range_in_zone.
> 
> This iterator will take care of making sure a given memory range provided
> is in fact contained within a zone. It takes care of all the bounds checking
> we were doing in deferred_grow_zone and deferred_init_memmap. In addition
> it should help to speed up the search a bit by iterating until the end of a
> range is greater than the start of the zone pfn range, and will exit
> completely if the start is beyond the end of the zone.
> 
> This patch adds yet another iterator called
> for_each_free_mem_range_in_zone_from and then uses it to support
> initializing and freeing pages in groups no larger than MAX_ORDER_NR_PAGES.
> By doing this we can greatly improve the cache locality of the pages while
> we do several loops over them in the init and freeing process.
> 
> We are able to tighten the loops as a result since we only really need the
> checks for first_init_pfn in our first iteration and after that we can
> assume that all future values will be greater than this. So I have added a
> function called deferred_init_mem_pfn_range_in_zone that primes the
> iterators and if it fails we can just exit.

Numbers please.

Besides that, this adds a lot of code and I am not convinced the result
is so much better to justify that. 

> Signed-off-by: Alexander Duyck 
> ---
>  include/linux/memblock.h |   58 +++
>  mm/memblock.c            |   63
>  mm/page_alloc.c          |  176 --
>  3 files changed, 242 insertions(+), 55 deletions(-)
-- 
Michal Hocko
SUSE Labs

